Job output Cream CE

Added by Balint Tunde about 14 years ago

Hi,
I was trying to submit a job to a CREAM CE:

jsaga-job-run.sh -d -b -Executable /bin/hostname -Output out.txt \
-Error err.txt -WorkingDirectory /tmp -FileTransfer \
'/tmp/out2.txt\<out.txt,/tmp/err2.txt\<err.txt' \
-r cream://stremsel.nikhef.nl:8443/cream-pbs-short
[
  Type = "Job";
  BatchSystem   = "pbs";
  QueueName     = "short";
  Executable = "/bin/hostname";
  StdOutput = "out.txt";
  StdError = "err.txt";
  SandboxDirectory = "gsiftp://stremsel.nikhef.nl:2811/tmp/1281344489864";
  OutputSandboxPostStaging = 2;
  OutputSandboxPostStaging_0_From = "gsiftp://stremsel.nikhef.nl:2811/tmp/1281344489864/out.txt";
  OutputSandboxPostStaging_0_To = "/tmp/out2.txt";
  OutputSandboxPostStaging_0_Append = "false";
  OutputSandboxPostStaging_1_From = "gsiftp://stremsel.nikhef.nl:2811/tmp/1281344489864/err.txt";
  OutputSandboxPostStaging_1_To = "/tmp/err2.txt";
  OutputSandboxPostStaging_1_Append = "false";
  OutputSandbox = {
                "out.txt",
                "err.txt"};
  OutputSandboxDestURI = {
                "gsiftp://stremsel.nikhef.nl:2811/tmp/1281344489864/out.txt",
                "gsiftp://stremsel.nikhef.nl:2811/tmp/1281344489864/err.txt"};
  Requirements = true ;      
  Rank = -other.GlueCEStateEstimatedResponseTime ;
  RetryCount = 0;
]

After a while I got the following error message:
jsaga-job-status.sh [cream://stremsel.nikhef.nl:8443/cream-pbs-short]-[CREAM874768963]
Job failed.
Exception in thread "main" NoSuccess: Cannot move OSB (${globus_transfer_cmd} 
file:///tmp/jobdir/7115972.stro.nikhef.nl/CREAM874768963/err.txt 
gsiftp://stremsel.nikhef.nl:2811/tmp/1281342041301/err.txt): error: 
globus_ftp_client: the server responded with an error500 500-Command failed. :
globus_l_gfs_file_open failed.500-globus_xio: Unable to open file 
/tmp/1281342041301/err.txt500-globus_xio: System error in open: No such file 
or directory500-globus_xio: A system call failed: No such file or directory500 End.; 
Cannot move OSB (${globus_transfer_cmd} 
file:///tmp/jobdir/7115972.stro.nikhef.nl/CREAM874768963/err.txt 
gsiftp://stremsel.nikhef.nl:2811/tmp/1281342041301/err.txt): error:
 globus_ftp_client: the server responded with an error 500 500-Command failed. : 
globus_l_gfs_file_open failed.  500-globus_xio: Unable to open file 
/tmp/1281342041301/err.txt  500-globus_xio: System error in open: 
No such file or directory  500-globus_xio: A system call failed: No such file or directory  500 End.
        at fr.in2p3.jsaga.adaptor.job.monitor.JobStatus.<init>(JobStatus.java:35)
        at fr.in2p3.jsaga.adaptor.cream.job.CreamJobStatus.<init>(CreamJobStatus.java:37)
        at fr.in2p3.jsaga.adaptor.cream.job.CreamJobMonitorAdaptor.getStatus(CreamJobMonitorAdaptor.java:97)
        at fr.in2p3.jsaga.adaptor.cream.job.CreamJobMonitorAdaptor.getStatusList(CreamJobMonitorAdaptor.java:38)
        at fr.in2p3.jsaga.engine.job.monitor.request.JobStatusRequestor.getJobStatus(JobStatusRequestor.java:34)
        at fr.in2p3.jsaga.engine.job.monitor.JobMonitorService.getState(JobMonitorService.java:78)
        at fr.in2p3.jsaga.impl.job.instance.AbstractSyncJobImpl.queryState(AbstractSyncJobImpl.java:248)
        at fr.in2p3.jsaga.impl.task.AbstractTaskImpl.getState(AbstractTaskImpl.java:243)
        at fr.in2p3.jsaga.impl.job.instance.JobImpl.getState(JobImpl.java:79)
        at fr.in2p3.jsaga.command.JobStatus.main(JobStatus.java:59)

I know that a while ago I also encountered this message, and the problem then was that the OutputSandbox wasn't created correctly. So as far as I understand, the directory named UniqueID isn't created in /tmp on the CE. (I couldn't find it when I listed the /tmp directory on stremsel. I also looked for other directories, because I know that the job which ran had a different unique ID than the one for which I obtained the job description.)
I was wondering if there is any workaround or other way to obtain the results from a job running on a CREAM CE.
Best,
Tünde


Replies (8)

RE: Job output Cream CE - Added by Reynaud Sylvain about 14 years ago

Hi Tünde,

I have never seen this error message. Does it appear systematically or randomly?

If the sandbox directory is not created, there is no workaround for obtaining the result. In any case, I plan to improve this plug-in as soon as CREAM provides a way to use its default sandbox directory (instead of JSAGA's temporary directory) without doing useless transfers.

Up to now, I have not been able to find a way to use this default sandbox directory simultaneously with other gsiftp servers (like I do with the new WMS adaptor), and that's why this directory is created...

Best regards,
Sylvain

RE: Job output Cream CE - Added by Balint Tunde about 14 years ago

Hi Sylvain,

It appears systematically (both from the command line and from a program).
I tried to create a directory in the CE's /tmp directory using jsaga-mkdir.sh and it worked.
I even tried with two CREAM CEs.
When I switched the CE I got:

tunde@schrift:~/SAGA/JSAGA/2010-08-07$ jsaga-job-run.sh -b \
-Executable /bin/hostname -Output out.txt -Error err.txt \
-WorkingDirectory /tmp -FileTransfer '/tmp/out2.txt\<out.txt,/tmp/err2.txt\<err.txt' \
-r cream://creamce2.gina.sara.nl:8443/cream-pbs-short


tunde@schrift:~/SAGA/JSAGA/2010-08-07$ jsaga-job-status.sh \
             [cream://creamce2.gina.sara.nl:8443/cream-pbs-short]-[CREAM148503979]
Job failed.
Exception in thread "main" NoSuccess: reason=1
        at fr.in2p3.jsaga.adaptor.job.monitor.JobStatus.<init>(JobStatus.java:35)
        at fr.in2p3.jsaga.adaptor.cream.job.CreamJobStatus.<init>(CreamJobStatus.java:37)
        at fr.in2p3.jsaga.adaptor.cream.job.CreamJobMonitorAdaptor.getStatus(CreamJobMonitorAdaptor.java:97)
        at fr.in2p3.jsaga.adaptor.cream.job.CreamJobMonitorAdaptor.getStatusList(CreamJobMonitorAdaptor.java:38)
        at fr.in2p3.jsaga.engine.job.monitor.request.JobStatusRequestor.getJobStatus(JobStatusRequestor.java:34)
        at fr.in2p3.jsaga.engine.job.monitor.JobMonitorService.getState(JobMonitorService.java:78)
        at fr.in2p3.jsaga.impl.job.instance.AbstractSyncJobImpl.queryState(AbstractSyncJobImpl.java:248)
        at fr.in2p3.jsaga.impl.task.AbstractTaskImpl.getState(AbstractTaskImpl.java:243)
        at fr.in2p3.jsaga.impl.job.instance.JobImpl.getState(JobImpl.java:79)
        at fr.in2p3.jsaga.command.JobStatus.main(JobStatus.java:59)

tunde@schrift:~/SAGA/JSAGA/2010-08-07$ jsaga-job-info.sh [cream://creamce2.gina.sara.nl:8443/cream-pbs-short]-[CREAM148503979]
State:         FAILED
Exit code:     [not supported for this backend]
Failure cause: reason=1; Cannot move OSB (${globus_transfer_cmd} 
file:///data/home/pvi041/home_crm02_148503979/CREAM148503979/err.txt 
gsiftp://creamce2.gina.sara.nl:2811/tmp/1281356208480/err.txt): 
error: globus_ftp_client: the server responded with an error 500 500-Command failed.:
globus_l_gfs_file_open failed.  500-globus_xio: Unable to open file /tmp/1281356208480/err.txt
 500-globus_xio: System error in open: No such file or directory  500-globus_xio: 
A system call failed: No such file or directory  500 End.
Created time:  [not supported for this backend]
Started time:  [not supported for this backend]
Finished time: [not supported for this backend]
Execution hosts:
        [not supported for this backend]

As far as I can see the error is similar...
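
To compare the two failures, it can help to pull the missing path out of each message. Below is a minimal sketch, assuming a POSIX shell and a grep that supports -o (GNU and BSD grep do). The function name missing_paths is made up for this example and is not part of JSAGA. It also strips a 500- reply code when one ends up glued to the path; that is only a heuristic and would mis-handle a path that itself contains "500-".

```shell
#!/bin/sh
# missing_paths (hypothetical helper, not a JSAGA command):
# read a failure message on stdin and print each distinct path that the
# GridFTP server reported after "Unable to open file".
missing_paths() {
    # Join wrapped lines first, because the message may break right
    # before the path.
    tr '\n' ' ' |
        # Keep only "Unable to open file <path>" fragments, one per line.
        grep -o 'Unable to open file *[^ ]*' |
        # Drop the fixed prefix, then any reply code fused to the path.
        sed -e 's/^Unable to open file *//' -e 's/500-.*$//' |
        # The same path is usually reported more than once.
        sort -u
}
```

If the failure cause is written to standard output, piping the output of jsaga-job-info.sh for each job through missing_paths should print one line per distinct missing file, which makes it easy to see that both CEs fail on a file inside /tmp/UniqueID.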
Best regards,
Tünde

RE: Job output Cream CE - Added by Reynaud Sylvain about 14 years ago

Hi Tünde,

In order to be sure that the same security context is used for creating the directory and for submitting the job, can you please try to add the following lines inside your security context (in jsaga-universe.xml):
<job type="cream"/>
<data type="gsiftp"/>

As I told you this morning on another thread (https://forge.in2p3.fr/boards/11/topics/show/161), in a future version of JSAGA this will be systematically required in order to avoid this kind of situation...

Best regards,
Sylvain

RE: Job output Cream CE - Added by Balint Tunde about 14 years ago

Hi Sylvain,
I already had those lines in my jsaga-universe.xml. It looks like this:

    <GRID name="glite" contextType="VOMS">
        <attribute name="USERCERT" value="/home/tunde/.globus1/usercert.pem"/>
        <attribute name="USERKEY" value="/home/tunde/.globus1/userkey.pem"/>
        <attribute name="UserProxy" value="/tmp/x509up_u1000"/>
        <attribute name="Server" value="voms://voms.grid.sara.nl:30000/O=dutchgrid/O=hosts/OU=sara.nl/CN=voms.grid.sara.nl"/>
        <attribute name="UserVO" value="pvier"/>
        <attribute name="CertRepository" value="/home/tunde/.globus/certificates/"/>
        <attribute name="VomsDir" value="/home/tunde/.globus/vomsdir"/>
        <job type="cream"/>
        <data type="gsiftp"/>
    </GRID>

I also had <job type="wms"/> in it; I have now removed it, but I still get the same error. I also tried specifying the OutputStorage:
        <job type="cream">
                 <attribute name="OutputStorage" value="/tmp"/>
        </job> 

Best regards,
Tünde

RE: Job output Cream CE - Added by Reynaud Sylvain about 14 years ago

Hi Tünde,

I am unable to reproduce this error, even by using the same command line arguments and the same CE as you...

I currently have no idea why you get this error, but here are two things you can try in order to get more information about the problem:
  • submit the job without the -b option, to see whether you get a stack trace that helps explain the problem.
  • submit a job that sleeps for a few minutes, to give you time to check whether the sandbox directory is created.

Best regards,
Sylvain

RE: Job output Cream CE - Added by Balint Tunde about 14 years ago

Hi Sylvain,
Unfortunately that didn't help. Running without -b gives the same error...
I also tried to submit a script that would sleep and then do something, but in this case the input file couldn't even be staged in:

tunde@schrift:~/SAGA/JSAGA/2010-08-07$ jsaga-job-run.sh -Executable /bin/sh \
-Arguments run.sh -Output out.txt -Error err.txt -WorkingDirectory /tmp \
-FileTransfer 'run.sh\>run.sh,/tmp/out2.txt\<out.txt,/tmp/err2.txt\<err.txt' \
-r cream://stremsel.nikhef.nl:8443/cream-pbs-short
Exception in thread "main" NoSuccess: Unexpected exception
        at fr.in2p3.jsaga.impl.file.copy.FileCopy.createTargetFile(FileCopy.java:188)
        at fr.in2p3.jsaga.impl.file.copy.FileCopy.putToPhysicalFile(FileCopy.java:130)
        at fr.in2p3.jsaga.impl.file.copy.FileCopy.copy(FileCopy.java:98)
        at fr.in2p3.jsaga.impl.file.AbstractSyncFileImpl._copyAndMonitor(AbstractSyncFileImpl.java:190)
        at fr.in2p3.jsaga.impl.file.AbstractSyncFileImpl.copySync(AbstractSyncFileImpl.java:166)
        at fr.in2p3.jsaga.impl.namespace.AbstractNSEntryImpl.copy(AbstractNSEntryImpl.java:154)
        at fr.in2p3.jsaga.impl.job.staging.mgr.DataStagingManagerThroughSandbox.transfer(DataStagingManagerThroughSandbox.java:81)
        at fr.in2p3.jsaga.impl.job.staging.mgr.DataStagingManagerThroughSandboxTwoPhase.preStaging(DataStagingManagerThroughSandboxTwoPhase.java:49)
        at fr.in2p3.jsaga.impl.job.instance.AbstractSyncJobImpl.doSubmit(AbstractSyncJobImpl.java:212)
        at fr.in2p3.jsaga.impl.task.AbstractTaskImpl.run(AbstractTaskImpl.java:101)
        at fr.in2p3.jsaga.impl.job.instance.JobImpl.run(JobImpl.java:43)
        at fr.in2p3.jsaga.command.JobRun.main(JobRun.java:88)
Caused by: DoesNotExist: Failed to create parent directory
        at fr.in2p3.jsaga.impl.file.AbstractSyncFileImpl.tryToOpen(AbstractSyncFileImpl.java:105)
        at fr.in2p3.jsaga.impl.file.AbstractSyncFileImpl.init(AbstractSyncFileImpl.java:65)
        at fr.in2p3.jsaga.impl.file.AbstractSyncFileImpl.<init>(AbstractSyncFileImpl.java:41)
        at fr.in2p3.jsaga.impl.file.AbstractAsyncFileImpl.<init>(AbstractAsyncFileImpl.java:32)
        at fr.in2p3.jsaga.impl.file.FileImpl.<init>(FileImpl.java:30)
        at fr.in2p3.jsaga.impl.file.AbstractSyncFileFactoryImpl.doCreateFileSync(AbstractSyncFileFactoryImpl.java:57)
        at fr.in2p3.jsaga.impl.file.FileFactoryImpl.doCreateFile(FileFactoryImpl.java:35)
        at org.ogf.saga.file.FileFactory.createFile(FileFactory.java:315)
        at fr.in2p3.jsaga.impl.file.copy.FileCopy.createTargetFile(FileCopy.java:186)
        ... 11 more
Caused by: org.globus.ftp.exception.ServerException: Server refused performing the request. 
Custom message:  (error code 1) [Nested exception message:  Custom message: 
Unexpected reply: 500-Command failed. : globus_l_gfs_file_open failed.
500-globus_xio: Unable to open file /tmp/1281472229242//run.sh
500-globus_xio: System error in open: No such file or directory
500-globus_xio: A system call failed: No such file or directory
500 End.] [Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException:  
Custom message: Unexpected reply: 500-Command failed. : globus_l_gfs_file_open failed.
500-globus_xio: Unable to open file /tmp/1281472229242//run.sh
500-globus_xio: System error in open: No such file or directory
500-globus_xio: A system call failed: No such file or directory
500 End.]
        at org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java:101)
        at org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java:110)
        at org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195)
        at java.lang.Thread.run(Thread.java:619)

Are there any configuration files that I should change? Or is there any other way to redirect the in/output sandbox?
Best regards,
Tünde

RE: Job output Cream CE - Added by Reynaud Sylvain about 14 years ago

Hi Tünde,

You can try with the latest snapshot, to be sure that we are running the same code. I am sorry, but I have no idea how to reproduce this problem in order to understand it!

Of course you could manage the input/output sandbox yourself with the help of the JSAGA data management packages, but then your code for submitting jobs to CREAM would be different from your code for submitting jobs to other systems...

Best regards,
Sylvain

RE: Job output Cream CE - Added by Balint Tunde about 14 years ago

Hi Sylvain,
I downloaded the source from 2010-08-09 and with that it worked...
I was working with a source from 2010-07-27.
Thank you!
Best regards,
Tünde
