gLite Adaptor - Register job failed to LB server

Added by Scardaci Diego about 14 years ago

Dear All,
Using JSAGA, I developed a multi-threaded application that submits several jobs in parallel.
I observed a problem when my application submits all the jobs (for example, a set of 50 jobs submitted in parallel) to the same WMS (or to a small set of WMSs). I copied the error at the end of this message.
The problem is not deterministic (sometimes it works, sometimes it doesn't), and the error rate rises as the number of parallel jobs increases.
The error disappears when I add a delay between submissions.
The error also disappears when I use a large set of WMSs (more than 10) to submit the jobs in parallel.

Could this be a race condition inside the gLite Adaptor?

Thanks in advance for your help.

Cheers,
Diego Scardaci

Register job failed to LB server: lb2.eela.ufrj.br:9000
edg_wll_RegisterJobProxy/Sync
Exit code: 1416
LB[Proxy] Error: LB server (bkserver,lbproxy) store protocol error
(edg_wll_RegisterJobProxy(): unable to register with bkserver
LB server (bkserver,lbproxy) store protocol error;; Logging library ERROR:
LB server (bkserver,lbproxy) store protocol error;; edg_wll_DoLogEventDirect(): edg_wll_log_direct_connect error
GSSAPI Error;; edg_wll_gss_acquire_cred_gsi(): failed to load GSI credentials: GSS Major Status: General failure
(GSS Minor Status Error Chain:
globus_gsi_gssapi: Unable to read credential for import
globus_gsi_gssapi: Error with gss credential handle
globus_gsi_gssapi: Error with openssl: Couldn't set the private key to be used for the SSL context
OpenSSL Error: x509_cmp.c:389: in library: x509 certificate routines, function X509_check_private_key: key values mismatch
))
Method: jobRegister
TimeStamp: Wedn Nov 22 2011 16:17:12 GMT
ErrorCode: 1227
Cause: LBException: LBProxy is enabled
Register job failed to LB server: lb2.eela.ufrj.br:9000
edg_wll_RegisterJobProxy/Sync
Exit code: 1416
LB[Proxy] Error: LB server (bkserver,lbproxy) store protocol error
(edg_wll_RegisterJobProxy(): unable to register with bkserver
LB server (bkserver,lbproxy) store protocol error;; Logging library ERROR:
LB server (bkserver,lbproxy) store protocol error;; edg_wll_DoLogEventDirect(): edg_wll_log_direct_connect error
GSSAPI Error;; edg_wll_gss_acquire_cred_gsi(): failed to load GSI credentials: GSS Major Status: General failure
(GSS Minor Status Error Chain:
globus_gsi_gssapi: Unable to read credential for import
globus_gsi_gssapi: Error with gss credential handle
globus_gsi_gssapi: Error with openssl: Couldn't set the private key to be used for the SSL context
OpenSSL Error: x509_cmp.c:389: in library: x509 certificate routines, function X509_check_private_key: key values mismatch
))
at registerJob()[wmpeventlogger.cpp:373]
at registerJob()[wmpeventlogger.cpp:325]
at regist()[wmpcoreoperations.cpp:737]
at jobregister()[wmpcoreoperations.cpp:293]
at jobRegister()[wmpcoreoperations.cpp:382] |#]


Replies (15)

RE: gLite Adaptor - Register job failed to LB server - Added by Reynaud Sylvain about 14 years ago

Hi Diego,

Adding a delay between each submission on the same job service is a good practice, and not only for WMS.

However, the API would not be the right place to do that, because it would limit performance for users who have installed their own WMS, or keep users from benefiting from potential future improvements to its scalability, for example. So IMHO this belongs in application code rather than in API code.

However, any knowledge about good practices for using the EGI infrastructure is welcome, so please feel free to create a page in JSAGA wiki (https://forge.in2p3.fr/projects/jsaga/wiki) if you think that you can share good practices with other JSAGA users (e.g. max number of simultaneous job submissions, minimum delay between job submissions...).
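
As an illustration of such an application-side delay, here is a minimal sketch of a throttle that spaces out submissions to one job service. The class name `SubmissionThrottle` and the delay value are hypothetical; nothing here is part of the JSAGA API, and the right delay for a given WMS is not stated in this thread.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: enforces a minimum delay between submissions to one service. */
class SubmissionThrottle {
    private final long minDelayNanos;
    private long nextAllowed; // System.nanoTime() value of the next free submission slot

    SubmissionThrottle(long minDelayMillis) {
        this.minDelayNanos = TimeUnit.MILLISECONDS.toNanos(minDelayMillis);
        this.nextAllowed = System.nanoTime();
    }

    /** Blocks the calling thread until its submission slot is reached. */
    void acquire() {
        long waitNanos;
        synchronized (this) {
            long now = System.nanoTime();
            // Reserve the next free slot; later callers queue up behind it.
            long slot = (nextAllowed - now > 0) ? nextAllowed : now;
            nextAllowed = slot + minDelayNanos;
            waitNanos = slot - now;
        }
        if (waitNanos > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(waitNanos); // sleep outside the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting to submit", e);
            }
        }
    }
}
```

Each worker thread would call `throttle.acquire()` just before submitting its job, using one throttle instance per job service.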

Cheers,
Sylvain

RE: gLite Adaptor - Register job failed to LB server - Added by Scardaci Diego about 14 years ago

Hi Sylvain,
So do you think the problem is that the WMS cannot handle my submission throughput?

Can we exclude a problem on the gLite Adaptor for JSAGA?

Cheers,
Diego

RE: gLite Adaptor - Register job failed to LB server - Added by Reynaud Sylvain about 14 years ago

Yes, I think... at least the entire error message comes from server side.

Cheers,
Sylvain

RE: gLite Adaptor - Register job failed to LB server - Added by Scardaci Diego about 14 years ago

Hi Sylvain,
I tried to replicate the same submission rate from a gLite UI and I didn't get the same error.

Looking at the error I got with JSAGA:
OpenSSL Error: x509_cmp.c:389: in library: x509 certificate routines, function X509_check_private_key: key values mismatch

I thought it could be due to an error in the proxy transfer. Maybe there is some race condition when many jobs are submitted at the same time.

What do you think about that?

Thanks in advance,
Diego

RE: gLite Adaptor - Register job failed to LB server - Added by Reynaud Sylvain about 14 years ago

Hi Diego,

If you create a single JobService instance per CE and use it to submit several jobs, then your proxy is sent only once for all jobs submitted to this CE. So in this case, I don't think that your problem is related to an error in proxy transfer.

But if you concurrently create several JobService instances for the same CE, then maybe you could fall into some situation where the proxy is being overwritten for the 2nd job while the 1st one is trying to execute... I don't know how this is handled by the CREAM-CE, but anyway using JSAGA this way is not optimal, so it is not recommended whatever adaptor you use.

Do you have one or several JobService instances per CE in your code?

Cheers,
Sylvain

RE: gLite Adaptor - Register job failed to LB server - Added by Scardaci Diego about 14 years ago

Hi Sylvain,
maybe this could be the problem...

I create a JobService for each job submission:
...
JobService service = JobFactory.createJobService(session, serviceURL);
job = service.createJob(desc);
...

I don't specify the CE, only the Resource Manager (WMS), so I leave the WMS free to choose the CE to which the job is sent (I'm using the WMS adaptor).

Then, could I fix this problem by creating the JobService only once for each WMS?

If so, using a single JobService object per WMS, can I submit several jobs in parallel (several threads running at the same time) through the same JobService object without creating new race conditions?

Thanks again.

Cheers,
Diego

RE: gLite Adaptor - Register job failed to LB server - Added by Reynaud Sylvain about 14 years ago

I didn't read the beginning of the discussion. I thought that you were submitting your jobs directly to CREAM-CE, which is why I was talking about CREAM-CE instead of WMS, but exactly the same rule applies to WMS: you should create only one JobService per WMS.

I think this could solve your problem, and I am sure this will be more efficient anyway.

Cheers,
Sylvain

RE: gLite Adaptor - Register job failed to LB server - Added by Scardaci Diego about 14 years ago

And could I submit jobs in parallel using the same JobService?

Cheers,
Diego

RE: gLite Adaptor - Register job failed to LB server - Added by Reynaud Sylvain about 14 years ago

Yes, several JSAGA users are doing it this way.
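
A minimal sketch of that pattern, assuming a thread pool in the application: the service object is created once and every worker thread submits through it. `FakeJobService` is a hypothetical stand-in for a JSAGA `JobService` (it only hands out job IDs), and the URL used with it would be a placeholder; real code would obtain the service from `JobFactory.createJobService(session, serviceURL)` as in the snippet earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a JobService bound to one WMS. */
class FakeJobService {
    static final AtomicInteger created = new AtomicInteger(); // counts service creations
    private final AtomicInteger nextId = new AtomicInteger();
    private final String serviceUrl;

    FakeJobService(String serviceUrl) {
        this.serviceUrl = serviceUrl;
        created.incrementAndGet();
    }

    /** Pretends to submit a job and returns a unique job ID. */
    String submit(String description) {
        return serviceUrl + "#job-" + nextId.incrementAndGet();
    }
}

class ParallelSubmission {
    /** Submits jobCount jobs from a thread pool, all through the same service instance. */
    static List<String> submitAll(FakeJobService service, int jobCount, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < jobCount; i++) {
                final String description = "job description " + i;
                futures.add(pool.submit(() -> service.submit(description)));
            }
            List<String> jobIds = new ArrayList<>();
            for (Future<String> future : futures) {
                jobIds.add(future.get()); // wait for each submission to finish
            }
            return jobIds;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while submitting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("submission failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the sketch is only the ownership: one service object per WMS, created outside the worker threads and passed to all of them.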

Cheers,
Sylvain

RE: gLite Adaptor - Register job failed to LB server - Added by Scardaci Diego about 14 years ago

Thanks a lot for your help Sylvain!

Cheers,
Diego

RE: gLite Adaptor - Register job failed to LB server - Added by Scardaci Diego about 14 years ago

Thinking about the problem again, I'm not sure about another point...

I create a JobService using a Session and a URL:

JobService service = JobFactory.createJobService(session, serviceURL);

So, according to your suggestion, I have to do this operation only once.

Considering a scenario with several users submitting jobs to the same WMS, do I have to create only one session shared between all users?!

Cheers,
Diego

RE: gLite Adaptor - Register job failed to LB server - Added by Reynaud Sylvain about 14 years ago

Thanks for pointing this out: the exact recommendation is not to have one JobService per CE, but one JobService per CE/user pair (because each JobService instance is associated with a user).

Cheers,
Sylvain

RE: gLite Adaptor - Register job failed to LB server - Added by Scardaci Diego about 14 years ago

Ok.

And I suppose that users are identified by their certificates/proxies...

I'm using a robot certificate, so in my case do I have to share the same JobService (and therefore the same session) among all users of the same robot certificate?

Cheers,
Diego

RE: gLite Adaptor - Register job failed to LB server - Added by Reynaud Sylvain about 14 years ago

Yes, when I say one JobService per CE/user pair, I mean one JobService per CE/DN pair.

So if you use the same certificate for all users, you should also use the same JobService for them.
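
One way to apply this rule in application code is a small cache keyed by the (service URL, DN) pair, so that concurrent callers with the same pair always get the same service object. The sketch below is an assumption about how an application might do it, not JSAGA code: `ServiceKey` and `ServiceCache` are hypothetical names, and the factory passed in stands for whatever creates the session and the `JobService`.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical cache key: one entry per (service URL, certificate DN) pair. */
final class ServiceKey {
    final String serviceUrl;
    final String userDn;

    ServiceKey(String serviceUrl, String userDn) {
        this.serviceUrl = Objects.requireNonNull(serviceUrl);
        this.userDn = Objects.requireNonNull(userDn);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ServiceKey)) {
            return false;
        }
        ServiceKey key = (ServiceKey) other;
        return serviceUrl.equals(key.serviceUrl) && userDn.equals(key.userDn);
    }

    @Override
    public int hashCode() {
        return Objects.hash(serviceUrl, userDn);
    }
}

/** Returns the same service object for every request with the same URL and DN. */
class ServiceCache<S> {
    private final Map<ServiceKey, S> services = new ConcurrentHashMap<>();
    private final BiFunction<String, String, S> factory;

    ServiceCache(BiFunction<String, String, S> factory) {
        this.factory = factory;
    }

    S get(String serviceUrl, String userDn) {
        // computeIfAbsent runs the factory at most once per key, even under concurrency.
        return services.computeIfAbsent(new ServiceKey(serviceUrl, userDn),
                key -> factory.apply(key.serviceUrl, key.userDn));
    }
}
```

With a robot certificate shared by all users, every request carries the same DN, so the cache holds a single service per WMS.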

Cheers,
Sylvain
