Bug #2490

closed

problem with certificates directory

Added by Guterl Patrick about 13 years ago. Updated about 13 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: gLite adaptors
Target version: -
Start date: 02/15/2012
Due date:
% Done: 0%
Estimated time:
Description

I discovered a problem when both the gLite package and JSAGA are installed on the same host.
gLite needs the directory /etc/grid-security/certificates for the authentication step.
But when a Java program uses the JSAGA library and connects to a CREAM CE node,
the program throws an exception with the message: heap space / GC limit exceeded
at the instruction
final Job job = service.createJob(m_job.getDesc());

It is not easy to establish the link between the Java error
and the directory /etc/grid-security/certificates.
I removed the directory and it works fine.
Only CREAM CE nodes have the problem.

I already tried many JVM options without success.
The directory /etc/grid-security/certificates is a copy of the directory used by the JSAGA attribute name="CertRepository".


Files

TestJobRun.java (12.4 KB), Guterl Patrick, 02/22/2012 10:59 AM
Actions #1

Updated by Schwarz Lionel about 13 years ago

  • Category set to gLite adaptors
  • Status changed from New to Assigned
  • Assigned To set to Schwarz Lionel

What type of security context are you using to connect to Cream? VOMS or MyProxy?

Actions #2

Updated by Schwarz Lionel about 13 years ago

  • Assigned To changed from Schwarz Lionel to Guterl Patrick

I could not reproduce this problem with a VOMS context.

Patrick, could you please give me more information:
- how many certificates (*.0) are in your /etc/grid-security/certificates directory?
- what is your security context? with which attributes? you can use the jsaga-context-info command
- what is the target Cream CE?
- do you have a more detailed stacktrace?

Thanks
Lionel

Actions #3

Updated by Guterl Patrick about 13 years ago

I use the command jsaga-context.init.sh.
I tried again this morning: if the directory /etc/grid-security/certificates is present, the exception is thrown.

The jsaga-default-context:

<jsaga-default xmlns="http://www.in2p3.fr/jsaga/session">
<contexts>
<context type="VOMS">
<data type="gsiftp"/>
<data type="srm"/>
<data type="lfn"/>
<job type="wms"/>
<job type="cream"/>
<job type="gk"/>
</context>
</contexts>
<session>

<context type="VOMS" id="biomed">
<attribute name="Server" value="voms://cclcgvomsli01.in2p3.fr:15000/O=GRID-FR/C=FR/O=CNRS/OU=CC-IN2P3/CN=cclcgvomsli01.in2p3.fr"/>
<attribute name="VomsDir" value="etc/vomsdir"/>
<attribute name="UserVO" value="biomed"/>
<attribute name="UserProxy" value="/home/dsa/.globus/biomed.txt"/>
<attribute name="CertRepository" value="/home/dsa/.globus/certificates"/>
<attribute name="UserCert" value="/home/dsa/.globus/usercert.pem"/>
<attribute name="UserKey" value="/home/dsa/.globus/userkey.pem"/>
<attribute name="UserPass" value="xxxxxxxx"/>
<attribute name="LifeTime" value="PT17H"/>
</context>

<!--
<context type="MyProxy">
<attribute name="Server" value="cclcgproxli01.in2p3.fr:7512"/>
</context>
-->
</session>
</jsaga-default>

Actions #4

Updated by Guterl Patrick about 13 years ago

what is the target Cream CE?
any CE
- do you have a more detailed stacktrace?

timer 24 -java.lang.outofmemoryerror: java heap space
GC overhead limit exceeded

The exception does not occur immediately; it starts after the program has been running for 2 to 3 minutes.

Actions #5

Updated by Schwarz Lionel about 13 years ago

what happens if your /etc/grid-security/certificates is empty?

Actions #6

Updated by Guterl Patrick about 13 years ago

With /etc/grid-security/certificates empty, the program works fine.
As the next step, I copied the .globus/certificates directory into it:

Exception in thread "pool-1-thread-8" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Vector.<init>(Vector.java:111)
at java.util.Vector.<init>(Vector.java:124)
at java.util.Vector.<init>(Vector.java:133)
at org.bouncycastle.asn1.x509.X509Name.<init>(Unknown Source)
at org.bouncycastle.jce.X509Principal.<init>(Unknown Source)
at org.bouncycastle.jce.provider.X509CertificateObject.getSubjectDN(Unknown Source)
at org.glite.security.trustmanager.ContextWrapper.checkCRLs(ContextWrapper.java:729)
at org.glite.security.trustmanager.ContextWrapper.updateCRLs(ContextWrapper.java:658)
at org.glite.security.trustmanager.ContextWrapper.startCRLLoop(ContextWrapper.java:615)
at org.glite.security.trustmanager.ContextWrapper.init(ContextWrapper.java:424)
at org.glite.security.trustmanager.ContextWrapper.<init>(ContextWrapper.java:246)
at org.glite.security.trustmanager.axis.AXISSocketFactory.create(AXISSocketFactory.java:83)
at org.apache.axis.transport.http.HTTPSender.getSocket(HTTPSender.java:191)
at org.apache.axis.transport.http.HTTPSender.writeToSocket(HTTPSender.java:404)
at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:138)
at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
at org.apache.axis.client.Call.invoke(Call.java:2767)
at org.apache.axis.client.Call.invoke(Call.java:2443)
at org.apache.axis.client.Call.invoke(Call.java:2366)
at org.apache.axis.client.Call.invoke(Call.java:1812)
at org.glite.ce.creamapi.ws.cream2.CreamBindingStub.jobInfo(CreamBindingStub.java:1257)
at fr.in2p3.jsaga.adaptor.cream.job.CreamJobControlAdaptor.getJobInfo(CreamJobControlAdaptor.java:156)
at fr.in2p3.jsaga.adaptor.cream.job.CreamJobControlAdaptor.getOutputStagingTransfer(CreamJobControlAdaptor.java:143)
at fr.in2p3.jsaga.impl.job.staging.mgr.DataStagingManagerThroughSandbox.postStaging(DataStagingManagerThroughSandbox.java:39)
at fr.in2p3.jsaga.impl.job.instance.AbstractSyncJobImpl.postStaging(AbstractSyncJobImpl.java:375)
at fr.in2p3.jsaga.impl.job.instance.AbstractSyncJobImpl.postStagingAndCleanup(AbstractSyncJobImpl.java:363)
at fr.iphc.grid.command.GetOutputThread.run(GetOutputThread.java:19)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

Exception in thread "Thread-0" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:216)
at java.lang.StringBuffer.toString(StringBuffer.java:585)
at org.bouncycastle.jce.provider.PEMUtil.readPEMObject(Unknown Source)
at org.bouncycastle.jce.provider.JDKX509CertificateFactory.readPEMCertificate(Unknown Source)
at org.bouncycastle.jce.provider.JDKX509CertificateFactory.engineGenerateCertificate(Unknown Source)
at java.security.cert.CertificateFactory.generateCertificate(CertificateFactory.java:305)
at org.glite.security.util.FileCertReader.readObject(FileCertReader.java:360)
at org.glite.security.util.FileCertReader.objectReader(FileCertReader.java:322)
at org.glite.security.util.FileCertReader.readFile(FileCertReader.java:260)
at org.glite.security.util.FileCertReader.readFiles(FileCertReader.java:230)
at org.glite.security.util.FileCertReader.readAnchors(FileCertReader.java:164)
at org.glite.security.trustmanager.ContextWrapper.initTrustAnchors(ContextWrapper.java:569)
at org.glite.security.trustmanager.ContextWrapper.init(ContextWrapper.java:406)
at org.glite.security.trustmanager.ContextWrapper.<init>(ContextWrapper.java:246)
at org.glite.security.trustmanager.axis.AXISSocketFactory.create(AXISSocketFactory.java:83)
at org.apache.axis.transport.http.HTTPSender.getSocket(HTTPSender.java:191)
at org.apache.axis.transport.http.HTTPSender.writeToSocket(HTTPSender.java:404)
at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:138)
at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
at org.apache.axis.client.Call.invoke(Call.java:2767)
at org.apache.axis.client.Call.invoke(Call.java:2443)
at org.apache.axis.client.Call.invoke(Call.java:2366)
at org.apache.axis.client.Call.invoke(Call.java:1812)
at org.glite.ce.creamapi.ws.cream2.CreamBindingStub.jobInfo(CreamBindingStub.java:1257)
at fr.in2p3.jsaga.adaptor.cream.job.CreamJobMonitorAdaptor.getStatus(CreamJobMonitorAdaptor.java:55)
at fr.in2p3.jsaga.adaptor.cream.job.CreamJobMonitorAdaptor.getStatusList(CreamJobMonitorAdaptor.java:38)
at fr.in2p3.jsaga.engine.job.monitor.request.JobStatusRequestor.getJobStatus(JobStatusRequestor.java:34)

Exception in thread "pool-1-thread-6" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-9" java.lang.OutOfMemoryError: GC overhead limit exceeded

Actions #7

Updated by Guterl Patrick about 13 years ago

The instruction at fr.iphc.grid.command.GetOutputThread.run(GetOutputThread.java:19) is:
((JobImpl) m_job.getJob()).postStagingAndCleanup();

Actions #8

Updated by Schwarz Lionel about 13 years ago

What is the total size of all CA certificates? Could you run:
du -chL *.0 | tail -1
and
ls *.0 | wc -l
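
Where it is more convenient to check from Java, the same two figures can be approximated with a short program. This is only a sketch: the class name CaDirStats is made up for illustration, and it sums file lengths rather than disk blocks, so the total may differ somewhat from what du reports.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of JSAGA or gLite.
class CaDirStats {
    /** Returns {fileCount, totalBytes} for regular files matching *.0 in dir. */
    static long[] stats(Path dir) throws IOException {
        long count = 0;
        long bytes = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.0")) {
            for (Path p : stream) {
                // isRegularFile and size follow symbolic links by default,
                // which is close to what the -L option of du does.
                if (Files.isRegularFile(p)) {
                    count++;
                    bytes += Files.size(p);
                }
            }
        }
        return new long[] {count, bytes};
    }

    public static void main(String[] args) throws IOException {
        // Directory from this thread; pass another path as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/etc/grid-security/certificates");
        long[] s = stats(dir);
        System.out.println(s[0] + " files, " + s[1] + " bytes");
    }
}
```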

Actions #9

Updated by Guterl Patrick about 13 years ago

du -chL *.0|tail -1
968K total
ls *.0 | wc -l
242

Actions #10

Updated by Schwarz Lionel about 13 years ago

How often does your application request the job status? If this operation is too frequent, it might be an issue,
as the GC does not have enough time to collect the dead objects allocated by the underlying security layer.
I think the default value is 1 second, which is not reasonable. Try a higher value by setting

job.monitor.poll.period=60000

for a poll period of 1 minute.
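
To give an idea of what the period changes, here is a rough sketch that reads job.monitor.poll.period from properties text and prints the resulting number of status polls per hour. The class and method names are hypothetical, this is not how JSAGA itself reads its configuration, and the 1000 ms fallback is only the default assumed in this comment, not a verified value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical illustration, not JSAGA code.
class PollPeriod {
    static final String KEY = "job.monitor.poll.period";

    /** Reads the poll period in ms; fallbackMs is used when the key is absent. */
    static long periodMs(String propertiesText, long fallbackMs) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        String value = props.getProperty(KEY);
        return value == null ? fallbackMs : Long.parseLong(value.trim());
    }

    /** Number of status polls issued per hour at the given period. */
    static long pollsPerHour(long periodMs) {
        return 3_600_000L / periodMs;
    }

    public static void main(String[] args) throws IOException {
        long assumedDefault = 1000; // assumption: the 1 second default mentioned above
        long configured = periodMs("job.monitor.poll.period=60000\n", assumedDefault);
        System.out.println(pollsPerHour(assumedDefault) + " polls/hour at " + assumedDefault + " ms");
        System.out.println(pollsPerHour(configured) + " polls/hour at " + configured + " ms");
    }
}
```

Under that assumption, going from 1 second to 1 minute divides the number of polls, and therefore the number of calls into the security layer, by 60.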

Actions #11

Updated by Guterl Patrick about 13 years ago

I tried to reproduce the bug.

Use the code at
http://grid.in2p3.fr/jsaga/jsaga-engine/xref/fr/in2p3/jsaga/comman/JobRun.html
- with args: -Executable /bin/hostname -r cream://prabi-ce3.ibcp.fr:8443/cream-pbs-sdj -b

A single run works (with or without the directory /etc/grid-security/certificates).

- add a loop around job creation:
open at line 75, for(i=0;i<Nb_loop;i++) {
close just before the System.exit(0) (line 144)

-works without certificates in /etc/grid-security/certificates (200 jobs)
-copy .globus/certificates to /etc/grid-security
-retry => Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

Refining the test:
-exception thrown on 20th job
-job 0-17 "fast"
-job 18 slow
-job 19 exception thrown
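
One way to make the fast/slow pattern above measurable is to time each iteration and log the used heap after it. The sketch below is a generic harness, not part of JSAGA: LoopProbe and its methods are hypothetical names, and the loop body is a placeholder where the job creation and submission would go.

```java
import java.util.function.IntConsumer;

// Hypothetical diagnostic harness; the workload in main is a stand-in.
class LoopProbe {
    /** Used heap in bytes, as reported by the JVM at this instant. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs body for i = 0..n-1 and returns the elapsed milliseconds of each iteration. */
    static long[] run(int n, IntConsumer body) {
        long[] elapsed = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            body.accept(i); // placeholder for: create the job, run it, wait for it
            elapsed[i] = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("job %d: %d ms, used heap %d KB%n",
                    i, elapsed[i], usedHeap() / 1024);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Stand-in workload that allocates 1 MB per iteration.
        run(5, i -> {
            byte[] junk = new byte[1 << 20];
            junk[0] = (byte) i;
        });
    }
}
```

A used-heap figure that keeps climbing from one job to the next, together with a sudden jump in elapsed time, would be consistent with the GC overhead limit being approached.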

Actions #12

Updated by Schwarz Lionel about 13 years ago

Thanks for your explanation, I will try to reproduce it this way.
However, I am pretty sure that the limit is reached because of too frequent monitoring polls.
Did you try to set the "job.monitor.poll.period" to "60000" ?

This does not explain the difference that you see, and that I do not, between /etc/grid-security/certificates and .globus/certificates. Maybe access to
/etc/grid-security/certificates is a bit slower...

Lionel

Actions #13

Updated by Guterl Patrick about 13 years ago

We installed a fresh Debian Linux system on a host and downloaded only JSAGA for the test.
The contents of /etc/grid-security/certificates and .globus/certificates are the same.
Why would access to /etc/grid-security/certificates be slower?

Actions #14

Updated by Guterl Patrick about 13 years ago

We changed the polling value to 6000: same problem.

We then removed the .globus/certificates directory and set the CertRepository field in jsaga-default-context.xml to /etc/grid-security/certificates:

same problem.
Actions #15

Updated by Schwarz Lionel about 13 years ago

OK, but could you please use the same period as me (60000 ms), so that I can check whether the error I get is the same as yours?

Lionel

Actions #16

Updated by Guterl Patrick about 13 years ago

Find below the parameters of the config file jsaga-config.properties which generated the error:
jsaga/etc # sed '/^\#/d' jsaga-config.properties | sed '/^$/d'
jsaga.default.contexts=etc/jsaga-default-contexts.xml
jsaga.timeout=etc/jsaga-timeout.properties
log4j.configuration=etc/log4j.properties
jsaga.default.contexts.check.conflicts=true
data.implicit.close.timeout=-1
data.copy.buffer.size=16384
data.attributes.cache.lifetime=60000
job.description.default=etc/jsaga-default.jsdl
job.monitor.poll.period=60000
job.monitor.error.threshold=3
job.control.check.availability=false
job.control.check.match=false
job.cancel.check.status=true

The problem always appears with CREAM nodes.
Actions #17

Updated by Schwarz Lionel about 13 years ago

Patrick, we suspect that the underlying Globus layer uses '/etc/grid-security/certificates', which would conflict with
the JSAGA configuration '.globus/certificates'.
Could you try this:
- Move your certificates into "/etc/grid-security/certificates"
- Set "/etc/grid-security/certificates" in your JSAGA configuration

and let me know if you still have the GC issue.

Lionel

Actions #18

Updated by Schwarz Lionel about 13 years ago

Could you try to set this system property:
crlEnabled=false
It seems the issue comes from the CRLs and not the CA certificates.

Lionel

Actions #19

Updated by Guterl Patrick about 13 years ago

EUREKA
With this line inserted after the first line of the main program:

System.setProperty("crlEnabled", "false");

it works OK.

In which configuration/property file is it possible to set this property?

Actions #20

Updated by Schwarz Lionel about 13 years ago

  • Status changed from Assigned to Resolved
  • Assigned To changed from Guterl Patrick to Schwarz Lionel

Add the option -DcrlEnabled=false to the command line.

I will add this property directly in the JSAGA code (available in the next release, 0.9.15), so that you do not have to worry about this system property.

Lionel
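
Until that release is installed, the property can also be set from code while still letting -DcrlEnabled=... on the command line take precedence. This is a minimal sketch: only the property name crlEnabled comes from this thread, the class and method names are hypothetical, and it assumes the security layer reads the property when it initializes, so the call must run before the first connection to the CE.

```java
// Hypothetical helper; only the property name "crlEnabled" comes from the thread.
class CrlPropertyDefault {
    static final String CRL_PROPERTY = "crlEnabled";

    /** Sets crlEnabled=false unless a value was already given, and returns the effective value. */
    static String applyDefault() {
        if (System.getProperty(CRL_PROPERTY) == null) {
            System.setProperty(CRL_PROPERTY, "false");
        }
        return System.getProperty(CRL_PROPERTY);
    }

    public static void main(String[] args) {
        // Call this first, before any job service is created.
        System.out.println(CRL_PROPERTY + "=" + applyDefault());
    }
}
```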
