Bug #2490
Closed
Problem with certificates directory
Description
I discovered a problem when the gLite and JSAGA packages are installed on the same host.
gLite needs the directory /etc/grid-security/certificates for the authentication step.
But when a Java program uses the JSAGA library and connects to a CREAM CE node,
the Java program throws an exception with the message: heap space / GC limit exceeded
at the instruction
final Job job = service.createJob(m_job.getDesc());
It is not easy to establish the link between the Java error
and the directory /etc/grid-security/certificates.
I removed the directory and it works fine.
Only CREAM CE nodes have the problem.
I already tried many JVM options without success.
The directory /etc/grid-security/certificates is a copy of the directory used by JSAGA (attribute name="CertRepository").
Files
Updated by Schwarz Lionel about 13 years ago
- Category set to gLite adaptors
- Status changed from New to Assigned
- Assigned To set to Schwarz Lionel
What type of security context are you using to connect to Cream? VOMS or MyProxy?
Updated by Schwarz Lionel about 13 years ago
- Assigned To changed from Schwarz Lionel to Guterl Patrick
I could not reproduce this problem with a VOMS context.
Patrick, could you please give me more information:
- how many certificates (*.0) are in your /etc/grid-security/certificates directory?
- what is your security context? with which attributes? you can use the jsaga-context-info command
- what is the target Cream CE?
- do you have a more detailed stacktrace?
Thanks
Lionel
Updated by Guterl Patrick about 13 years ago
I use the command jsaga-context.init.sh
I tried again this morning: if the directory /etc/grid-security/certificates is present, the exception is thrown.
Here is the jsaga-default-context:
<jsaga-default xmlns="http://www.in2p3.fr/jsaga/session">
<contexts>
<context type="VOMS">
<data type="gsiftp"/>
<data type="srm"/>
<data type="lfn"/>
<job type="wms"/>
<job type="cream"/>
<job type="gk"/>
</context>
</contexts>
<session>
<context type="VOMS" id="biomed">
<attribute name="Server" value="voms://cclcgvomsli01.in2p3.fr:15000/O=GRID-FR/C=FR/O=CNRS/OU=CC-IN2P3/CN=cclcgvomsli01.in2p3.fr"/>
<attribute name="VomsDir" value="etc/vomsdir"/>
<attribute name="UserVO" value="biomed"/>
<attribute name="UserProxy" value="/home/dsa/.globus/biomed.txt"/>
<attribute name="CertRepository" value="/home/dsa/.globus/certificates"/>
<attribute name="UserCert" value="/home/dsa/.globus/usercert.pem"/>
<attribute name="UserKey" value="/home/dsa/.globus/userkey.pem"/>
<attribute name="UserPass" value="xxxxxxxx"/>
<attribute name="LifeTime" value="PT17H"/>
</context>
<!--
<context type="MyProxy">
<attribute name="Server" value="cclcgproxli01.in2p3.fr:7512"/>
</context>
-->
</session>
</jsaga-default>
Updated by Guterl Patrick about 13 years ago
- what is the target Cream CE?
Any CE.
- do you have a more detailed stacktrace?
timer 24 -java.lang.outofmemoryerror: java heap space
GC overhead limit exceeded
The exception does not occur immediately; it starts after the program has been running for 2 to 3 minutes.
Updated by Schwarz Lionel about 13 years ago
what happens if your /etc/grid-security/certificates is empty?
Updated by Guterl Patrick about 13 years ago
With /etc/grid-security/certificates empty, the program works fine.
As a next step, I copied the directory .globus/certificates into it and got:
Exception in thread "pool-1-thread-8" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Vector.<init>(Vector.java:111)
at java.util.Vector.<init>(Vector.java:124)
at java.util.Vector.<init>(Vector.java:133)
at org.bouncycastle.asn1.x509.X509Name.<init>(Unknown Source)
at org.bouncycastle.jce.X509Principal.<init>(Unknown Source)
at org.bouncycastle.jce.provider.X509CertificateObject.getSubjectDN(Unknown Source)
at org.glite.security.trustmanager.ContextWrapper.checkCRLs(ContextWrapper.java:729)
at org.glite.security.trustmanager.ContextWrapper.updateCRLs(ContextWrapper.java:658)
at org.glite.security.trustmanager.ContextWrapper.startCRLLoop(ContextWrapper.java:615)
at org.glite.security.trustmanager.ContextWrapper.init(ContextWrapper.java:424)
at org.glite.security.trustmanager.ContextWrapper.<init>(ContextWrapper.java:246)
at org.glite.security.trustmanager.axis.AXISSocketFactory.create(AXISSocketFactory.java:83)
at org.apache.axis.transport.http.HTTPSender.getSocket(HTTPSender.java:191)
at org.apache.axis.transport.http.HTTPSender.writeToSocket(HTTPSender.java:404)
at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:138)
at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
at org.apache.axis.client.Call.invoke(Call.java:2767)
at org.apache.axis.client.Call.invoke(Call.java:2443)
at org.apache.axis.client.Call.invoke(Call.java:2366)
at org.apache.axis.client.Call.invoke(Call.java:1812)
at org.glite.ce.creamapi.ws.cream2.CreamBindingStub.jobInfo(CreamBindingStub.java:1257)
at fr.in2p3.jsaga.adaptor.cream.job.CreamJobControlAdaptor.getJobInfo(CreamJobControlAdaptor.java:156)
at fr.in2p3.jsaga.adaptor.cream.job.CreamJobControlAdaptor.getOutputStagingTransfer(CreamJobControlAdaptor.java:143)
at fr.in2p3.jsaga.impl.job.staging.mgr.DataStagingManagerThroughSandbox.postStaging(DataStagingManagerThroughSandbox.java:39)
at fr.in2p3.jsaga.impl.job.instance.AbstractSyncJobImpl.postStaging(AbstractSyncJobImpl.java:375)
at fr.in2p3.jsaga.impl.job.instance.AbstractSyncJobImpl.postStagingAndCleanup(AbstractSyncJobImpl.java:363)
at fr.iphc.grid.command.GetOutputThread.run(GetOutputThread.java:19)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
Exception in thread "Thread-0" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:216)
at java.lang.StringBuffer.toString(StringBuffer.java:585)
at org.bouncycastle.jce.provider.PEMUtil.readPEMObject(Unknown Source)
at org.bouncycastle.jce.provider.JDKX509CertificateFactory.readPEMCertificate(Unknown Source)
at org.bouncycastle.jce.provider.JDKX509CertificateFactory.engineGenerateCertificate(Unknown Source)
at java.security.cert.CertificateFactory.generateCertificate(CertificateFactory.java:305)
at org.glite.security.util.FileCertReader.readObject(FileCertReader.java:360)
at org.glite.security.util.FileCertReader.objectReader(FileCertReader.java:322)
at org.glite.security.util.FileCertReader.readFile(FileCertReader.java:260)
at org.glite.security.util.FileCertReader.readFiles(FileCertReader.java:230)
at org.glite.security.util.FileCertReader.readAnchors(FileCertReader.java:164)
at org.glite.security.trustmanager.ContextWrapper.initTrustAnchors(ContextWrapper.java:569)
at org.glite.security.trustmanager.ContextWrapper.init(ContextWrapper.java:406)
at org.glite.security.trustmanager.ContextWrapper.<init>(ContextWrapper.java:246)
at org.glite.security.trustmanager.axis.AXISSocketFactory.create(AXISSocketFactory.java:83)
at org.apache.axis.transport.http.HTTPSender.getSocket(HTTPSender.java:191)
at org.apache.axis.transport.http.HTTPSender.writeToSocket(HTTPSender.java:404)
at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:138)
at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
at org.apache.axis.client.Call.invoke(Call.java:2767)
at org.apache.axis.client.Call.invoke(Call.java:2443)
at org.apache.axis.client.Call.invoke(Call.java:2366)
at org.apache.axis.client.Call.invoke(Call.java:1812)
at org.glite.ce.creamapi.ws.cream2.CreamBindingStub.jobInfo(CreamBindingStub.java:1257)
at fr.in2p3.jsaga.adaptor.cream.job.CreamJobMonitorAdaptor.getStatus(CreamJobMonitorAdaptor.java:55)
at fr.in2p3.jsaga.adaptor.cream.job.CreamJobMonitorAdaptor.getStatusList(CreamJobMonitorAdaptor.java:38)
at fr.in2p3.jsaga.engine.job.monitor.request.JobStatusRequestor.getJobStatus(JobStatusRequestor.java:34)
Exception in thread "pool-1-thread-6" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-9" java.lang.OutOfMemoryError: GC overhead limit exceeded
Updated by Guterl Patrick about 13 years ago
The stack frame
at fr.iphc.grid.command.GetOutputThread.run(GetOutputThread.java:19)
corresponds to this instruction line:
((JobImpl) m_job.getJob()).postStagingAndCleanup();
Updated by Schwarz Lionel about 13 years ago
what is the total size of all CA certificates? could you run:
du -chL *.0| tail -1
and ls *.0 | wc -l
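The same two figures (number of *.0 files and their total size) can also be collected from Java, for example on a host where the shell tools are not at hand. This is a minimal sketch, assuming Java 7 or later; the class name CaDirStats and the method countAndSize are illustrative and not part of JSAGA. It sums file sizes in bytes, whereas du reports disk usage, so the totals may differ slightly.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CaDirStats {
    /** Returns {number of *.0 files, total size in bytes} for the given directory. */
    static long[] countAndSize(Path dir) throws IOException {
        long count = 0;
        long bytes = 0;
        // The glob "*.0" selects the CA certificate files, not the *.r0 CRL files.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.0")) {
            for (Path p : stream) {
                count++;
                bytes += Files.size(p); // follows symbolic links, like du -L
            }
        }
        return new long[] {count, bytes};
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from this thread; pass another directory as first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/etc/grid-security/certificates");
        long[] stats = countAndSize(dir);
        System.out.println(stats[0] + " files, " + (stats[1] / 1024) + "K total");
    }
}
```

Running it once against /etc/grid-security/certificates and once against .globus/certificates would show whether the two directories really hold the same set of files.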
Updated by Guterl Patrick about 13 years ago
du -chL *.0|tail -1
968K total
ls *.0 | wc -l
242
Updated by Schwarz Lionel about 13 years ago
How often does your application request the job status? If this operation is too frequent, it might be an issue,
as the GC does not have enough time to reclaim the dead objects allocated by the underlying security layer.
I think the default value is 1 second, which is not reasonable in my opinion. Try a higher value by setting
job.monitor.poll.period=60000
for a poll period of 1 minute.
Updated by Guterl Patrick about 13 years ago
- File TestJobRun.java TestJobRun.java added
I tried to reproduce the bug.
I used the code at
http://grid.in2p3.fr/jsaga/jsaga-engine/xref/fr/in2p3/jsaga/comman/JobRun.html
- with args: -Executable /bin/hostname -r cream://prabi-ce3.ibcp.fr:8443/cream-pbs-sdj -b
A single run works (with or without the directory /etc/grid-security/certificates).
- added a loop starting from the job creation:
open at line 75, for(i=0;i<Nb_loop;i++) {
close just before the System.exit(0) (line 144)
- works without certificates in /etc/grid-security/certificates (200 jobs)
- copy .globus/certificates to /etc/grid-security
- retry => Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
Refining the test:
- exception thrown on the 20th job
- jobs 0-17: "fast"
- job 18: slow
- job 19: exception thrown
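To make the "fast / slow / exception" pattern easier to compare between runs, the loop can log the elapsed time and heap usage of each iteration. This is only a sketch, assuming Java 7 or later: LoopProbe and timeIterations are illustrative names, and the placeholder task stands in for the job creation and submission code copied from JobRun, which is not reproduced here.

```java
public class LoopProbe {
    /** Runs the task nbLoop times and returns the elapsed milliseconds of each iteration. */
    static long[] timeIterations(int nbLoop, Runnable task) {
        long[] elapsedMs = new long[nbLoop];
        Runtime rt = Runtime.getRuntime();
        for (int i = 0; i < nbLoop; i++) {
            long start = System.nanoTime();
            task.run();
            elapsedMs[i] = (System.nanoTime() - start) / 1_000_000L;
            // Heap currently in use, as seen by the JVM (includes garbage not yet collected).
            long usedKb = (rt.totalMemory() - rt.freeMemory()) / 1024;
            System.out.println("job " + i + ": " + elapsedMs[i] + " ms, heap used " + usedKb + " KB");
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Placeholder: in the real test this would create, submit and wait for one job.
        Runnable placeholder = new Runnable() {
            public void run() { /* submit one job here */ }
        };
        timeIterations(20, placeholder);
    }
}
```

A heap figure that grows steadily from one job to the next, followed by a sharp rise in elapsed time, would match the behaviour described above, where the JVM spends most of its time in garbage collection shortly before the error.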
Updated by Schwarz Lionel about 13 years ago
Thanks for your explanation, I will try to reproduce it this way.
However, I am pretty sure that the limit is reached because of too frequent monitoring polls.
Did you try to set the "job.monitor.poll.period" to "60000" ?
This does not explain the difference that you see (and I do not) between /etc/grid-security/certificates and .globus/certificates. Maybe access to
/etc/grid-security/certificates is a bit slower...
Lionel
Updated by Guterl Patrick about 13 years ago
We installed a fresh Debian Linux system on a host and downloaded only JSAGA for the test.
The content of /etc/grid-security/certificates and .globus/certificates is the same.
Why would access to /etc/grid-security/certificates be slower?
Updated by Guterl Patrick about 13 years ago
We changed the polling value to 6000: same problem.
We then removed the .globus/certificates directory and set the CertRepository field in jsaga-default-context.xml to /etc/grid-security/certificates:
same problem.
Updated by Schwarz Lionel about 13 years ago
OK, but could you please use the same period as me (60000 ms), so that I can check whether the error I get is the same as yours?
Lionel
Updated by Guterl Patrick about 13 years ago
Find below the parameters of the config file jsaga-config.properties which generated the error:
jsaga/etc # sed '/^\#/d' jsaga-config.properties | sed '/^$/d'
jsaga.default.contexts=etc/jsaga-default-contexts.xml
jsaga.timeout=etc/jsaga-timeout.properties
log4j.configuration=etc/log4j.properties
jsaga.default.contexts.check.conflicts=true
data.implicit.close.timeout=-1
data.copy.buffer.size=16384
data.attributes.cache.lifetime=60000
job.description.default=etc/jsaga-default.jsdl
job.monitor.poll.period=60000
job.monitor.error.threshold=3
job.control.check.availability=false
job.control.check.match=false
job.cancel.check.status=true
The problem always appears with CREAM nodes.
Updated by Schwarz Lionel about 13 years ago
Patrick, we suspect that the underlying Globus layer uses '/etc/grid-security/certificates', which would conflict with
the JSAGA configuration '.globus/certificates'.
Could you try this:
- move your certificates into "/etc/grid-security/certificates"
- set "/etc/grid-security/certificates" in your JSAGA configuration
and let me know if you still have the GC issue.
Lionel
Updated by Schwarz Lionel about 13 years ago
Could you try to set this system property:
crlEnabled=false
It seems the issue comes from the CRLs and not the CA certificates.
Lionel
Updated by Guterl Patrick about 13 years ago
EUREKA
With this line inserted after the first line of the main program:
System.setProperty("crlEnabled", "false");
it works OK.
In which configuration/property file can this property be set?
Updated by Schwarz Lionel about 13 years ago
- Status changed from Assigned to Resolved
- Assigned To changed from Guterl Patrick to Schwarz Lionel
Add the option -DcrlEnabled=false to the command line.
I will add this property directly in the JSAGA code (available in the next release, 0.9.15), so that you do not have to worry about this system property.
Lionel
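Until a release that sets the property internally is installed, the two workarounds from this thread (System.setProperty in the code, or -DcrlEnabled=false on the command line) can be combined so that a value given on the command line still takes precedence. This is a minimal sketch: CrlSwitch and disableCrlChecksByDefault are illustrative names, and it assumes that the security layer reads the property when it is first initialised, so the call has to come before any JSAGA call.

```java
public class CrlSwitch {
    /** Property name as used in this thread. */
    static final String CRL_PROPERTY = "crlEnabled";

    /**
     * Sets crlEnabled=false unless the property was already defined,
     * for example with -DcrlEnabled=... on the command line.
     * Returns the effective value.
     */
    static String disableCrlChecksByDefault() {
        if (System.getProperty(CRL_PROPERTY) == null) {
            System.setProperty(CRL_PROPERTY, "false");
        }
        return System.getProperty(CRL_PROPERTY);
    }

    public static void main(String[] args) {
        // Assumption: this must run before the first JSAGA/CREAM call.
        String effective = disableCrlChecksByDefault();
        System.out.println(CRL_PROPERTY + "=" + effective);
        // ... application code using JSAGA would follow here ...
    }
}
```

Started as `java -DcrlEnabled=true CrlSwitch`, the sketch leaves the command-line value untouched, which makes it easy to switch CRL checking back on for a comparison run.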