tracker issue | What iT iS dESign studios

Title:

Java Heap Space OutOfMemoryError using CFINDEX after installation of Update 12

Status/Resolution/Reason: Closed/Fixed/Fixed

Reporter/Name(from Bugbase): Eric Belair / Eric Belair

Created: 05/09/2017

Components: Performance

Versions: 11.0

Failure Type: Memory Leak

Found In Build/Fixed In Build: 11,0,12,302575 / CF11 update 13

Priority/Frequency: Normal / Few users will encounter

Locale/System: English / Win 2012 Server x64

Vote Count: 0

Problem Description:
I have a template file that runs as a CF scheduled task to index a collection of PDFs. It has run every day without incident for months. I installed ColdFusion 11 Update 12 last night, and this morning the template threw a Java Heap Space OutOfMemoryError.

Steps to Reproduce:

Call this template file (pseudo-code):

<cfsetting requesttimeout="3600" />

<cfset LOCAL = {} />

<cfquery name="qAllDocuments">
    SELECT DISTINCT
        ID,
        Status,
        'F:\PDFs\Processed\'
            CONCAT TRIM(DOCUMENTID)
            CONCAT '.PDF'   AS  DocumentFile,
        'No'                AS  FileExists
    FROM    Documents
</cfquery>

<cfloop query="qAllDocuments">
    <cfset qAllDocuments["FileExists"][CURRENTROW] =
        YesNoFormat(FileExists(qAllDocuments["DocumentFile"][CURRENTROW])) />
</cfloop>

<cfquery name="LOCAL.qDocuments" dbtype="Query">
    SELECT
        ID,
        Status,
        DocumentFile
    FROM    qAllDocuments
    WHERE   FileExists = 'Yes'
</cfquery>

<cfindex
    query="LOCAL.qDocuments"
    collection="MyDocCollection"
    action="refresh"
    type="file"
    key="DocumentFile"
    custom1="ID"
    custom2="Status" />

Actual Result:

Message: Java heap space
StackTrace: java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOfRange(Unknown Source)
	at java.lang.String.<init>(Unknown Source)
	at java.lang.String.substring(Unknown Source)
	at com.adobe.internal.pdftoolkit.pdf.graphics.font.CMapResourceBuilder.splitToUnicodeSubSequence(CMapResourceBuilder.java:556)
	at com.adobe.internal.pdftoolkit.pdf.graphics.font.CMapResourceBuilder.parseToUnicodeMap(CMapResourceBuilder.java:295)
	at com.adobe.internal.pdftoolkit.pdf.graphics.font.PDFToUnicodeCMap.<init>(PDFToUnicodeCMap.java:297)
	at com.adobe.internal.pdftoolkit.pdf.graphics.font.PDFToUnicodeCMap.getInstance(PDFToUnicodeCMap.java:332)
	at com.adobe.internal.pdftoolkit.pdf.graphics.font.PDFFontType0.getToUnicodeCMap(PDFFontType0.java:148)
	at com.adobe.internal.pdftoolkit.pdf.graphics.font.PDFFontType0.getCharCodes(PDFFontType0.java:345)
	at com.adobe.internal.pdftoolkit.pdf.graphics.font.PDFFontType0.getCharCodes(PDFFontType0.java:302)
	at com.adobe.internal.pdftoolkit.services.textextraction.impl.contentprocessor.TextRun.getCharCodesFromFont(TextRun.java:672)
	at com.adobe.internal.pdftoolkit.services.textextraction.impl.contentprocessor.TextRun.cacheHorizontalGlyphInfo(TextRun.java:754)
	at com.adobe.internal.pdftoolkit.services.textextraction.impl.contentprocessor.TextRun.cacheGlyphInfo(TextRun.java:662)
	at com.adobe.internal.pdftoolkit.services.textextraction.impl.contentprocessor.TextRun.init(TextRun.java:274)
	at com.adobe.internal.pdftoolkit.services.textextraction.impl.contentprocessor.TextRun.<init>(TextRun.java:147)
	at com.adobe.internal.pdftoolkit.services.textextraction.impl.contentprocessor.TextObject.addTextRun(TextObject.java:96)
	at com.adobe.internal.pdftoolkit.services.textextraction.impl.TextObjectExtractor.Tj(TextObjectExtractor.java:942)
	at com.adobe.internal.pdftoolkit.services.textextraction.impl.contentprocessor.TextShowingOperator.process(ContentOperators.java:513)
	at com.adobe.internal.pdftoolkit.services.textextraction.impl.TextObjectExtractor.process(TextObjectExtractor.java:359)
	at com.adobe.internal.pdftoolkit.services.textextraction.impl.TextObjectExtractor.extractTextObjects(TextObjectExtractor.java:326)
	at com.adobe.internal.pdftoolkit.services.textextraction.impl.TextObjectExtractor.extractTextObjects(TextObjectExtractor.java:221)
	at com.adobe.internal.pdftoolkit.services.textextraction.TextExtractor.extractWords(TextExtractor.java:273)
	at com.adobe.internal.pdftoolkit.services.textextraction.TextExtractor.getWordsIterator(TextExtractor.java:465)
	at com.adobe.internal.pdftoolkit.services.textextraction.TextExtractor$DocumentWordsIterator.<init>(TextExtractor.java:626)
	at com.adobe.internal.pdftoolkit.services.textextraction.TextExtractor.getWordsIterator(TextExtractor.java:392)
	at coldfusion.pdf.PDFDocHandler.extractTextString(PDFDocHandler.java:3566)
	at coldfusion.pdf.PDFDocHandler.extractText(PDFDocHandler.java:3533)
	at coldfusion.pdf.PDFDocOperation.extractText(PDFDocOperation.java:987)
	at coldfusion.tagext.search.SolrUtils.getSolrDocument(SolrUtils.java:732)
	at coldfusion.tagext.search.SolrUtils.addDocument(SolrUtils.java:1273)
	at coldfusion.tagext.search.IndexTag.doQueryUpdate(IndexTag.java:778)
	at coldfusion.tagext.search.IndexTag.doStartTag(IndexTag.java:351)


Expected Result:
Template runs without exception and CFINDEX updates the collection.

Any Workarounds:
None tried. Considering uninstalling update and/or adding memory.

Attachments:

June 23, 2017 00:00:00: 130903.PDF

Comments:

Eric,

Can you share some information about the approximate number and sizes of the PDF documents that you are trying to index?
Although it looks like cfindex is triggering it, just to be sure: in the complete stack trace that you've shared, do you see any references to the .cfm page and the line number that is triggering it? If so, can you please share the code extract from that line?
In case you've uninstalled the update, did that fix the issue?

Comment by Piyush K.

779 | May 16, 2017 12:26:20 PM GMT

Hi, 

Thank you for the reply. It is trying to index about 16,000 files. The number of files increases slightly each day, so, there has not been a large increase in the number of files indexed.

The largest PDF is about 32MB. The smallest is about 2KB.

I don't have any more stack trace information available than what is in the original post. Based on the contents of the stack trace, it does not appear to be timing out on either of the CFQUERY tags or the CFLOOP. The entire code of the template being requested is in the original post.

I have not uninstalled the update yet.

Comment by Eric B.

780 | May 16, 2017 01:16:11 PM GMT

Eric,
Looks like the memory exhaustion occurs when the code reaches our PDF processing library.
I've tried indexing a couple of hundred PDFs with sizes up to 50 MB without running into out-of-memory errors. So, I'm afraid I'm going to need some more info.
Can you check the heap size allocated to the CF JVM? If it is a standalone server, you'll find the setting in the jvm.config file in the <cf_root>/cfusion/bin directory; if it is deployed as a JEE application, it would depend on the host app server's settings.

Would it be possible for you to log each file as you index it, so that you can see whether the out-of-memory error occurs after CF starts processing a particular PDF?
Something akin to:

writeOutput("Indexing file ..." & fl_path & "<br>");
// or cflog("Indexing file ..." & fl_path & "<br>");
cfindex( action='refresh', collection=coln_name, type='file', key=fl_path, status="indx_stat", throwonError=false);
writeOutput("File " & IIF( indx_stat.inserted EQ 1, DE(""), DE(" <b>NOT</b> ")) & "indexed.<br>");
writeOutput("Is the error struct empty: " & "<b>" & StructIsEmpty(indx_stat.errors) & "</b><br>");

Comment by Piyush K.

781 | May 17, 2017 10:21:31 AM GMT

Hi,

Heap size settings from jvm.config: -Xms256m -Xmx2048m (these arguments were exactly the same before the update to CF11u12).

I am going to try running the code in a lower tier that is configured the same as the production tier to see if I can reproduce the results there, and then I will try to reproduce them after uninstalling the update.

Thank you,

Eric

Comment by Eric B.

782 | May 17, 2017 03:59:32 PM GMT
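
On the heap-size question above: besides reading jvm.config, the limits the running JVM actually picked up can be checked from a .cfm page. The following is a minimal sketch, not part of the original thread; it relies only on the standard java.lang.Runtime methods, and the variable names are illustrative.

```cfml
<cfscript>
    // Ask the JVM that hosts ColdFusion for its memory limits (values are in bytes).
    runtime = createObject("java", "java.lang.Runtime").getRuntime();
    bytesPerMB = 1024 * 1024;

    // maxMemory() is the upper bound the heap may grow to (governed by -Xmx).
    writeOutput("Max heap: " & int(runtime.maxMemory() / bytesPerMB) & " MB<br>");
    // totalMemory() is the heap currently reserved by the JVM.
    writeOutput("Allocated heap: " & int(runtime.totalMemory() / bytesPerMB) & " MB<br>");
    // freeMemory() is the unused portion of the currently reserved heap.
    writeOutput("Free within allocated heap: " & int(runtime.freeMemory() / bytesPerMB) & " MB<br>");
</cfscript>
```

With -Xmx2048m the reported maximum should be in the neighborhood of 2048 MB; the exact figure can be somewhat lower depending on the garbage collector in use.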

@Eric, any update here?

Comment by Vamseekrishna N.

783 | May 23, 2017 08:12:40 AM GMT

Update 1: I just ran the script with the same set of files (16,000+) on a different server that is running ColdFusion 11 Update 11. It ran to completion without error. This confirms to me that this is an issue with Update 12 (or perhaps Update 11 is simply not throwing the error and continuing processing?). 

I am going to attempt to run the loop that Piyush suggested on a server running CF11u12 and see if it bombs on a specific file.

Comment by Eric B.

784 | June 22, 2017 06:22:08 PM GMT

@Piyush, your code sample does not work for me. It first threw the following exception: "throwonerror is an invalid custom field name". I removed the throwonerror attribute, and then it threw the following exception: "Element ERRORS is undefined in INDX_STAT". Can you confirm that the syntax of your code sample is compatible with the version of ColdFusion I am using? Thank you...

Comment by Eric B.

785 | June 22, 2017 06:44:14 PM GMT

@Piyush, never mind, I simply removed the throwonerror attribute and added a StructKeyExists(indx_stat, "errors") check before attempting to access it.

Comment by Eric B.

786 | June 22, 2017 06:49:29 PM GMT

Now running the following script using the same set of files on my local PC with CF11u12:

    LOCAL.Directory = "C:\Docs";

    LOCAL.aFiles =
        DirectoryList(LOCAL.Directory, false, "name", "*.pdf", "name asc", "file");

    for (LOCAL.i=1; LOCAL.i<=ArrayLen(LOCAL.aFiles); LOCAL.i++) {
        indx_stat = {};
        LOCAL.FileName = LOCAL.aFiles[LOCAL.i];
        LOCAL.FilePath = LOCAL.Directory & "\" & LOCAL.FileName;

        writeLog(LOCAL.i & "; File Name: " & LOCAL.FileName & "; Indexing...");

        cfindex(
            action="refresh",
            collection="docstest",
            type="file",
            key=LOCAL.FilePath,
            status="indx_stat"
            );

        LOCAL.Messages = (StructKeyExists(indx_stat, "MESSAGES") ? ArrayToList(indx_stat.MESSAGES) : "");

        writeLog(LOCAL.i & "; File Name: " & LOCAL.FileName & "; Indexed: " & YesNoFormat(indx_stat.inserted) & "; Messages: " & LOCAL.Messages);
    }

Comment by Eric B.

787 | June 22, 2017 09:06:13 PM GMT

So I let the script above run. It went through 2,450 files - most were indexed, some were not. For example, a few files had the message "...check the exception for more details: An error occurred during EXTRACTTEXT operation in <CFPDF/>". But, despite those messages, it kept chugging along, indexing each file in about 3 seconds. Then, it hit the attached file - 130903.PDF - and after 3 minutes and 12 seconds, it bombed with the Java heap space error. I removed the file and attempted to run the original template again, and it still bombed with a Java heap space error. To repeat, this DOES NOT happen with CF11u11.

Comment by Eric B.

788 | June 23, 2017 02:25:22 PM GMT
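
Since the run above narrowed the failure to a single file, it may help to re-test that one PDF without touching the Solr collection. The following is a minimal sketch, not part of the original thread. It assumes that the cfpdf extracttext action exercises the same text-extraction code that appears in the stack trace (coldfusion.pdf.PDFDocOperation.extractText). The path is a hypothetical example built from the directory and file name mentioned in this thread, and the variable names are illustrative.

```cfml
<cfscript>
    // Hypothetical location of the suspect file; adjust to where the PDF actually lives.
    pdfPath = "C:\Docs\130903.PDF";
    started = getTickCount();

    try {
        // Extract the text of one PDF into a variable, bypassing cfindex and Solr.
        cfpdf(action="extracttext", source=pdfPath, name="extractedText");
        writeLog("Extracted text from " & pdfPath & " in " & (getTickCount() - started) & " ms");
    } catch (any e) {
        // Record whatever ColdFusion reports for this file.
        writeLog("Text extraction failed for " & pdfPath & " after " & (getTickCount() - started) & " ms: " & e.message);
    }
</cfscript>
```

If the suspect file fails or stalls here while other PDFs complete in a few seconds, that points at the PDF text extraction rather than at cfindex or the collection. Because the original failure exhausted the heap, this is best run on a test server.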

Thanks Eric for the detailed script and detailed information. We will get it checked from our end.

Comment by Vamseekrishna N.

789 | June 26, 2017 06:37:10 AM GMT

Hi Eric,

Please try the steps mentioned below. This should resolve the Java heap space OutOfMemoryError you are getting:
1. Go to [CF Home]\cfusion\hf-updates\hf-11-00012\backup\lib
2. Copy the files cf-acrobat.jar and xmpcoreold.jar
3. Go to [CF Home]\cfusion\lib
4. Take a backup of the files cf-acrobat.jar and xmpcore-6.0.6.jar
5. Delete these files from this folder
6. Paste the files you copied in step 2 here
7. Restart CF
It should no longer throw the Java Heap Space OutOfMemoryError.

Please let us know if the workaround worked.

Comment by Kailash B.

790 | June 30, 2017 05:06:31 AM GMT

Hi Kailash,

Thank you for this solution. I will try this later today and get back to you with my results. Is this a change that will be included in future CF hot fixes?

Thank you,

Eric

Comment by Eric B.

791 | June 30, 2017 01:32:39 PM GMT

Hi All,

Sorry for the late response. I finally had some time to test this today. I reinstalled HF12, reset the web server connectors, deleted the collection, recreated it from the current production version, and ran my original template. I believe that it ran to completion, though I'm not 100% sure. I ran the template as an HTTP request in Internet Explorer on the server itself, and when it stopped processing, it displayed the IE "This page can't be displayed" error page (not my custom error page). Normally when it stops processing, I get an error notification, and the Collections page in the ColdFusion Administrator shows about 1,500 files indexed. This time I got no notification, and after this run the Collections page shows 15,585 files, which is close to the number of files (16,621) it is attempting to index.

I noticed several occurrences of the exception "Invalid bfchar pattern in ToUnicodeCmap" in the coldfusion-out.log file, including for the file that I attached here, which was identified as one of the problem files. I'm not sure what that means, but if there's something that I can tell the designers to do differently to prevent this, I'd love to know.

Should I deploy this fix in my production environment, or should I wait for the next hotfix?

Thank you,

Eric

Comment by Eric B.

792 | July 11, 2017 07:30:14 PM GMT

Hello Adobe Friends,

Could someone please advise whether this is the permanent fix, and whether I should go ahead and deploy it in my production environment or wait for the next hotfix release?

Thank you very much,

Eric

Comment by Eric B.

793 | July 24, 2017 02:03:36 PM GMT

tracker issue : CF-4198615
