tracker issue : CF-4204270

select a category, or use search below
(searches all categories and all time range)
Title:

cfspreadsheet crashes/locked up server with (238kb) file

| View in Tracker

Status/Resolution/Reason: To Fix//EnhancementRequired

Reporter/Name(from Bugbase): Chad S. / ()

Created: 04/25/2019

Components: Document Management, Office Integration

Versions: 2016,2018

Failure Type: Crash

Found In Build/Fixed In Build: 2018,0,02,313961 /

Priority/Frequency: Normal / All users will encounter

Locale/System: English / Win 2012 Server x64

Vote Count: 0

Problem Description:
When trying to read certain excel documents we've run into an issue where it will crash our server. There are no errors, but Coldfusion continues to take up CPU and memory until it reaches it's limits, or the servers limits. 

You will notice that this spreadsheet has almost a million rows in it, except there is NO data. Something about how cfspreadsheet is reading this document is bugged as it's attempting to add blank rows. 

The problem is... we can't even read the file to tell if there is a problem prior to "fully reading" the file. 

Steps to Reproduce:
Use cfspreadsheet to try an open it. 

Actual Result:
Crashes or Locks up server

Expected Result:
Reads the file

Any Workarounds:
None at this time.

Attachments:

Comments:

Chad, The excel file you've shared has a hidden sheet called "page". Col "I" in that sheet has formulae that follow pattern similar to the following: =INDEX(C:C,MATCH(G2,A:A)) Here the address space C:C and A:A is quite exhaustive. CF internally uses POI library for spreadsheet manipulation, and it's running into memory outage when reading cell with such formulae. You can avoid the issue by using a more restricted search space such as A1:A1500. Let us know if this works to solve the issue.
Comment by Piyush K.
30710 | May 10, 2019 11:12:20 AM GMT
No, it doesn’t. There is no documented way to restrict search like that with cfspreadsheet. Suggestions: Add that capability to cfspreadsheet tag OR Create a new method that pulls high level information about a spreadsheet. Items like Total sheets, Total hidden sheets, total records per sheet. That information would allow any developer to quickly scan documents prior to a full read.
Comment by Chad S.
30745 | May 15, 2019 09:52:37 PM GMT
Piyush, first let me thank you for looking at this issue. I've had some more thoughts but let me state my goals here as your team might find alternative solutions to meet those goals When using cfspreadsheet: 1) Make sure files like these can't crash ColdFusion servers (or seriously load them down) 2) Add the capability to identify a host of format/template/etc issues prior to reading the entire document to memory. After looking at POI documentation the following should be possible and would be AMAZING to developers. It would make a newsworthy impact for ColdFusion developers with relatively little effort. Here's the link I used: https://poi.apache.org/apidocs/dev/org/apache/poi/ss/usermodel/Workbook.html Item #1 Create New Method: ReadTopLevel (or whatever you want to call it) This creates a Struct/Array with the following information: (Top Level) - Total Sheets/Tabs [ getNumberOfSheets() ] - Total Hidden Sheets/Tabs [ isSheetHidden() AND isSheetVeryHidden() ] - Total Hidden Columns in Sheets/Tabs (Top Level Sheet Data) Sheet/Tab - Name [ getSheetName() ] - Visibility [ getSheetVisibility() ] - Total Rows [ getPhysicalNumberOfRows() ] - Total Columns [ read first row only, getRow(0).getPhysicalNumberOfCells() ] - Column Names [ cell.getSheet().getRow(0).getCell(currentcellIndex) .getRichStringCellValue().toString() ] Item #2 Add functionality to cfspreadsheet would allow us to: - Read from only visible Sheets/Tabs - Read from only the first Sheet/Tab - Read from only Sheet/Tab with name "X" - Read only the top X rows (developers understand java starts with 0). Thank you so much for working with this issue.
Comment by Chad S.
30752 | May 16, 2019 05:34:31 PM GMT
Chad, First of all, let me apologize for the delay in getting back to you on this. Now, most of the functionality that you've requested already is in existence. For example. spreadsheetInfo function returns information such as no. of sheets and their names. You can find the exhaustive list of spreadsheet related CF functions at https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-functions/functions-s/spreadsheetinfo.html You can also selectively read data for a specific sheet by specifying the sheetname or sheet (number) attribute. You may refer the docs on spreadSheetRead for details. For any information related to the data in the spreadSheet, you can read the data into a query first and do the required processing for the information you need. For example, for "getPhysicalNumberOfRows", you may read the data into a query and get the recordCount Now in cases where a large amount of data in a file that can lead to memory overruns, one cannot pull information such as no. of records in a sheet without loading the data first. That there is a catch 22 kind of a situation. The only functionality that I see missing is to identify hidden sheets. SpreadsheetInfo returns amongst other things a list of all sheets whether they're hidden or not.
Comment by Piyush K.
33361 | April 02, 2020 08:42:50 PM GMT
Changing this to an enhancement. Looks like we can use some new function to set the visibility of sheets, and also return the visibility status of sheets. That info can be appended to the struct returned by spreadsheetInfo as well.
Comment by Piyush K.
33362 | April 03, 2020 07:04:31 AM GMT