tracker issue : CF-3770905

Title:

CF 10 and CF 11 - ColdFusion Collections unable to index pdfs consistently


Status/Resolution/Reason: Closed/Withdrawn/CannotReproduce

Reporter/Name(from Bugbase): / Ling Lin (Anjaneai Srivastava)

Created: 06/04/2014

Components: Text Search, Solr

Versions: 11.0

Failure Type:

Found In Build/Fixed In Build: 11,0,0,289883 /

Priority/Frequency: Normal / Some users will encounter

Locale/System: English / Win 2008 Server R2,Windows 7 64-bit

Vote Count: 0

Problem: If you create ColdFusion collections in the CF 10 or CF 11 admin and index .pdf files from any directory, the same set of PDFs produces inconsistent results on different machines.

Method: 

1) Log in to the ColdFusion 10/11 admin console.
2) Create a ColdFusion collection.
3) Index the collection with File Extensions set to ".pdf" and Directory Path set to any folder of PDF files (you can use the path for the attached PDFs).
4) Select Submit, then check the Documents and Size columns in the Solr Collections list.
5) Repeat this process on a different Windows platform.

Result: The number of documents and the size vary between machines for the same set of files. If you remove PDFs from that directory or add more, and then re-index the same directory path, the columns continue to show inconsistent or incorrect numbers.

Expected: The Documents column should show the exact number of PDFs present in the directory, and the Size column should be accurate. If anyone adds or removes files from the directory, re-indexing should keep both values correct.

Workaround:
None.

Additional information :

1) The problem occurs on both CF 10 and CF 11 collections with the same folder of PDFs.
2) At times the correct value appears only after a delay. Mostly, the numbers do not match the actual file count or size. The problem is more evident when the PDFs are large.
3) The incorrect indexing shows up at two levels:
   a) When you create a collection with PDF files.
   b) When you re-index the PDF files in an existing collection. For example, if you remove four files from a folder containing five files and then add three files, the count should be 4 (5 - 4 + 3), but it shows 8. Once the collection has picked up a figure, re-indexing keeps adding files to the existing file count.
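
A scripted check can make the mismatch easier to compare across machines than reading the admin page by eye. The following is a minimal sketch, not part of the original report: the collection name `pdf_check` and the `./pdfs` folder are hypothetical, and the `doccount` and `size` columns are assumed to be what `cfcollection action="list"` returns for Solr collections (dump the query to confirm on your build).

```cfml
<!--- Hypothetical names: adjust to the collection and folder under test. --->
<cfset collectionName = "pdf_check">
<cfset pdfDir = expandPath("./pdfs")>

<!--- Count the PDF files actually present in the directory. --->
<cfdirectory action="list" directory="#pdfDir#" filter="*.pdf" type="file" name="pdfFiles">

<!--- Read the collection metadata for all Solr collections. --->
<cfcollection action="list" name="collections" engine="solr">

<cfoutput>
Files on disk: #pdfFiles.recordCount#<br>
<cfloop query="collections">
    <cfif compareNoCase(collections.name, collectionName) EQ 0>
        <!--- doccount and size are assumed column names; verify with cfdump. --->
        Indexed documents: #collections.doccount#<br>
        Collection size: #collections.size#<br>
    </cfif>
</cfloop>
</cfoutput>
```

Running the same template on each machine after indexing gives the on-disk count and the reported collection figures side by side.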

----------------------------- Additional Watson Details -----------------------------

Watson Bug ID:	3770905

External Customer Info:
External Company: ECENTRICARTS
External Customer Name: Ling Lin
External Customer Email: llin@ecentricarts.com

Attachments:

Comments:

tried this on 2 different platforms: Win7x64 and Win 2013 R2. size of the collections is the same in both the cases (1,595kb) with the 4 PDFs used.
test case:
<cfset col_name = "PDF_coln">
<cfcollection action="list" name="lst_col" engine="solr">
<cfset col_lst = ValueList(lst_col.NAME, ",")>
<cfif ListContainsNoCase(col_lst, "#col_name#") EQ 0>
    <cfcollection action = "create" collection = "#col_name#" engine = "solr" path = "#expandpath(".")#\Col">
</cfif>
<cfset filename = "#Expandpath(".")#\srcPDFs">
<cfindex action = 'update' collection = '#col_name#' type = 'path' key = '#filename#' extensions = ".pdf" urlpath = "#CGI.http_host#/solr-indx-pdfs/">
<cfset sleep(4000)>
<cfsearch name="srch_test" collection="#col_name#" criteria= "IBM">
search result...
<cfdump var="#srch_test#">
done...
Comment by Piyush K.
12009 | August 12, 2014 09:41:48 AM GMT
Re-indexing the same collection changes the size from 1,595kb to 3,186kb, even though the source files are unchanged.
Comment by Piyush K.
12010 | August 12, 2014 11:39:12 PM GMT
@piyush Which action did you use, refresh or update? Update adds to the existing collection. As you can see, the size has almost exactly doubled, so you must have updated the collection.
Comment by Uday O.
12011 | August 13, 2014 12:25:21 AM GMT
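
The distinction raised in the last comment can be sketched in CFML. This is a minimal illustration, not a confirmed fix: it reuses the collection name and source folder from the test case above, and it assumes the documented behaviour of `cfindex`, where `action="refresh"` clears the collection before indexing while `action="update"` indexes into whatever is already there.

```cfml
<!--- Names taken from the test case above; adjust as needed. --->
<cfset collectionName = "PDF_coln">
<cfset sourceDir = expandPath("./srcPDFs")>

<!--- update: indexes the folder into the existing collection contents.
      Per the comment above, repeating this may grow the collection. --->
<cfindex action="update"
         collection="#collectionName#"
         type="path"
         key="#sourceDir#"
         extensions=".pdf">

<!--- refresh: empties the collection first, then indexes the folder,
      so the result should reflect only the files currently on disk. --->
<cfindex action="refresh"
         collection="#collectionName#"
         type="path"
         key="#sourceDir#"
         extensions=".pdf">
```

In practice only one of the two calls would be used for a given run; comparing the collection size after each is one way to check whether the doubling comes from the choice of action.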