tracker issue : CF-3043982

Title:

Bug 87057: (Watson Migration Closure) Summary: add some Solr config options to CFCOLLECTION. See this blog entry of Ray Camden's: http://www

Status/Resolution/Reason: Closed/Won't Fix/

Reporter/Name(from Bugbase): Adam Cameron / Adam Cameron (Adam Cameron)

Created: 08/22/2011

Components: Text Search, Solr

Versions: 9.0

Failure Type: Unspecified

Found In Build/Fixed In Build: 0000 /

Priority/Frequency: Trivial / Unknown

Locale/System: English / Platforms All

Vote Count: 4

Problem:

Summary: add some Solr config options to CFCOLLECTION
See this blog entry of Ray Camden's: http://www.coldfusionjedi.com/index.cfm/2011/8/22/Indexing-PDFs-with-Solr-Read-this-tip
I'll duplicate the key points here, in case the blog entry goes away:
[quote]
Have you noticed that indexed PDFs don't seem to contain all the content they should? Turns out this is a performance setting in Solr. The tip below is credit Uday Ogra of Adobe:
Solr has a default upper limit of 10000 on max number of words which can be indexed in documents which approximately defaults to 20-40 pages.
We can change this default value for each collection. Suppose collection's name is newcollection.
Open file COLDFUSION_COLLECTIONS_PATH/newcollection/conf/solrconfig.xml
Here COLDFUSION_COLLECTIONS_PATH is the path you would have configured while creating the collection.
Here search <mainindex> tag. Inside this tag there would be a sub-tag <maxFieldLength> which has a default value of 10000.
You can change it to a value which will suit your indexing.
(There is one more <maxFieldLength> tag directly under <indexDefaults> tag, do not change it)
In your case I would recommend to change it to 100000.
By the way on an average a single pdf page has around 200-500 words. So for a pdf with 100 pages setting this value to 100000 should be safe enough.
[/quote]
My follow-up (which is the basis of this E/R):
[quote]I also think it would be good to have this as an (optional) setting on CFCOLLECTION & the UI in CF Admin, rather than hacking config files. This *is* CF we're talking about after all![/quote]
-- Adam
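Editorial sketch (not part of the original report): the manual edit quoted above could be scripted roughly as follows. This is a minimal Python example under stated assumptions, not a supported ColdFusion or Solr tool. The function name and the sample XML are hypothetical, and the element is assumed to be spelled `mainIndex`, which is how stock Solr configs of that era spell it (the quoted tip writes it in lowercase, and XML element names are case-sensitive). A real solrconfig.xml holds far more than this sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a collection's solrconfig.xml.
SAMPLE_CONFIG = """<config>
  <indexDefaults>
    <maxFieldLength>10000</maxFieldLength>
  </indexDefaults>
  <mainIndex>
    <maxFieldLength>10000</maxFieldLength>
  </mainIndex>
</config>"""


def set_main_index_max_field_length(config_xml: str, new_limit: int) -> str:
    """Return config_xml with <mainIndex>/<maxFieldLength> set to new_limit.

    The <maxFieldLength> under <indexDefaults> is deliberately left alone,
    as the quoted tip advises.
    """
    # Caveat: the default ElementTree parser discards XML comments, so the
    # returned text will not contain any comments present in the input.
    root = ET.fromstring(config_xml)
    node = root.find("./mainIndex/maxFieldLength")
    if node is None:
        raise ValueError("no <mainIndex>/<maxFieldLength> element found")
    node.text = str(new_limit)
    return ET.tostring(root, encoding="unicode")


updated = set_main_index_max_field_length(SAMPLE_CONFIG, 100000)
print(updated)
```

To apply this to a real collection, one would read the file text from COLDFUSION_COLLECTIONS_PATH/newcollection/conf/solrconfig.xml, keep a backup, and write the returned string back. Whether the collection needs reloading or reindexing afterwards is not covered by this ticket.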
Method:


Result:

----------------------------- Additional Watson Details -----------------------------

Watson Bug ID:	3043982

External Customer Info:
External Company:  
External Customer Name: Adam Cameron
External Customer Email: 17EB1A7649DA54C7992015A9
External Test Config: 08/22/2011

Attachments:

Comments:

This bug has been voted..
Vote by External U.
20995 | November 11, 2011 07:30:55 AM GMT
I do think it is reasonable to add more settings at least for the common stuff into cfadmin for solr collections. I understand that solr is very configurable, but there are a lot of config files throughout the common application and all the tools it uses. Simplifying the common optimizations and behavioral changes makes it easier to focus on making an application work correctly and quickly rather than searching for the right file and changing a setting without full confidence in syntax.
Vote by External U.
20996 | November 11, 2011 07:30:56 AM GMT
This bug has been voted..
Vote by External U.
20997 | November 11, 2011 07:30:57 AM GMT
+1, if cfcollection is limited to ~40-50 pages of PDF content, then maxFieldLength should be a new cfcollection attribute.
Vote by External U.
20998 | October 20, 2012 02:50:23 AM GMT
In my vote, the "~40-50" should've been "~20-40".
Comment by External U.
20989 | October 20, 2012 02:51:28 AM GMT
These settings can be manipulated by directly editing the config.xml files. Since these settings are unlikely to be changed often, there does not seem to be much value in designing a UI to do the same. Details on the setting parameters can be referenced at https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
Comment by Piyush K.
20990 | December 04, 2014 08:51:27 AM GMT
Hi Piyush, I just saw your comment here. Thanks for the follow-up. I still believe this would be a good setting to add. I Google'd "average word count per page" and the answer is basically "500 words for a single spaced page". So, <maxFieldLength>10000</maxFieldLength> only supports ~20 PDF single-spaced pages. I can imagine many PDFs being more than 20 pages. If the CF Admin UI cannot be updated, could the <maxFieldLength>10000</maxFieldLength> be increased to a higher default like 50000 or 100000 (as suggested in the ticket description)? Thanks!, -Aaron
Comment by External U.
20991 | November 23, 2015 01:37:42 AM GMT
@Piyush this info, for how to work *your* software, also needs to be in *your* docs too. Pointing to someone else's docs in a ticket in the bug tracker really isn't an adequate solution here. I can see no reference to this solution being linked to in your own docs at all. That said, I do agree that leaving it as an XML config option is probably adequate, provided you document it properly. So as far as this ticket goes, please reopen it until such time as the docs are at least at a minimum level of professionalism for an enterprise product, and *then* you can close it. Cheers.
Comment by External U.
20992 | November 23, 2015 03:09:25 AM GMT
+1 to Adam's comment. This should be documented. Thanks!, -Aaron
Comment by External U.
20993 | November 23, 2015 04:17:27 AM GMT
I'll guess that this was never documented in -CF's- docs. Em-I-Right? Thanks!, -Aaron
Comment by External U.
20994 | August 29, 2016 05:33:51 PM GMT
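Editorial note on the page estimates in this thread: the "~20-40" and "~40-50" figures both come from dividing the word limit by an assumed words-per-page count. A minimal Python sketch of that arithmetic follows, using only the numbers quoted in the ticket (200-500 words per page, limits of 10000 and 100000). The helper name is hypothetical, and it treats one indexed token as one word, which is only an approximation.

```python
def pages_covered(max_field_length: int, words_per_page: int) -> int:
    """Whole pages that fit under the limit at a given page density."""
    return max_field_length // words_per_page


for limit in (10_000, 100_000):
    dense = pages_covered(limit, 500)   # dense, single-spaced pages
    sparse = pages_covered(limit, 200)  # sparse pages
    print(f"limit {limit}: roughly {dense} to {sparse} pages")
```

With these inputs the default of 10000 covers about 20 dense pages or 50 sparse ones, and 100000 covers about 200 to 500, which is consistent with the tip that 100000 is safe for a 100-page PDF.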