Title:
Bug 87057: (Watson Migration Closure) Summary: add some Solr config options to CFCOLLECTION
Status/Resolution/Reason: Closed/Won't Fix/
Reporter/Name(from Bugbase): Adam Cameron / Adam Cameron (Adam Cameron)
Created: 08/22/2011
Components: Text Search, Solr
Versions: 9.0
Failure Type: Unspecified
Found In Build/Fixed In Build: 0000 /
Priority/Frequency: Trivial / Unknown
Locale/System: English / Platforms All
Vote Count: 4
Problem:
Summary: add some Solr config options to CFCOLLECTION

See this blog entry of Ray Camden's: http://www.coldfusionjedi.com/index.cfm/2011/8/22/Indexing-PDFs-with-Solr-Read-this-tip

I'll duplicate the key points here, in case the blog entry goes away:

[quote]
Have you noticed that indexed PDFs don't seem to contain all the content they should? Turns out this is a performance setting in Solr. The tip below is credit Uday Ogra of Adobe:

Solr has a default upper limit of 10000 on max number of words which can be indexed in documents which approximately defaults to 20-40 pages.

We can change this default value for each collection. Suppose collection's name is newcollection.

Open file COLDFUSION_COLLECTIONS_PATH/newcollection/conf/solrconfig.xml

Here COLDFUSION_COLLECTIONS_PATH is the path you would have configured while creating the collection.

Here search <mainindex> tag. Inside this tag there would be a sub-tag <maxFieldLength> which has a default value of 10000.

You can change it to a value which will suit your indexing.

(There is one more <maxFieldLength> tag directly under <indexDefaults> tag, do not change it)

In your case I would recommend to change it to 100000.

By the way on an average a single pdf page has around 200-500 words. So for a pdf with 100 pages setting this value to 100000 should be safe enough.
[/quote]

My follow-up (which is the basis of this E/R):

[quote]
I also think it would be good to have this as an (optional) setting on CFCOLLECTION & the UI in CF Admin, rather than hacking config files. This *is* CF we're talking about after all!
[/quote]

-- Adam
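For anyone who has to make this change by hand in the meantime, the tip quoted above could be scripted roughly as follows. This is only a sketch, not anything shipped with ColdFusion or Solr: the function name set_main_index_max_field_length is made up for illustration, and the inline sample config is a cut-down stand-in for a real solrconfig.xml. The quoted tip spells the parent tag <mainindex>; the sketch matches that name case-insensitively on the assumption that the real file may capitalise it differently. It leaves the <maxFieldLength> under <indexDefaults> alone, as the tip advises.

```python
import xml.etree.ElementTree as ET


def set_main_index_max_field_length(xml_text: str, new_limit: int) -> str:
    """Return the config text with <maxFieldLength> under the main index
    section set to new_limit. Hypothetical helper, not a ColdFusion/Solr API.

    Note: the returned text has no XML declaration, because ET.tostring()
    omits it when encoding="unicode".
    """
    # Keep XML comments in the tree so they survive the rewrite (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)

    updated = False
    for section in root.iter():
        # Comment nodes have a non-string tag, so skip them.
        if not isinstance(section.tag, str):
            continue
        # Assumption: match the parent tag case-insensitively.
        if section.tag.lower() != "mainindex":
            continue
        # Only direct children, so <indexDefaults> is never touched.
        for child in section:
            if child.tag == "maxFieldLength":
                child.text = str(new_limit)
                updated = True

    if not updated:
        raise ValueError("no <maxFieldLength> found under the main index tag")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Cut-down sample standing in for a collection's solrconfig.xml.
    sample = (
        "<config>"
        "<indexDefaults><maxFieldLength>10000</maxFieldLength></indexDefaults>"
        "<mainIndex><maxFieldLength>10000</maxFieldLength></mainIndex>"
        "</config>"
    )
    print(set_main_index_max_field_length(sample, 100000))
```

To use it on a real collection, read COLDFUSION_COLLECTIONS_PATH/newcollection/conf/solrconfig.xml into a string, pass it through the function, and write the result back after taking a backup. Documents indexed before the change would presumably need re-indexing to pick up the new limit.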
Method:
Result:
----------------------------- Additional Watson Details -----------------------------
Watson Bug ID: 3043982
External Customer Info:
External Company:
External Customer Name: Adam Cameron
External Customer Email: 17EB1A7649DA54C7992015A9
External Test Config: 08/22/2011
Attachments:
Comments: