Title:
Bug 83599: I'm not sure if this is a regression bug with more log info, or a new one that looks like an old one
Status/Resolution/Reason: Closed/Fixed/
Reporter/Name(from Bugbase): Tomas Fjetland / Tomas Fjetland (TomasFjetland)
Created: 07/15/2010
Components: Text Search, Solr
Versions: 9.0.1
Failure Type: Unspecified
Found In Build/Fixed In Build: 274733 / 275414
Priority/Frequency: Major / Unknown
Locale/System: English / Win All
Vote Count: 0
Problem:
I'm not sure if this is a regression bug with more log info, or a new one that looks like an old one. I'm indexing a collection of 52 000 text files of less than 10 KB each. Since installing the 9.0.1 update, indexing never finishes; it dies with the error quoted under Result. The file referred to in the error is a simple indexing job that is called from Scheduled Tasks. It submits about 23 000 files to the collection before dying.
Method:
Set up a collection "Fileinfo". Run an index job against a UNC path, like
<cfindex collection="Fileinfo" action="refresh" extensions=".nfo" key="\\nas\researchdata\metadata" type="path" urlpath="\\nas\researchdata\metadata" recurse="yes" language="english" status="metastat">
where there are 50 000+ small text files with metadata using a .nfo extension.
It stops halfway with the error listed. Along the way it might also log errors for individual files in the server.log. The logging is new, but it might not have been able to index them before either:
"Warning","jrpp-26","07/15/10","16:39:53",,"WARNING: Could not index \\nas\researchdata\metadata\project.591\early.drafts\index.overview.nfo in SOLR. Check the exception for more details: Text encoding could not be detected and no encoding hint is available in document metadata"
Result:
"Error","jrpp-26","07/15/10","16:39:53",,"org/ccil/cowan/tagsoup/Parser The specific sequence of files included or processed is: C:\Inetpub\wwwroot\Verity\indexer_weekly.cfm, line: 28 "java.lang.NoClassDefFoundError: org/ccil/cowan/tagsoup/Parser
----------------------------- Additional Watson Details -----------------------------
Watson Bug ID: 3041780
External Customer Info:
External Company:
External Customer Name: Tomas Fjetland
External Customer Email: 599872CD4866EA489920154A
External Test Config: 07/15/2010
Attachments:
Comments: