Title:
Bug 83083: (Watson Migration Closure) Extracting text using <cfpdf action="extracttext"
Status/Resolution/Reason: Closed/Won't Fix/LowImpact
Reporter/Name(from Bugbase): Johan Steenkamp / Johan Steenkamp (Johan)
Created: 05/26/2010
Components: Document Management, PDF manipulation
Versions: 9.0
Failure Type: Unspecified
Found In Build/Fixed In Build: 9,0,0,251028 /
Priority/Frequency: Normal / Unknown
Locale/System: English / Win All
Vote Count: 0
Problem:
Extracting text using <cfpdf action="extracttext" .... type="xml" addquads="true" honourspaces="true" usestructure="true"/> does not always honour spaces. For example, "123 456-789" may be extracted as three "words": 123456, - (dash), and 789, so the space between 123 and 456 is lost. Searching for the same string in Acrobat Reader correctly requires the space in order to match.
Method:
Whether the problem can be reproduced depends on the source PDF document. When a document exhibits the problem, it does so consistently. I can provide a source PDF if required.
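A minimal reproduction sketch, using the attributes quoted in the Problem field. The file name sample.pdf and the variable names sourcePdf and extracted are placeholders that do not come from the original report, and the source and name attributes are assumed to be among those elided by "...." above. The output is only dumped for inspection, since its exact XML layout is not described here.

```cfml
<!--- Hypothetical path: substitute a PDF that exhibits the problem. --->
<cfset sourcePdf = expandPath("./sample.pdf")>

<!--- Same extraction attributes as in the report; the result is stored
      in the (hypothetical) variable "extracted". --->
<cfpdf action="extracttext"
       source="#sourcePdf#"
       type="xml"
       addquads="true"
       honourspaces="true"
       usestructure="true"
       name="extracted">

<!--- Dump the result to see where word boundaries were placed,
      for example around a string such as "123 456-789". --->
<cfdump var="#extracted#">
```

Comparing the dumped words against a search for the same string in Acrobat Reader should show whether the space was dropped for a given document.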
Result:
None - incorrect process result
----------------------------- Additional Watson Details -----------------------------
Watson Bug ID: 3041545
External Customer Info:
External Company:
External Customer Name: Johan Steenkamp
External Customer Email: 321062B0446EC5CD9920157F
External Test Config: 05/26/2010
Attachments:
Comments: