tracker issue | What iT iS dESign studios

Title:

Bug 83083: (Watson Migration Closure) Extracting text using <cfpdf action="extracttext"

Status/Resolution/Reason: Closed/Won't Fix/LowImpact

Reporter/Name(from Bugbase): Johan Steenkamp / Johan Steenkamp (Johan)

Created: 05/26/2010

Components: Document Management, PDF manipulation

Versions: 9.0

Failure Type: Unspecified

Found In Build/Fixed In Build: 9,0,0,251028 /

Priority/Frequency: Normal / Unknown

Locale/System: English / Win All

Vote Count: 0

Problem:

Extracting text using <cfpdf action="extracttext" .... type="xml" addquads="true" honourspaces="true" usestructure="true"/> does not always honour spaces. For example, "123 456-789" may be extracted as three "words": "123456", "-" (dash), and "789". Searching for the same string in Acrobat Reader correctly requires the space in order to match.

Method:

Reproduction depends on the source PDF document. If a document exhibits the problem, it does so consistently. I can provide a source PDF if required.
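
A minimal reproduction sketch, assuming a PDF that exhibits the problem is available. The file path and the variable names (sourcePdf, extractedXml) are hypothetical and not part of the original report; the cfpdf attributes are the ones quoted in the problem description, with the result captured through the name attribute so it can be inspected.

```cfml
<!--- Hypothetical path: substitute a PDF that exhibits the problem. --->
<cfset sourcePdf = expandPath("./sample.pdf")>

<!--- Same attributes as in the report; the result is stored in a variable. --->
<cfpdf action="extracttext"
       source="#sourcePdf#"
       type="xml"
       addquads="true"
       honourspaces="true"
       usestructure="true"
       name="extractedXml">

<!--- Dump the raw result so the extracted word boundaries can be checked by eye. --->
<cfdump var="#extractedXml#">
```

Comparing the dumped words against the visible text of the PDF (for example a string such as "123 456-789") should show whether a space has been dropped.
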
Result:

None - incorrect process result

----------------------------- Additional Watson Details -----------------------------

Watson Bug ID:	3041545

External Customer Info:
External Company:  
External Customer Name: Johan Steenkamp
External Customer Email: 321062B0446EC5CD9920157F
External Test Config: 05/26/2010

Attachments:

Comments:

tracker issue : CF-3041545
