I was playing around with a few sample pages to get the hang of using Scan Tailor and Acrobat to go from original images to final PDF. When I ran OCR on the pages using ClearScan, Acrobat did something pretty scary: it deleted chunks of text from the PDF. Attached is a sample PDF page before and after ClearScan OCR. As you can see, pieces of text are missing in Sample2.pdf. Any idea how this could happen?
- Posts: 10
- Joined: 28 Mar 2012, 17:51
I'm not sure what the problem is. I downloaded both files and reprocessed the first one. Initially I did a straight OCR in Acrobat X Pro and got good recognition. Then I went into the settings, switched to ClearScan, and got reasonable recognition (all the text is there), but ran into a problem I have had before: weird spacing issues.
This is the first OCR:
This is the second, with ClearScan turned on:
Personally I give ClearScan a very wide berth, as the output text just isn't up to scratch for what I am doing (I don't think it is up to scratch for anything really, as keyword searching is a joke when you have extra spaces thrown in randomly). What I don't get is your missing text.
What version of Acrobat are you using?
- Posts: 17
- Joined: 22 Dec 2011, 20:00
- Location: Nr. London, UK