Adobe Acrobat deleting parts of page during OCR


Postby elhyam » 31 Mar 2012, 03:09

I was playing around with a few sample pages to get the hang of using Scan Tailor and Acrobat to go from original images to a final PDF. When I ran OCR on the pages using ClearScan, Acrobat did something pretty scary: it deleted chunks of text from the PDF. Attached is a sample PDF page before (Sample1.pdf) and after (Sample2.pdf) ClearScan OCR. As you can see, pieces of text are missing in Sample2.pdf. Any idea how this could happen?

Sample1.pdf
(86.87 KiB) Downloaded 79 times

Sample2.pdf
(88.84 KiB) Downloaded 63 times
elhyam
 
Posts: 10
Joined: 28 Mar 2012, 17:51

Re: Adobe Acrobat deleting parts of page during OCR

Postby stearn » 10 Apr 2012, 15:24

I'm not sure what the problem is. I downloaded both files and reprocessed the first one. Initially I did a straight OCR in Acrobat X Pro and got good recognition. Then I went into the settings, changed to ClearScan, and got reasonable recognition (the text is all there), but ran into a problem I have had before: weird spacing issues.

This is the first OCR:
Sample1a.pdf
Straight OCR
(92.61 KiB) Downloaded 44 times


This is the second, with ClearScan turned on:
Sample1b.pdf
ClearScan OCR
(60.18 KiB) Downloaded 46 times


Personally I give ClearScan a very wide berth, as the output text just isn't up to scratch for what I am doing (I don't think it is up to scratch for anything really, as keyword searching is a joke when extra spaces are thrown in randomly). What I don't get is your missing text.

What version of Acrobat are you using?
stearn
 
Posts: 17
Joined: 22 Dec 2011, 20:00
Location: Nr. London, UK

