Best (commercial) OCR application for Mac?


Postby lupocos » 28 May 2011, 07:16

Hello,

I’m testing several OCR applications for Mac (mostly commercial ones rather than free/open-source), and I’d like to know which of the following you prefer:

- Acrobat Pro X
- Readiris Corporate 12
- Abbyy FineReader Express 10 (there’s no pro nor corporate edition for Mac yet)
- what else?

My aim is to produce searchable, reduced-size PDFs from mostly black-and-white TIFF files (scanned with a DIY book scanner).

Thank you in advance,

Cosimo
lupocos
 
Posts: 6
Joined: 25 May 2011, 12:42

Re: Best (commercial) OCR application for Mac?

Postby rob » 29 May 2011, 22:37

I'm on OS X, but I use VMware to run a Windows instance, and then I use ABBYY FineReader 10.
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.
rob
 
Posts: 770
Joined: 03 Jun 2009, 13:50
Location: Maryland, United States

Re: Best (commercial) OCR application for Mac?

Postby Misty » 30 May 2011, 10:19

Is FineReader for Mac not as good as the Windows version?
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
Misty
 
Posts: 480
Joined: 06 Nov 2009, 12:20
Location: Frozen Wasteland

Re: Best (commercial) OCR application for Mac?

Postby rob » 30 May 2011, 11:33

One other thing -- take a look here to see some sample PDF output from ABBYY FineReader 9. Without a doubt, Adobe Acrobat will give you much better-looking output when you set it to OCR using ClearScan, since it effectively takes the image (cleaned up) and puts the searchable text underneath.
rob
 
Posts: 770
Joined: 03 Jun 2009, 13:50
Location: Maryland, United States

Re: Best (commercial) OCR application for Mac?

Postby jgreely » 01 Jun 2011, 12:02

Misty wrote: Is FineReader for Mac not as good as the Windows version?


Historically, no. When I originally looked at it, it was a full major release behind the Windows version, and didn't seem to be well-supported. Since I already had Windows running under VMware, I bought FineReader Pro 9 instead. There's a new "express" version in the Mac App store, which appears to be using their current engine, but I don't know for sure. It doesn't support Japanese, so I took advantage of last weekend's holiday sale to upgrade to Pro 10 for Windows instead.

-j
jgreely
 
Posts: 21
Joined: 03 Apr 2011, 17:09
Location: Silicon Valley, CA

Re: Best (commercial) OCR application for Mac?

Postby seasalt » 02 Jun 2011, 07:22

I have the ABBYY FineReader Express application (Mac) and the ABBYY FineReader Pro 10 user guide (Windows).
They differ significantly in:
- flexibility: Express is all preset, whereas in Pro 10 for Windows you can set your own options (I have put ABBYY's reply on what these preset options are in the next post)
- additional features (see the full list below): the key ones missing in Express are MRC compression and splitting of dual pages

To work around these ABBYY Express limitations:
- MRC compression: I use the Adobe Acrobat Pro X "save as (reduce file size)" feature
- splitting dual pages: for images I use ST; for PDFs I use PDFClerk

Both have the key features for pure OCR.
The unchangeable (default) settings in ABBYY Express for searchable PDF creation are good, e.g. text under images.
(I have put the impact of this preset option in the next post, in the reply from ABBYY support.)

from abbyy support:
Regarding the feature comparison, I’m afraid quite a lot of feature available in ABBYY FineReader 10 Professional Edition for Windows will not be found in ABBYY FineReader Express for Mac.
The Mac version is only an Express Edition that has somewhat limited functionality compared to Professional Edition.
We do plan to release a Professional Edition for Mac also, but it will not be released earlier than year 2012.
 
Below you can find the list of features of ABBYY FineReader 10 Professional Edition that are not available in ABBYY FineReader Express for Mac:

- Automatic detection of recognition languages.
- Additional export formats including DOC/DOCX, XLSX, PPTX, CSV, etc.
- Bar-code recognition.
- Working with DjVu files.
- Re-creation of document formatting attributes: Headers & Footers, Page Numbering, Captions, Fonts & Styles, Footnotes, Table of contents.
- Heading hierarchic structure retention.
- Integration with Microsoft Office applications.
- Effective support of multi-core processors.
- Saving to password-protected or tagged PDF.
- Different PDF saving modes (not only searchable PDF).
- PDF/A support.
- Better compression of PDF files (MRC compression) that allows you to receive smaller files without significant quality loss.
- Automatic correction of image resolution, 3D perspective, ISO noise, motion blur, uneven illumination and blurred image distortion.
- Built-in image editor with trapezium crop tool, brightness and contrast slider and levels instrument.
- Extended set of predefined "One-step" Quick Tasks.
- WYSIWYG Text & Font Style Editor, ProofReading Tools, and Live Results Preview in Editor Window.
- Bonus application ABBYY Screenshot Reader.



----
The page stream created by ABBYY is much cleaner than the one from Adobe (ClearScan) if you wish to modify the PDF, e.g. to auto-create a table of contents. This is after testing 10 different books. I am still testing.

However, the contrast and sharpness of the text are noticeably better from Adobe Acrobat Pro X (ClearScan) than from ABBYY Express (Mac).

I am actually finding OCR results to be better from 300 dpi JPEGs than from TIFFs.
Beats me why.
seasalt
 
Posts: 45
Joined: 30 Apr 2011, 09:44

Re: Best (commercial) OCR application for Mac?

Postby seasalt » 02 Jun 2011, 08:25

ABBYY FineReader Express (Mac): preset options in the available tasks
(reply from ABBYY support)

3) The description of preset export options of ABBYY FineReader Express Reader is the following:
 
- For RTF: editable copy (retains formatting and layout of the original document),
- For PDF, as I have said: text under page image,
- For HTML: flexible layout (retains formatting and layout of the original document).

I have omitted the settings for TXT and spreadsheets as they do not vary.
 
In case there is anything I have not mentioned that you would like to know, don’t hesitate to bring this to my attention.
 
4) The difference between the two PDF export options is the following:
 
If you select ‘Text under the Page Image’, you have both the original image and the text. Unfortunately, it is not true that Adobe Reader will extract the correct text even if the OCR was inaccurate. The Reader will extract exactly what the OCR software put in the text layer. The real benefit is that the file looks exactly the way it did before OCR, but has a text layer for search and/or extraction. However, you also get a larger file, since it contains both text and image.

As for ‘Text and Images Only’, in this mode you get a significantly smaller file, especially if there were no or few images in the original document. The layout and formatting of the original document will be retained as accurately as possible. But if you have a document with many images, a complicated structure, or a fancy layout and background, it can look somewhat strange when saved in this mode.

The export mode cannot be selected universally for all files; it depends very much on your requirements and the situation. But the general recommendation is: if the original document is mostly text with few pictures, then it’s ‘Text and Images Only’, otherwise it’s ‘Text under the Page Image’.
seasalt
 
Posts: 45
Joined: 30 Apr 2011, 09:44

Re: Best (commercial) OCR application for Mac?

Postby lupocos » 02 Jun 2011, 08:36

Does anybody know of any commercial application that creates hOCR output? (i.e., “an open standard which defines a data format for representation of OCR output” - http://en.wikipedia.org/wiki/HOCR ).
Open-source apps like Cuneiform, Tesseract and OCRopus do support hOCR, but their OCR accuracy does not seem to be as good as Acrobat’s or ABBYY’s.
hOCR files are simply HTML files that contain information about where the recognized text is located in the scanned image. As you may know, the Ruby app “pdfbeads” can join the TIFF images and the hOCR output together to produce a searchable PDF (compressed with jbig2enc for best quality/size)...
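
To illustrate what such a file carries, here is a minimal sketch in Python (standard library only) that pulls each word and its bounding box out of hOCR markup. The sample markup and the names `parse_bbox`, `HocrWordParser` and `extract_words` are made up for illustration; the `ocrx_word` class and the `bbox x0 y0 x1 y1` property in the `title` attribute follow the hOCR convention, but the exact output of any given OCR engine may differ.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    # A title looks like "bbox 100 200 300 260; x_wconf 96".
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs from elements with class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []      # list of (text, (x0, y0, x1, y1))
        self._bbox = None    # bbox of the word span currently open, if any
        self._text = []      # text fragments seen inside that span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def extract_words(hocr_text):
    """Parse hOCR markup and return the list of (word, bbox) pairs."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Hypothetical sample, not taken from any real OCR run.
SAMPLE = """
<div class='ocr_page' title='image "page_001.tif"; bbox 0 0 2480 3508'>
  <span class='ocr_line' title='bbox 100 200 900 260'>
    <span class='ocrx_word' title='bbox 100 200 300 260; x_wconf 96'>Hello</span>
    <span class='ocrx_word' title='bbox 320 200 560 260; x_wconf 91'>world</span>
  </span>
</div>
"""

if __name__ == "__main__":
    for word, bbox in extract_words(SAMPLE):
        print(word, bbox)
```

A tool that builds a searchable PDF from hOCR needs essentially this information: the page image plus, for each word, the text and the rectangle where it sits.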
lupocos
 
Posts: 6
Joined: 25 May 2011, 12:42

Re: Best (commercial) OCR application for Mac?

Postby seasalt » 03 Jun 2011, 21:55

Comparison of OCR between Adobe Acrobat Pro X ClearScan (Mac) and ABBYY FineReader Express (Mac)
My test:
- ClearScan, output at 600 dpi: file size 2.3 MB
- ABBYY, output dpi unknown as there is no option to set it: file size 123 MB
(The original document was a 10.6 MB PDF of scanned TIFFs, 360 pages, no illustrations, black and white.)
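
To put those numbers on a per-page basis, here is a rough calculation in Python. It assumes 1 MB = 1000 KB and that all three files contain the same 360 pages; the names `PAGES`, `SIZES_MB` and `kb_per_page` are just for illustration.

```python
PAGES = 360

# File sizes in MB as reported in the test above.
SIZES_MB = {
    "original scan PDF": 10.6,
    "Acrobat Pro X ClearScan": 2.3,
    "ABBYY FineReader Express": 123.0,
}


def kb_per_page(size_mb, pages=PAGES):
    """Average size per page in KB, assuming 1 MB = 1000 KB."""
    return size_mb * 1000 / pages


if __name__ == "__main__":
    for name, size_mb in SIZES_MB.items():
        print(f"{name}: {kb_per_page(size_mb):.1f} KB/page")
    ratio = SIZES_MB["ABBYY FineReader Express"] / SIZES_MB["Acrobat Pro X ClearScan"]
    print(f"ABBYY output is about {ratio:.0f}x the size of the ClearScan output")
```

That works out to roughly 6 KB per page for ClearScan, 29 KB for the original scan, and 342 KB for the ABBYY Express output.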

I found this link for a comparison, with screen images, of the sharpness of the result (it is authored by Adobe!):
http://prsync.com/adobe/better-pdf-ocr- ... tter-4720/

(sorry link only - I do not know how to cut paste web page contents with images into this forum post box)
seasalt
 
Posts: 45
Joined: 30 Apr 2011, 09:44

Re: Best (commercial) OCR application for Mac?

Postby seasalt » 05 Jun 2011, 00:15

Tested further.
I have now tested Readiris and, in my view, it is better than both Adobe Acrobat Pro X (ClearScan) and ABBYY FineReader Express.

On the Mac, it:
1 - lets you change brightness and contrast in the OCR software
(ABBYY for Windows can do this too)

2 - outputs to many more formats, e.g. OpenOffice document (see screenshot)
(still no output to CSV, which ABBYY Pro 10 for Windows can do)
(ABBYY Express is limited to 4 output options: text, spreadsheet, HTML, searchable PDF (text under image))

3 - lets you select between different PDF output methods, so you can choose the one that best matches your input (depending on its quality), e.g. text, image and text, text and image, image
(in ABBYY Express it is predefined as text under image, which is great only for clean content)

4 - creates more flexible zones: the normal text and image zones, plus a graphic zone

5 - does not have dual-page splitting, a spell checker, or MRC compression, which ABBYY FineReader Pro 10 for Windows has

Also, Readiris has a 50-page limit for conversion, whereas there is no limit in ClearScan or FineReader Express.

[Attachment: output options rEADiris.png (output options by READiris Pro12)]
seasalt
 
Posts: 45
Joined: 30 Apr 2011, 09:44
