Best free OCR SW

BillGill
Posts: 139
Joined: 18 Dec 2016, 17:13
E-book readers owned: Calibre, FBReader
Number of books owned: 7000
Country: USA

Post by BillGill »

I am looking for a free OCR SW package that is relatively simple to use. I am planning to go to the Tulsa Maker Faire in August and show my scanning setup. On my main computer I use ABBYY FineReader 14, which works quite nicely. But when I go to the Maker Faire I take my laptop, and I don't want to put an expensive SW package on it to use for just a short period. And of course FineReader is now a subscription product. I don't want to keep buying my software every year when I don't need any new functions.

So I need a free package that will take my images and convert them into text files. I don't want PDF files, just text.

From there, of course, it will be easy (well, not impossible) to edit the files and make any needed corrections in a free editor such as LibreOffice. I just need an OCR program that can take all the images and process them for me (and for anybody who visits my booth).

Bill
cday
Posts: 442
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Post by cday »

Have you searched online for something like 'freeware ocr software'? There is some, although I have no personal experience with it. Some freeware uses the open-source Tesseract OCR engine, which has had good reviews, although it has been around for a while now. If you only need to OCR simple pages, there should be satisfactory solutions available, but possibly not for more complex pages, for example those containing tables.

Less obvious solutions if you consider any of them practical:

Cloud-based solutions are now becoming common and should handle at least simple pages reasonably well, but they would of course require good internet access and might be rather slow for a quick demo.

The *very inexpensive* ABBYY Screenshot Reader is powerful software, but it would only capture the text on one page at a time, and probably only the unformatted text. Excellent tool, though, when that is all you need.

The free reduced-functionality versions of the mainstream OCR programs commonly bundled with scanners are good in my limited experience. I have a copy of one of those on my laptop, although I don't think I have ever actually used it... ;)

You might also check the licence for your existing FineReader software: some licences, at least in the past, allowed two copies of the product to be installed as long as they weren't used at the same time. I think this was originally intended for users who wanted to install their copy on both a desktop and a laptop.
TDavLinguist
Posts: 5
Joined: 10 Jan 2024, 11:21
E-book readers owned: Kindle Fire, Galaxy Tab
Number of books owned: 800
Country: Thailand

Post by TDavLinguist »

I realize this is an old post, but I'm going to provide an answer anyway, in case someone stumbles upon this thread and needs some advice.

ABBYY FineReader was mentioned by cday, but it is of course not free, either as in "free beer" or as in free (and open source) software. A good place to start is www.alternativeto.net, which lists a number of OCR programs. Linux is my primary operating system, and I've always followed this tutorial on the forums for building a Linux FOSS workflow.

If you'd like something less involved than pdf.py + tesseract-ocr + whatever else the tutorial suggests, I recommend trying NAPS2. It's available for Windows, Mac, and Linux, it's FOSS, and it's fairly simple to use. You can either scan directly from a SANE scanner and process the result there, or import a series of images or PDFs and run OCR right in the program. One downside, to me, is that NAPS2's OCR only recognizes one language at a time, whereas tesseract-ocr can OCR several languages at once.
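For the original request (a folder of page images in, plain text files out), tesseract-ocr can also be run directly from the command line. The following is only a minimal Bash sketch, assuming tesseract and the language packs you need are installed; the function name ocr_images_to_text, its three arguments and the list of image extensions are hypothetical choices for illustration. It relies on the fact that "tesseract IMAGE OUTPUTBASE" writes plain text to OUTPUTBASE.txt by default.

```shell
#!/usr/bin/env bash
# Sketch: batch-OCR every image in a folder into one plain .txt file per image.
# Assumes the tesseract CLI and the requested language packs are installed.
# ocr_images_to_text and the extension list below are hypothetical choices.

ocr_images_to_text() {
  local in_dir="$1"         # folder holding the scanned page images
  local out_dir="$2"        # folder that receives one .txt per image
  local langs="${3:-eng}"   # tesseract language(s), e.g. eng or eng+deu
  local img base count=0

  mkdir -p "$out_dir"
  for img in "$in_dir"/*.png "$in_dir"/*.jpg "$in_dir"/*.tif "$in_dir"/*.tiff; do
    [ -e "$img" ] || continue               # pattern matched no files
    base="$(basename "${img%.*}")"          # file name without extension
    # "tesseract IMAGE OUTPUTBASE -l LANGS" writes OUTPUTBASE.txt
    if tesseract "$img" "$out_dir/$base" -l "$langs"; then
      count=$((count + 1))
    else
      echo "OCR failed: $img" >&2
    fi
  done
  echo "$count"                             # number of images converted
}
```

After loading it with source, something like ocr_images_to_text ~/scans ~/scans/text eng+deu should leave one .txt file per image in ~/scans/text and print how many images were converted. Images that share a base name (page1.png and page1.jpg) would overwrite each other's output in this simple version.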

Another command-line program to look at is OCRmyPDF. It can be run as simply as:

~$ ocrmypdf INPUT.pdf OUTPUT.pdf

English is the default OCR language, but you can change it with the -l option, assuming you have the corresponding tesseract-ocr language packs installed. For example, for English plus German:

~$ ocrmypdf -l eng+deu INPUT.pdf OUTPUT.pdf

I've used all three methods above, and I would argue that:
  • ocrmypdf is easiest in terms of making your computer do the bulk of the work
  • NAPS2 is easiest if you are not comfortable with the command line
  • The tutorial I mentioned is the best in terms of optimization and manual control
At the end of the day, it's all your preference.
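Since the original question asked for plain text rather than PDFs, it's worth knowing that ocrmypdf can also write the recognized text to a separate file with its --sidecar option. Below is a minimal Bash sketch for a whole folder of scanned PDFs, assuming ocrmypdf and the needed tesseract language packs are installed; the function name pdfs_to_text and the NAME.ocr.pdf / NAME.txt output naming are hypothetical choices, not anything ocrmypdf prescribes.

```shell
#!/usr/bin/env bash
# Sketch: OCR every PDF in a folder with ocrmypdf and keep the recognized
# text in a separate .txt file via --sidecar.
# pdfs_to_text and the output naming scheme are hypothetical.

pdfs_to_text() {
  local in_dir="$1"         # folder with scanned PDFs
  local out_dir="$2"        # receives NAME.ocr.pdf and NAME.txt per input
  local langs="${3:-eng}"   # tesseract language pack(s), e.g. eng+deu
  local pdf base failed=0

  mkdir -p "$out_dir"
  for pdf in "$in_dir"/*.pdf; do
    [ -e "$pdf" ] || continue               # no PDFs in the folder
    base="$(basename "$pdf" .pdf)"
    # --sidecar FILE saves the OCR text to FILE alongside the output PDF
    if ! ocrmypdf -l "$langs" --sidecar "$out_dir/$base.txt" \
        "$pdf" "$out_dir/$base.ocr.pdf"; then
      echo "OCR failed: $pdf" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]                       # non-zero status if any file failed
}
```

For example, pdfs_to_text ~/scans ~/scans/ocr eng+deu; the .txt files can then be opened and corrected in any editor.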