OCR for odd font

Postby darewind » 25 May 2011, 19:49

I am doing digital conversions of some of my old fantasy fiction novels, and thanks to Scan Tailor it has been much easier. However, I ran into a problem with OCR: I tried ABBYY FineReader 10, but it still gives me errors in the results. I was hoping someone could recommend an OCR program to help me out. I have included a sample page. Thanks in advance for any assistance you can provide.
Attachments
obsession to destroy_page0004_2R.tif
a sample of the page which I am using

Re: OCR for odd font

Postby daniel_reetz » 25 May 2011, 19:53

Welcome to the forums, Darewind! Nice-looking scan, and thank you for including it; samples like this are diagnostic.

Can you tell us a bit more about the types of errors you are getting? Most people here OCR with some version of ABBYY FineReader or the built-in OCR in Acrobat. Odd fonts are indeed a problem, so if you can tell us more, we might be able to help.

Re: OCR for odd font

Postby darewind » 25 May 2011, 21:28

The errors I am getting are simply a failure to read and convert the words properly, which gives me misspelled words or gibberish. Mostly it has trouble reading the letters d and h in words; even after "training" the OCR, I still get misspelled words.
On a side note, I am surprised I can get a scan of that quality just by pressing a book against an HP Deskjet F4240 all-in-one printer. Scan Tailor really is a fantastic program. I am currently going over the designs here and deciding which to use for my paperbacks, as there are quite a few great ones.

Here is the output I get for the first sentence: "Btam's banfc tremble*) as be held tbe burning torcb to tlx kindling beneath his father's body."

Re: OCR for odd font

Postby jgreely » 25 May 2011, 22:41

darewind wrote:
The errors I am getting are just a failure to properly read and convert the words giving me misspelled words or gibberish. Mostly it has trouble reading/converting the letters d and h in words even with "training" the OCR I get misspelled words.


It doesn't like uncial-style fonts, does it? I loaded the page into FineReader Pro 9, selected "Train user pattern", deselected "Use built-in patterns", and started training. Here's what I got:

Bram's hand trembled as he held the burning torch to the kin-dling beneath his father's body. Custom dictated that the son who would inherit the dead man's estate light the pyre. He licked the sweat from his upper lip when the fire from the oil-soaked straw leaped to the dry tim-ber. A gust of wind caught the small flame and sent sparks skyward with a loud crackle, forcing Bram back. With the back of his forearm, he shielded his eyes from the growing heat and the sight.
Cormac DiThon had died of a sudden heart attack six days before. A serving wench had found him slumped over his porridge. Now, the old cavalier's body, wrapped in his heavy military cloak, his shield laid across his barrel-shaped breast, was engulfed in yellow flames that jumped above a vivid sunset. Through the haze of smoke, Bram's mind escaped to the detached thought that the red sky boded well for the morrow.
When the old shaman began chanting, Bram snapped


I think deselecting the built-in patterns helped, since that removed a lot of standard letter shapes that didn't match things like the goofy "d" and the two-piece "r".

-j

Re: OCR for odd font

Postby darewind » 26 May 2011, 00:04

Jgreely, thank you very much; that fixed my problems. I am glad it was user error on my part.