recaptcha wrote: I thought a PDF was a kind of text file.
Could I make an EPUB file straight from ScanTailor?
recaptcha wrote: Thanks. Does it make any difference to OCR accuracy whether I run OCR before or after making the PDF? Is it any easier to correct OCR mistakes before or after making the PDF?
Yes, I'd like a fully searchable document.
I read somewhere that you can scan the page image right into Abbyy FineReader, thus eliminating at least one importing step.
recaptcha wrote: ... I'm on a Mac with a Boot Camp partition, so I guess I could go either way (Windows or Mac). I'm willing to use whatever software makes things faster and easier, e.g. Abbyy FineReader, OmniPage, or any of the open source ones mentioned around here. (I read somewhere that you can scan the page image right into Abbyy FineReader, thus eliminating at least one importing step.)
strider1551 wrote: ... To preserve the page layout, the fonts, etc., most people here clean up their images with ScanTailor, compress the images, and put them directly into the PDF. Think of the PDF as a container format for a bunch of pictures. While that sounds like a whole book would be a huge file, black and white images compress very well: I get about 15.5 kB per page at ~365 dpi. ... OCR text is icing on the cake. If you also provide the PDF with OCR information, you can search the PDF and it will bring you to the page the word is on, and possibly even highlight where the word is in the image. Typically it is preferred to do the OCR step before compressing the image and putting it in the PDF, since the most efficient compression methods tend to be "lossy", but I don't know if anyone has done a study on whether this has any significant impact on OCR quality.
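To put the 15.5 kB per page figure in perspective, here is a rough back-of-the-envelope sketch in Python. The per-page size and the 365 dpi come from the post above; the 300-page book and the 6 x 9 inch page are assumptions picked purely for illustration, and the function names are made up for this example. It treats 1 kB as 1000 bytes and an uncompressed black and white scan as 1 bit per pixel.

```python
# Rough size estimates for a PDF built from black and white page images.
# KB_PER_PAGE and DPI are the figures quoted in the post above; the page
# count and page dimensions used below are illustrative assumptions.

KB_PER_PAGE = 15.5  # compressed size per page reported in the post
DPI = 365           # approximate scan resolution reported in the post


def estimate_pdf_size_mb(pages, kb_per_page=KB_PER_PAGE):
    """Estimated size in MB of the page images for a whole book."""
    return pages * kb_per_page / 1000


def raw_bitonal_kb(width_in, height_in, dpi=DPI):
    """Uncompressed size in kB of one page at 1 bit per pixel."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels / 8 / 1000


if __name__ == "__main__":
    pages = 300                  # assumed book length
    width_in, height_in = 6, 9   # assumed page size in inches

    raw_kb = raw_bitonal_kb(width_in, height_in)
    print(f"Whole book: about {estimate_pdf_size_mb(pages):.2f} MB")
    print(f"One raw page: about {raw_kb:.0f} kB before compression")
    print(f"Compression ratio: roughly {raw_kb / KB_PER_PAGE:.0f}x")
```

Under those assumptions a 300-page book comes to under 5 MB of page images, which is why putting the pictures straight into the PDF is practical. The OCR text layer would add to that, and actual results depend on the page size, the content, and the compression method used.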