Hello, does anyone know of a trick or method for easy proofreading of OCR text, possibly with ImageMagick or some other image manipulation tool? The Adobe Acrobat X (Mac) option to review OCR suspects one by one is slow.
This is my manual process for proofreading (I am looking for a tool to automate it).
Page-by-page process:
1 Make the PDF page double width (so I can view the "text" and the "image of text" side by side).
(Note: none of the OCR engines I've tested actually use Acrobat's "Layer" feature (e.g. a text layer and an image layer); rather, both text and image are in the same single layer.)
2 Using the Adobe Acrobat X plugin Enfocus PitStop,
select the page contents and change FILL to ON (the text is invisible/hidden/transparent because the font fill is OFF).
(This gives a glassy two-layer look, but they are not really layers.)
3 Unselect the contents, then move the "image/bitmap" to the right, next to the text.
Then I edit the OCR'd text as follows:
2 Remove scannos (common scanning errors, e.g. [ for J, 3 for S, etc.).
3 Fix text that was misread because of my underlining in the book.
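A rough sketch of how the scanno pass could be semi-automated, assuming the OCR text has already been extracted to a plain string. The substitution table holds only the two examples above ([ for J, 3 for S); the names SCANNOS, suggest and find_suspects are made up for illustration. It only flags suspects and proposes a fix, so a human still decides.

```python
import re

# Hypothetical table: just the two scannos mentioned in the post.
SCANNOS = {"[": "J", "3": "S"}

# A token is suspect when "[" or "3" is glued to at least one letter,
# e.g. "[ohn" or "3mith". A lone "3" or a number like "1933" is left alone.
SUSPECT = re.compile(r"[A-Za-z]*[\[3][A-Za-z]+|[A-Za-z]+[\[3][A-Za-z]*")


def suggest(token):
    """Return the token with each scanno character swapped for its guess."""
    return "".join(SCANNOS.get(ch, ch) for ch in token)


def find_suspects(text):
    """Return (offset, suspect token, suggested replacement) for each hit."""
    return [(m.start(), m.group(), suggest(m.group()))
            for m in SUSPECT.finditer(text)]


if __name__ == "__main__":
    for offset, found, guess in find_suspects("[ohn 3mith paid 3 dollars"):
        print(offset, found, "->", guess)
```

This will produce false positives (a paper size such as "A3" gets flagged) and always suggests capitals, so treat the output as a review list rather than an automatic fix.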
NOTE: this method is for the Searchable Image (Exact) option in the Adobe Acrobat X (Mac) OCR engine, or for ABBYY or Readiris.
It does not work for the AAX ClearScan option, as ClearScan is a different OCR method (it creates a new Type 3 font, so editing is crazy, crazy territory, because fonts in PDF land are not markup languages; they are blobs on a page).
4 Then I either delete the image or return the page to as-is (i.e. reverse steps 3 and 2).
5 Return the page to its original size.
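For anyone wanting to script the layout steps, here is a minimal sketch of the arithmetic involved. It rests on two assumptions: that the hidden OCR text uses PDF text rendering mode 3 (the `3 Tr` operator, a common way to hide an OCR text layer; I have not confirmed this is what PitStop's FILL toggle changes), and that the page content stream has already been decompressed to raw bytes by some PDF library. The function names are hypothetical, and nothing here reads or writes an actual PDF file.

```python
import re


def widen_media_box(media_box):
    """Step 1: double the page width. Returns (new_box, shift_in_points)."""
    llx, lly, urx, ury = media_box
    width = urx - llx
    return [llx, lly, urx + width, ury], width


def translate_operator(shift_x):
    """Step 3: PDF `cm` operator that moves later drawing right by shift_x."""
    return f"1 0 0 1 {shift_x:g} 0 cm"


# Rendering mode 3 = invisible text, mode 0 = filled text.
# Naive: this would also rewrite "3 Tr" inside a string literal.
INVISIBLE_TR = re.compile(rb"(?<![\d.])3(\s+)Tr\b")


def make_text_visible(content):
    """Step 2: switch invisible text to filled text in a raw content stream."""
    return INVISIBLE_TR.sub(rb"0\1Tr", content)


if __name__ == "__main__":
    box, shift = widen_media_box([0, 0, 612, 792])  # US Letter in points
    print(box, translate_operator(shift))
    print(make_text_visible(b"BT 3 Tr /F1 12 Tf (hello) Tj ET"))
```

Steps 4 and 5 would be the reverse: restore the original MediaBox and either drop the translation or remove the image. The translation would need to be placed just before the image is drawn, inside its own q ... Q group, so that it does not move the text as well.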