Finereader - Delete everything outside of recognized text area?

Moderator: peterZ

glenleslie
Posts: 30
Joined: 13 Aug 2012, 09:08
E-book readers owned: Kindle - multiple platforms
Number of books owned: 1000
Country: United States

Finereader - Delete everything outside of recognized text area?

Post by glenleslie »

Greetings,

Does anyone know if there's a way to set FineReader so that it removes everything outside of the recognized text areas?

Once the text is recognized and confirmed, I would like it to delete all the noise outside the recognized areas (the right-hand side, and the junk under the largest section in the header). Output type is PDF, text under image.

I would really like it to just put the recognized sections on a pure white background... not preserve all the noise. Thoughts?
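
For illustration only, the effect being asked for can be sketched outside FineReader. This is a minimal Python sketch, assuming the page is available as a grid of grayscale values (0 = black, 255 = white) and the recognized areas as rectangles. The names `whiten_outside` and `Box` are hypothetical and not part of FineReader; how the area coordinates would be obtained from FineReader is not covered here.

```python
from typing import List, Tuple

# Hypothetical box format: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def whiten_outside(pixels: List[List[int]], boxes: List[Box],
                   white: int = 255) -> List[List[int]]:
    """Return a copy of the page with every pixel outside the boxes set to white."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    # Start from a blank white page of the same size.
    cleaned = [[white] * width for _ in range(height)]
    for left, top, right, bottom in boxes:
        # Clamp each box to the page so out-of-range coordinates are harmless.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                cleaned[y][x] = pixels[y][x]
    return cleaned


if __name__ == "__main__":
    # Tiny 4x4 "page": noise in the corners, "text" in the middle.
    page = [
        [0, 255, 255, 0],
        [255, 0, 0, 255],
        [255, 0, 0, 255],
        [0, 255, 255, 0],
    ]
    for row in whiten_outside(page, [(1, 1, 3, 3)]):
        print(row)
```

Copying the kept areas onto a fresh white page, rather than erasing noise from the original, means anything not explicitly inside a box is dropped.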

[Image: attached sample scan]
BruceG
Posts: 99
Joined: 14 May 2014, 23:17
Number of books owned: 500
Country: Australia

Re: Finereader - Delete everything outside of recognized text area?

Post by BruceG »

As you mention, there are two layers, image and text. Any editing you do is in the text layer, i.e., it has no effect on the image layer. What you want is to remove the image layer and keep only the text layer. I do not believe FineReader will do this when saving to PDF, where the only options are text under or over the image. Saving to non-PDF formats will, however, save only the text, tables and pictures as shown in the text editing window. Any background/paper colour surrounding a picture will also be copied across. This is what people do to create ebooks, or if they prefer to edit in Word.
You could play around with the Image Editor before you Recognize Page. I only use 'deskew', 'straighten lines', 'split' and 'crop'.
So you could: select what you want in FineReader (auto/manual) - Edit/Verify - save to Word - save to PDF. Remember the formatting in Word may not be 100%.
Trust this helps.
cday
Posts: 447
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Finereader - Delete everything outside of recognized text area?

Post by cday »

glenleslie wrote: 30 Sep 2022, 21:27 Does anyone know if there's a way to set FineReader so that it removes everything outside of the recognized text areas?
BruceG wrote: 30 Sep 2022, 23:50 As you mention, there are two layers, image and text. Any editing you do is in the text layer, i.e., it has no effect on the image layer. What you want is to remove the image layer and keep only the text layer.
It is a long time since I used FineReader, and my versions are not current. May I suggest that you upload a sample source page image, so that anyone who likes a challenge can try to find a solution? If no way can be found to save clean text directly to a searchable PDF, the image preprocessing options might provide one, or someone may be able to suggest preprocessing in other software.
glenleslie

Re: Finereader - Delete everything outside of recognized text area?

Post by glenleslie »

BruceG wrote: 30 Sep 2022, 23:50 As you mention, there are two layers, image and text. Any editing you do is in the text layer, i.e., it has no effect on the image layer. What you want is to remove the image layer and keep only the text layer. I do not believe FineReader will do this when saving to PDF, where the only options are text under or over the image. Saving to non-PDF formats will, however, save only the text, tables and pictures as shown in the text editing window. Any background/paper colour surrounding a picture will also be copied across. This is what people do to create ebooks, or if they prefer to edit in Word.
You could play around with the Image Editor before you Recognize Page. I only use 'deskew', 'straighten lines', 'split' and 'crop'.
So you could: select what you want in FineReader (auto/manual) - Edit/Verify - save to Word - save to PDF. Remember the formatting in Word may not be 100%.
Trust this helps.
Ah! Your post reminded me of the post quoted below about "despeckling" (a Scan Tailor feature) in FineReader. In FR, no automated despeckling tool exists in B&W mode! That fellow's advice exposes my rookie mistake: I set my scanner to B&W to reduce file size, and then used B&W in FineReader to reduce recognition time, because this worked really well on a few sample pages that were much cleaner than the sample I posted. But when I got to a section of 50 pages or so with all this discoloration around the text, things got quite a bit uglier.

Scanning in grayscale and leaving the FineReader recognition setting on color would let me use all the FineReader photo-editing options, which are *not* available in B&W mode! I had never realized this is "why" grayscale. Without it, the OCR engine is the only real cleanup tool, which should still be workable, since it recognizes the text and FR lets one adjust the recognized areas in great detail. It just seems that FR should be able to apply the text recognition back to the underlying image, but I guess that is not the concept in the OCR world. Images are images and text is text... never the Twain should mash them up (a Twain wreck!) :twisted:
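
To make "despeckling" concrete, here is a minimal Python sketch of one common approach: find connected groups of black pixels and whiten the ones that are too small to be text. This is an assumption about how such a tool can work, not the actual Scan Tailor or FineReader algorithm, and the names `despeckle` and `max_speck_size` are hypothetical. The page is assumed to be a B&W grid with 0 = black and 1 = white.

```python
from collections import deque
from typing import List


def despeckle(pixels: List[List[int]], max_speck_size: int = 2) -> List[List[int]]:
    """Whiten connected groups of black pixels no larger than max_speck_size.

    pixels uses 0 for black and 1 for white; the input is not modified.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    cleaned = [row[:] for row in pixels]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if seen[y][x] or pixels[y][x] != 0:
                continue
            # Breadth-first search collects one 8-connected black component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and not seen[ny][nx] and pixels[ny][nx] == 0):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Small components are treated as specks and painted white.
            if len(component) <= max_speck_size:
                for cy, cx in component:
                    cleaned[cy][cx] = 1
    return cleaned
```

The size limit is the delicate part: set too high, it would also remove full stops and the dots on letters such as "i", which is one reason a speck buried inside a paragraph is hard to remove safely.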

As cday pointed out, the only option is to go into the Image Editor and crowbar out all the noise around the text. The only tool at your disposal there in B&W mode is the "delete" tool (a crowbar when you need an archaeologist's brush or toothbrush), which only draws square boxes, so it is quite difficult to delete noise buried in a paragraph of text.

Such is life. Crowbar it is (it's easier than re-scanning due to my current setup). :geek:
b0bcat wrote: 06 Jun 2017, 17:42 The specimen tif supplied is b&w 2-colors and, although 300dpi, I noticed it is insufficiently defined and/or too over-exposed to capture small typeface print.

If possible, the quickest route would seem to be re-scanning in greyscale at a suitable dpi, maybe say, 400, and adjusting the scanner optimally as suggested by others above, including as to brightness.
-- Full thread is here posting.php?mode=quote&f=19&p=20705
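
A small Python sketch of why a greyscale scan leaves more room for cleanup than a B&W one: with grey values, the black/white cut-off can still be chosen after scanning, whereas a B&W scan has already baked that decision in. The function name `binarize` and the sample values are hypothetical, and this only illustrates simple fixed thresholding, not what any particular scanner driver or FineReader does.

```python
from typing import List


def binarize(pixels: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map grey values (0-255) to pure black (0) or pure white (255)."""
    return [[0 if value < threshold else 255 for value in row] for row in pixels]


if __name__ == "__main__":
    # Assumed sample values: dark text, a mid-grey stain, light paper.
    row = [[30, 150, 240]]
    # A moderate cut-off keeps the text and whitens the stain.
    print(binarize(row, threshold=128))
    # A cut-off that is too high turns the stain black as well.
    print(binarize(row, threshold=200))
```

Once a stain has been turned black in a B&W scan, it is indistinguishable from ink, which fits the uglier results described above for the discoloured pages.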