Advanced - split text for OCR, but keep "original" image

nightshift
Posts: 24
Joined: 09 Feb 2024, 22:21
E-book readers owned: Nook Glowlight 4, Kindle Fire 5th gen
Number of books owned: 100
Country: USA

Advanced - split text for OCR, but keep "original" image

Post by nightshift »

New problem

Pictorial pages with text: I want to split out the text to run it through OCR (so that the final PDF is searchable and highlightable/copyable), but doing so right now yields an image like:
0006-a-ick.jpg
I can't seem to keep the background around the text (plus, the original text is a grey color, which looks a lot better on the background). Unfortunately, I can't find an option that gives me the split text and the mask while keeping the original image as the main layer on that page of the PDF. I suppose I COULD process everything but the splits, move that output folder, save the project under a new name and run the splits, then replace the non-split versions before "binding" to PDF, but is there a better way?
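For reference, the manual swap step in that workaround could be scripted. This is only a rough sketch under my own assumptions: the folder layout, the function name restore_originals, and the idea that both runs write files with identical names are all hypothetical, not anything Scan Tailor guarantees.

```python
from pathlib import Path
import shutil

# Hypothetical helper: copy the images kept from the first (non-split) run
# over the same-named files produced by the second (split) run.
def restore_originals(kept_dir, bind_dir, suffixes=(".tif", ".tiff", ".jpg", ".png")):
    kept_dir, bind_dir = Path(kept_dir), Path(bind_dir)
    replaced = []
    for src in sorted(kept_dir.iterdir()):
        # Skip subfolders and anything that is not an image file.
        if not src.is_file() or src.suffix.lower() not in suffixes:
            continue
        dst = bind_dir / src.name
        # Only overwrite pages that exist in the folder to be bound;
        # never add pages that are not already there.
        if dst.exists():
            shutil.copy2(src, dst)
            replaced.append(src.name)
    return replaced
```

Calling restore_originals("out_nosplit", "out") (both folder names made up) would return the list of file names it overwrote, so you can check that only the pictorial pages were touched before binding.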
nightshift

Re: Advanced - split text for OCR, but keep "original" image

Post by nightshift »

Ok, did a test, as long as I don't try and split these pages for better compression, everything works out fine.
cday
Posts: 456
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Advanced - split text for OCR, but keep "original" image

Post by cday »

nightshift wrote: 26 Mar 2024, 14:44 Ok, did a test, as long as I don't try and split these pages for better compression, everything works out fine.
It may be premature to offer a suggestion on that, but I recently discovered a cross-platform program, new to me, that did an excellent job of making images searchable: NAPS2.

I know that there are other well-known OCR programs commonly used with Scan Tailor, and that you tend to prefer command line tools, but if you have some time to spare you might try running a few test pages through it. The interface allows existing image files to be loaded. No difficult bending required!

I've attached a PDF file created from a few pages of my own early (flatbed) book scanning experiments.

Seems a useful tool for easily making scanned images searchable... :D

naps2_Scott_test.pdf
(585.29 KiB)