Preserve color but make white background? (no OCR)



Doranwen
Posts: 64
Joined: 13 Aug 2014, 04:19
E-book readers owned: Onyx Boox (dead), Astak EZ Reader (dead)
Number of books owned: 2700
Country: USA

Preserve color but make white background? (no OCR)

Post by Doranwen »

I'm scanning some workbooks in to use with some of my students, who often need to repeat pages (and I can't bear to destroy the book and put loose pages in sheet protectors). These workbooks are not the sort of thing one wants to turn into a standard ebook with OCR, so I'm just converting the full-page scans to pdf files once I've cropped them, and stitching them into a single pdf when I'm done.

The process looks good and I can print them… but the background of the page is not actually proper white, so when printing grayscale as I often do, the background turns into a gray color, sometimes so dark it's almost impractical for them to write on it with a pencil. (It's also wasting a lot of toner in the printing.) I've dealt with it some by printing *one* page and then making copies of that using a lightening setting on the copier, but it lightens everything equally, of course, meaning that the contrast between the graphics the students need to look at and the background they need to be able to write on still is not that great.

Is there a way to preserve the darkness (and indeed, the color, even if I'm not always printing it) of the graphics and printed text while still lightening the background enough that it doesn't print with so much gray? (Ideally the whole background would be rendered as white with no gray in it, but that may be beyond what's possible without excessive work.) Splitting the page into graphics vs. text is pretty much impossible, as you'll see from the examples from music workbooks that I attached to this post. The one with the colored background might be impossible to lighten further, but the one that's entirely black & white is more typical of what I'm working with: its background just prints too dark. Adjusting the contrast in GIMP (the image editor I have at my fingertips) eliminates the subtler shading and distorts the pictures if I move the slider far enough towards contrast to actually remove the color from the background itself. (And as it currently stands, one can see bits of the picture on the other side of the page showing through.)

I tried to poke at ScanTailor on the Win7 box my scanning phones are connected to, but I found it weird and couldn't figure out how to use it for what I needed. If it has the solution to this, I would welcome instructions.

My current process:
I use a set of bash scripts on my Linux box to crop the left and right page images (once in a while I have to do a manual crop to make up for a positioning error, but generally it works). Then I use pyrename to automatically rename all of them to A and B versions in the proper order, after which I merge them into a single folder, convert them to pdf, and combine all the pdfs into a single file.
Attachments
sample page scanned 2.jpg
sample page scanned.jpg
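For anyone wanting to script the ordering step of a workflow like this, here is a minimal Python sketch. It assumes a hypothetical naming scheme in which each scan number gets an `A` (left page) and `B` (right page) crop, e.g. `0007A.jpg`; it is not Doranwen's actual script or pyrename's behavior. Sorting on the parsed number rather than the raw filename keeps `10` after `9`.

```python
import re

# Hypothetical naming scheme: "<scan number><A|B>.<ext>", e.g. "0007A.jpg".
PATTERN = re.compile(r"^(\d+)([AB])\.(?:jpe?g|png|tiff?)$", re.IGNORECASE)

def page_order(filenames):
    """Return filenames sorted by scan number, with A (left) before B (right)."""
    keyed = []
    for name in filenames:
        m = PATTERN.match(name)
        if m is None:
            continue  # skip files that don't follow the assumed scheme
        keyed.append((int(m.group(1)), m.group(2).upper(), name))
    keyed.sort()  # numeric scan number first, then "A" < "B"
    return [name for _, _, name in keyed]
```

The resulting list could then be passed in order to ImageMagick's `convert` (for example `convert 0001A.jpg 0001B.jpg 0002A.jpg book.pdf`) to build one multi-page PDF.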
aku
Posts: 54
Joined: 02 Jan 2010, 08:38
Number of books owned: 0
Country: Germany
Location: Willich, Germany

Re: Preserve color but make white background? (no OCR)

Post by aku »

I can only think of noteshrink, at https://mzucker.github.io/2016/09/20/noteshrink.html
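The core idea in the noteshrink write-up can be sketched in a few lines of Python. This is a simplified illustration, not noteshrink's own code: pixels are plain `(r, g, b)` tuples, the 6-bit quantization and the 0.25/0.20 value/saturation thresholds are assumed values, and noteshrink's further step of reducing the foreground to a small palette with k-means is left out. Pixels close to the dominant (background) color in both brightness and saturation become white; everything else is kept unchanged.

```python
import colorsys
from collections import Counter

def quantize(rgb, bits=6):
    """Drop low-order bits so near-identical paper colors count as one color."""
    shift = 8 - bits
    return tuple((c >> shift) << shift for c in rgb)

def background_color(pixels, bits=6):
    """Treat the most common quantized color as the page background."""
    return Counter(quantize(p, bits) for p in pixels).most_common(1)[0][0]

def foreground_mask(pixels, bg, value_threshold=0.25, sat_threshold=0.20):
    """True where a pixel differs enough from bg in value or saturation."""
    _, bg_s, bg_v = colorsys.rgb_to_hsv(*(c / 255 for c in bg))
    mask = []
    for p in pixels:
        _, s, v = colorsys.rgb_to_hsv(*(c / 255 for c in p))
        mask.append(abs(v - bg_v) >= value_threshold or abs(s - bg_s) >= sat_threshold)
    return mask

def whiten_background(pixels, **thresholds):
    """Replace background pixels with white; keep foreground colors as-is."""
    bg = background_color(pixels)
    mask = foreground_mask(pixels, bg, **thresholds)
    return [p if is_fg else (255, 255, 255) for p, is_fg in zip(pixels, mask)]
```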
cday
Posts: 447
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Preserve color but make white background? (no OCR)

Post by cday »

At a very quick glance, that noteshrink tool looks powerful... :D

My general approach to this kind of problem is to use a Levels adjustment for a quick fix. The problem occurs often when scanning, unless the scanner lets you preview scans and use the adjustments normally available in scanner interfaces (although perhaps less so in Linux SANE interfaces, in my brief experience).
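For anyone who wants to script this rather than use a GUI, here is a minimal Python sketch of what a Levels adjustment does to each 0 to 255 value: inputs at or below the black point become 0, inputs at or above the white point become 255, and values in between are stretched linearly, with an optional gamma for the midtones. The gamma convention used here (exponent `1/gamma`, so `gamma > 1` lightens midtones) is an assumption that matches common editors, but individual tools may differ; the function names are illustrative.

```python
def levels(value, black=0, white=255, gamma=1.0):
    """Map one 0-255 value through a Levels adjustment."""
    if white <= black:
        raise ValueError("white point must be greater than black point")
    x = (value - black) / (white - black)  # stretch the chosen input range to 0..1
    x = min(max(x, 0.0), 1.0)              # clip anything outside that range
    x = x ** (1.0 / gamma)                 # midtone (gamma) correction
    return round(x * 255)

def levels_lut(black=0, white=255, gamma=1.0):
    """Precompute all 256 outputs so a whole image can be mapped by lookup."""
    return [levels(v, black, white, gamma) for v in range(256)]
```

Lowering the white point (e.g. `white=200`) is what pushes a grayish paper background to pure white, while the black point and gamma control how much of the darker line work and shading survives.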

sample page scanned_Levels.jpg

sample page scanned 2_Levels.jpg

Whether those quickly produced sample results are acceptable may depend on whether the background colour in the sample scans is also present in the originals, and if so, whether you want to retain it. The lower image looks as if it might also benefit from a little sharpening.
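"A little sharpening" usually means an unsharp mask: blur a copy, then add back a fraction of the difference between the original and the blur. Here is a minimal Python sketch on a grayscale image stored as a list of rows, assuming a 3x3 box blur for simplicity (real tools normally use a Gaussian blur with an adjustable radius); `amount` is an illustrative parameter.

```python
def box_blur(img):
    """3x3 mean filter; edge pixels average only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out

def unsharp_mask(img, amount=0.5):
    """Boost local contrast: original + amount * (original - blurred), clipped."""
    blurred = box_blur(img)
    return [[min(255, max(0, round(p + amount * (p - b)))) for p, b in zip(row, brow)]
            for row, brow in zip(img, blurred)]
```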

I normally use XnView MP, which is a Qt cross-platform software and ideal for my needs as primarily a Linux user now... :D

Note: XnView MP supports batch processing if required; alternatively, the NConvert command-line tool is also available in a Linux version, or you might prefer a better-known tool such as ImageMagick.
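As a sketch of the ImageMagick route, the Python below builds one `convert` command per file applying the same `-level black%,white%` adjustment (optionally after `-colorspace Gray`) and writes each result to a separate folder so the originals are untouched. The 10%/85% points are placeholders to be tuned on a test page, the folder names are hypothetical, and on ImageMagick 7 the program is usually `magick` rather than `convert`.

```python
import subprocess
from pathlib import Path

def level_commands(files, out_dir, black_pct=10, white_pct=85,
                   grayscale=False, program="convert"):
    """Build one ImageMagick command per input file (placeholder level values)."""
    commands = []
    for name in files:
        src = Path(name)
        dst = Path(out_dir) / src.name  # same filename, different folder
        cmd = [program, str(src)]
        if grayscale:
            cmd += ["-colorspace", "Gray"]  # single channel before levelling
        cmd += ["-level", f"{black_pct}%,{white_pct}%", str(dst)]
        commands.append(cmd)
    return commands

def run_all(commands):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, after creating a `levelled` folder, `run_all(level_commands(sorted(Path("cropped").glob("*.jpg")), "levelled"))` would process a whole folder of crops with identical settings.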
cday

Re: Preserve color but make white background? (no OCR)

Post by cday »

In the case of images output from a camera rather than from a flatbed scanner, the evenness of illumination is probably the factor that most affects the quality of the enhanced images that can be obtained. The effect can be seen particularly in the music image above, where the background colour varies across the image.

I suspect that brighter illumination may also, in principle, be helpful. Could someone with direct experience comment on that?

Thinking some more, since a levels adjustment can be applied to a rectangular selection on an image rather than necessarily to the whole image, a fairly simple script could probably be devised that applies different correction intensities to different zones across the image. And since the variation in illumination across images would be a constant for a given scanner configuration, the values required could probably be determined once, and could then be used for many future scans.

You could possibly play with that idea by imaging a plain white sheet, and then experimenting with different numbers of zones and levels correction intensities. Maybe dpc has some thoughts on that or relevant previous experience?
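A minimal Python sketch of the zone idea, assuming grayscale images stored as lists of rows and zones that are vertical strips: the average brightness of each strip in a photo of a plain white sheet is used as that strip's white point, and every later page is scaled strip by strip so that "paper" maps to 255. The function names and the choice of vertical strips are illustrative assumptions; the white sheet must not contain pure-black strips (that would divide by zero). With one zone per column this becomes a simple per-column flat-field correction.

```python
def zone_white_points(white_sheet, zones):
    """Average brightness of each vertical strip in a photo of plain white paper."""
    h, w = len(white_sheet), len(white_sheet[0])
    if not 1 <= zones <= w:
        raise ValueError("zones must be between 1 and the image width")
    bounds = [i * w // zones for i in range(zones + 1)]  # strip edges in pixels
    points = []
    for z in range(zones):
        vals = [white_sheet[y][x] for y in range(h)
                for x in range(bounds[z], bounds[z + 1])]
        points.append(sum(vals) / len(vals))
    return points, bounds

def correct_zones(img, points, bounds):
    """Apply each strip's white point so that paper in that strip maps to 255."""
    out = []
    for row in img:
        new_row = []
        for z, white in enumerate(points):
            for x in range(bounds[z], bounds[z + 1]):
                new_row.append(min(255, round(row[x] * 255 / white)))
        out.append(new_row)
    return out
```

Because the calibration depends only on the camera and lamp setup, the `points` and `bounds` computed once could be reused for every page shot with that setup.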
Doranwen

Re: Preserve color but make white background? (no OCR)

Post by Doranwen »

I tested noteshrink (well, I couldn't get the pdf conversion to work, it kept failing on that part, but I convert to pdf with convert anyway, so I was simply interested in seeing what png it produced). It didn't do too badly at making the pages reasonable for printing, but it got rid of all color completely (even the colored background on the one set of notes turned white). I suspect there are some settings (maybe in setup.py?) that I'd have to change, but the write-up was a bit over my head (and the instructions were rather sparse and unhelpful for anything beyond the defaults), so I gave up on that.

I took the second pic (the one with the colored background on the notes) and tried adjusting levels. Manually, it was hard to figure out the right balance - too light and the whole page looked washed out, too dark and the non-white background would be a serious problem when printing. I did try the Auto button, for automatically adjusting levels, and it seemed to produce a reasonably good looking image for a start. I then had to do several manual adjustments to get something I would consider printing out (see the pic below).
sample page scanned (levels adjusted).jpg
While this works for a single pic, for a workbook of even 50 such pages it is prohibitively time-consuming. I'll look into XnView MP, as its batch processing looks like it may be the ticket here.
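An "Auto" Levels button typically picks the black and white points from the image's own histogram. Here is a minimal Python sketch of one common approach, clipping a small fraction of the darkest and brightest pixels; the 0.5% default is an assumption, and this is not necessarily what GIMP's Auto button does. For a batch, one option is to compute the points once from a representative page and reuse them for every page, so the whole workbook gets the same correction.

```python
def auto_level_points(values, clip=0.005):
    """Choose black/white points, ignoring the darkest/brightest `clip` fraction."""
    ordered = sorted(values)
    n = len(ordered)
    k = int(clip * n)  # how many outlier pixels to skip at each end
    return ordered[k], ordered[n - 1 - k]

def apply_points(values, black, white):
    """Linear stretch so `black` -> 0 and `white` -> 255, clipping outside."""
    if white <= black:
        return list(values)  # flat image: nothing sensible to stretch
    return [round(255 * min(max((v - black) / (white - black), 0.0), 1.0))
            for v in values]
```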

EDIT: XnView MP is *wonderful*. I batch-converted a whole set and fixed the levels for the entire workbook at once. Thank you for the suggestion!
Last edited by Doranwen on 13 Jun 2022, 03:23, edited 1 time in total.
cday

Re: Preserve color but make white background? (no OCR)

Post by cday »

@Doranwen:

I edited my previous post while you were posting, so it might be worth reading again.
Doranwen

Re: Preserve color but make white background? (no OCR)

Post by Doranwen »

@cday:

Even that level of experimentation is beyond me, I'm afraid. :/

Fortunately, most of the workbooks I'm scanning in aren't color-heavy, and are more like the other pic - b&w illustrations with text.
cday
Posts: 447
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Preserve color but make white background? (no OCR)

Post by cday »

Doranwen wrote: 13 Jun 2022, 03:25 Fortunately, most of the workbooks I'm scanning in aren't color-heavy, and are more like the other pic - b&w illustrations with text.
For images that are, or can be regarded as, grayscale or black and white, in my experience it is normally best to convert the image from colour to grayscale before applying a Levels adjustment.
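To illustrate the order of operations: converting to grayscale first reduces each pixel to a single brightness value, so one Levels stretch acts on that value and a slight colour cast in the paper cannot survive as a faint tint in the output. A minimal Python sketch, assuming the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114); image editors may use slightly different formulas.

```python
def to_gray(rgb):
    """Luma with BT.601 weights (an assumption; editors vary slightly)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def gray_levels(pixels, black, white):
    """Convert (r, g, b) pixels to gray, then apply one Levels stretch."""
    if white <= black:
        raise ValueError("white point must be greater than black point")
    out = []
    for p in pixels:
        x = (to_gray(p) - black) / (white - black)
        out.append(round(255 * min(max(x, 0.0), 1.0)))
    return out
```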
cday

Re: Preserve color but make white background? (no OCR)

Post by cday »

Revisiting your problem, I have two further suggestions, in case you log in again.

First, I am wondering whether testing different camera settings might give you brighter initial images, which might also show more even illumination. As I don't have a camera or, currently, a smartphone, I can't help with that; maybe someone else can give some advice if you need it.

Second, there must be a reason your images show uneven illumination. I'm wondering whether your LED lamp is mounted at a slight angle to the plane of the pages you are imaging, or whether one side of the page is being affected by a reflection from the lamp. You could always, of course, do some tests imaging a plain sheet of white paper, if that might be useful.
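If a test shot of a plain white sheet is taken, one quick way to check it for uneven lighting is to average each column and compare the left and right halves. A minimal Python sketch, assuming a grayscale image stored as a list of rows at least two pixels wide; the 5% tolerance is an arbitrary illustrative threshold.

```python
def column_profile(img):
    """Mean brightness of each column (img is a list of rows of 0-255 values)."""
    h = len(img)
    return [sum(row[x] for row in img) / h for x in range(len(img[0]))]

def illumination_report(img, tolerance=0.05):
    """Compare left- and right-half brightness of a white-sheet test shot."""
    prof = column_profile(img)
    half = len(prof) // 2
    left = sum(prof[:half]) / half
    right = sum(prof[half:]) / (len(prof) - half)
    ratio = min(left, right) / max(left, right)  # 1.0 means perfectly even
    return {"left": left, "right": right, "even": ratio >= 1 - tolerance}
```

Repeating the shot after moving or tilting the lamp and comparing the reports would show whether a change actually evens things out.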
db-inf
Posts: 1
Joined: 18 Oct 2022, 02:26
E-book readers owned: Kobo Forma
Number of books owned: 1000
Country: België

Re: Preserve color but make white background? (no OCR)

Post by db-inf »

You really do not need to be an image processing professional to get good images. ImageMagick is said to be well documented, but the tool is so powerful that working through that documentation is an insurmountable task. Fortunately, there is a very useful website with ready-made scripts, Fred's ImageMagick Scripts, and I ran its textcleaner script on your two example images with just the defaults, i.e. no options. Look for yourself.
Attachments
sample page scanned 2 textcleaner.jpg
sample page scanned textcleaner.jpg
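For the curious, one generic technique with a similar goal is background normalization; this is explicitly not the textcleaner script's actual implementation. The paper brightness around each pixel is estimated with a local maximum filter, which ignores dark strokes narrower than the window, and each pixel is divided by it so the paper becomes white even where the lighting varies. A minimal Python sketch, assuming a grayscale list-of-rows image; `radius` is an illustrative parameter that must exceed the thickest line, and any solid dark area wider than the window would also be turned white.

```python
def local_max(img, radius):
    """Brightest value in a (2*radius+1)-wide square around each pixel."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        ys = range(max(0, y - radius), min(h, y + radius + 1))
        row = []
        for x in range(w):
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            row.append(max(img[yy][xx] for yy in ys for xx in xs))
        out.append(row)
    return out

def normalize_background(img, radius=2):
    """Scale each pixel by its local background estimate so paper maps to 255."""
    bg = local_max(img, radius)
    # b == 0 means the whole window is black, so the pixel itself is 0: keep it black.
    return [[0 if b == 0 else min(255, round(p * 255 / b)) for p, b in zip(row, brow)]
            for row, brow in zip(img, bg)]
```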