How to process images with lots of pictures?

kloosen
Posts: 9
Joined: 26 Feb 2022, 14:55
Number of books owned: 0
Country: a nice one

How to process images with lots of pictures?

Post by kloosen »

Hello,

I am in the process of scanning climbing guidebooks.
These books have tons of pictures all over the place on a single page, often at the very edges or even corners of the pages.
Using ScanTailor and trying to process them turns out to be a huge effort, easily costing 10h+ per 300 pages.
The Mixed output needs manual tweaking 99% of the time, which is extremely time-consuming because many of the pictures are drawings without straight edges. Here is an example of how it might look.
But since the lighting conditions are far from perfect, the Color output posterized to between 8 and 16 colors yields either output that is too bright and loses a lot of color differentiation (e.g. green becoming black, black text being colored), or output that looks extremely JPEG'ed, with big color spots all over the place.
Equalizing illumination seems to be a necessity, because otherwise even slight shadows get amplified into big grey areas.
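
As a rough illustration of what equalizing illumination does, here is a minimal Python sketch that divides each pixel by a local estimate of the paper brightness. It is not ScanTailor's algorithm; the function name `equalize_illumination`, the `window` parameter, and the sample values are all made up for the example, and it assumes the brightest pixel near any position is blank paper.

```python
def equalize_illumination(row, window=2):
    """Divide each pixel by a local estimate of the paper brightness."""
    out = []
    for i, value in enumerate(row):
        lo = max(0, i - window)
        hi = min(len(row), i + window + 1)
        # Assumption: the brightest pixel nearby is blank paper.
        paper = max(row[lo:hi])
        out.append(round(255 * value / paper) if paper else 0)
    return out


# One scan line: paper darkens toward the spine (240 -> 160),
# and the two low values (48, 40) are ink.
row = [240, 48, 240, 220, 200, 40, 180, 160]
flat = equalize_illumination(row)
paper_idx = [0, 2, 3, 4, 6, 7]
print("paper before:", [row[i] for i in paper_idx])
print("paper after: ", [flat[i] for i in paper_idx])
```

The paper values end up much closer together while the ink stays dark, which is why a later reduction to a few colors is less likely to turn the shadow into a grey area. A real page would need a 2D window that is larger than the biggest picture on it.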

The following examples are by no means the worst of the bunch; they may even be among the better ones. There are books with many more colors per page. Of course, the fact that the pages were not completely flat on the glass contributes a ton to the shadow problem, which I am trying to mitigate with a slight design change. But thick softcover books especially seem to warp quite a lot, and pressing against the cover pages does not compress all the way through to the scanned pages.

How would you process such images?
cday
Posts: 447
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: How to process images with lots of pictures?

Post by cday »

You could try a Levels adjustment; if colour can be sacrificed, it is most effective on an image first converted to grayscale.
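
For anyone unsure what a Levels adjustment does to the pixel values, here is a minimal Python sketch on a tiny grayscale image. The function name `apply_levels`, the parameter names, and the sample values are only illustrative; image editors expose the same idea as black point, white point, and gamma sliders.

```python
def apply_levels(pixels, black_point, white_point, gamma=1.0):
    """Map 8-bit values so black_point -> 0 and white_point -> 255."""
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    out = []
    for row in pixels:
        new_row = []
        for v in row:
            # Clamp to the chosen range, then rescale to 0..1.
            t = (min(max(v, black_point), white_point) - black_point) / span
            # Gamma shifts the midtones; 1.0 leaves them linear.
            new_row.append(round(255 * t ** (1.0 / gamma)))
        out.append(new_row)
    return out


# Greyish paper (210..230) and weak ink (40..60) on a 2x3 test image.
page = [[40, 60, 200], [210, 225, 230]]
print(apply_levels(page, black_point=50, white_point=220))
# -> [[0, 15, 225], [240, 255, 255]]
```

Everything at or above the white point becomes pure white, so setting it just below the darkest paper shade removes light shadows, at the cost of also clipping any picture detail that is brighter than that.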

But your priority should probably be to obtain more even illumination.

Re: How to process images with lots of pictures?

Post by kloosen »

Sadly the colors are critical. I try to reduce the number of colors as much as possible, but at some point a lot of them get converted into colors I do not want.
Getting better illuminated scans/images is a no-brainer, but this is what I have to work with right now.
For better illumination I have to find a way to really flatten the page being scanned, which is not possible with the softcover books I have scanned so far, especially if they are thick, which is when the pages tend to warp. The amount of compression I would have to apply flexes the glass plate, not only to a degree that is visible in the image but also enough to put parts out of the focal plane and... well, to make me anxious that the glass might simply shatter.

It seems this is very much a case of: the better the scan, the better the outcome.

On a side note, how should I compress the colored images? Since they are not natural photographs with somewhat smooth transitions, high compression ratios with JPG and especially JPEG2000 yield quite mushy images, and ZIP is not doing much.
Is there any good compression for images where contrast is quite important that I should use here? Lossy is not a problem as long as quality doesn't go completely down the drain.

Re: How to process images with lots of pictures?

Post by cday »

kloosen wrote: 25 Apr 2022, 07:36
Sadly the colors are critical. I try to reduce the number of colors as much as possible, but at some point a lot of them get converted into colors I do not want. Getting better illuminated scans/images is a no-brainer, but this is what I have to work with right now.

For better illumination I have to find a way to really flatten the page being scanned, which is not possible with the softcover books I have scanned so far, especially if they are thick, which is when the pages tend to warp. The amount of compression I would have to apply flexes the glass plate, not only to a degree that is visible in the image but also enough to put parts out of the focal plane and... well, to make me anxious that the glass might simply shatter.

It looks as if you really need to address the illumination and the capture of images close to the edge of the page before proceeding much further. I'm not sure whether you have said how you are imaging pages, but others have described and used quite simple arrangements that capture one page at a time, close to the edge of the page. For a paperback that can't be opened wide, though, the more elaborate 'V'-platen scanners are possibly the only near-optimum solution. Ultimately, you might have to use destructive scanning to obtain good results, if you are prepared to sacrifice the originals or could obtain another set.

kloosen wrote:
On a side note, how should I compress the colored images? Since they are not natural photographs with somewhat smooth transitions, high compression ratios with JPG and especially JPEG2000 yield quite mushy images, and ZIP is not doing much.
Is there any good compression for images where contrast is quite important that I should use here? Lossy is not a problem as long as quality doesn't go completely down the drain.

If the colours on the original pages are individually uniform, rather than continuous-tone as in a normal photograph, there are other file formats and compression methods that might be more effective, although in practice it might be hard to better the size of a JPEG or JPEG 2000 file for a page containing text.

I'm not familiar in any detail with those options, but if you have a good-quality image that you can use as a test (possibly not one of your present images), you might look at the PNG format, and also at the TIFF format with some of its many available compression options. Some compression methods encode areas of uniform colour much more efficiently than JPG compression; one example, although I have never used it, is RLE (run-length encoding). You could also search online for more information on image encoding methods.
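
To show why run-length encoding favours uniform colour, here is a minimal Python sketch that encodes one row of palette indices as (value, run length) pairs. The helper names `rle_encode` and `rle_decode` and the sample rows are made up for the illustration; real TIFF or PNG encoders use more elaborate schemes, so treat this only as the basic idea.

```python
def rle_encode(values):
    """Return a list of (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            # Same colour as the previous pixel: extend the current run.
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original row."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


flat_row = [3] * 12 + [7] * 4  # two areas of uniform colour
noisy_row = [3, 4, 3, 5, 3, 4, 7, 6, 7, 7, 6, 7, 3, 4, 3, 5]  # same width, speckled

for name, row in (("flat", flat_row), ("noisy", noisy_row)):
    runs = rle_encode(row)
    assert rle_decode(runs) == row  # lossless round trip
    print(name, "pixels:", len(row), "runs:", len(runs))
```

The flat row collapses to two runs, while the speckled row needs almost one run per pixel. That is the reason scan noise and uneven illumination hurt this kind of compression so much: the page has to be cleaned up into truly uniform colours first.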

Note, though, that if you significantly reduce the number of colours in an image, the quality of the text in the image is likely to be significantly reduced, because the edges of text in a colour image with a normal colour depth are normally smoothed or 'anti-aliased' using intermediate shades of gray. You can see the effect if you zoom in on text characters that appear to have a smooth outline. Overcoming that would require a more complex file format supporting 'mixed raster content' (MRC), or the use of Adobe's ClearScan (not the current name).
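
The effect on text edges can be seen in a small Python sketch that snaps the grey values across one anti-aliased edge to the nearest entry of a reduced palette. The function name `nearest_level` and the sample values are invented for the illustration, and real quantizers work on full RGB colours rather than a single grey channel.

```python
def nearest_level(value, levels):
    """Return the palette entry closest to value."""
    return min(levels, key=lambda level: abs(level - value))


# Grey values across one anti-aliased edge: white paper fading into black ink.
edge = [255, 230, 170, 90, 30, 0]

two_levels = [0, 255]            # pure black and white only
four_levels = [0, 85, 170, 255]  # two intermediate greys kept

print([nearest_level(v, two_levels) for v in edge])
# -> [255, 255, 255, 0, 0, 0]
print([nearest_level(v, four_levels) for v in edge])
# -> [255, 255, 170, 85, 0, 0]
```

With only two levels the soft transition becomes a hard step, which is what makes text look jagged after heavy colour reduction. Every grey kept for the text edges is a palette slot that is no longer available for the picture colours, which is the trade-off behind storing text and pictures in separate layers.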