The flattening done by the CZUR software is significant, though I guess it falls short of your expectation of perfection for a $400 device.
The worst examples I see on Archive.org are scans from microfilm, often from India, and a lot of the Google Books material. Much of that is positively illegible. I don't think it is the goal of Archive.org to provide print-quality masters of public domain material just so others can grab them and publish them in some for-profit venture. The Library of Congress tends to make high-quality scans of the titles it handles, but not everyone is at that level.
And, yes, it is a 24 megapixel camera on a stand which is little more than a good web cam. That is the kind of thing you get at this price point. But I have used the 4- and 5-figure scanners in libraries. The image quality is good (though not dramatically better) but they don't even try to flatten the curled pages on the edges.
The end purpose of a scan (and the budget) will dictate the equipment that must be used.
_____
I've used ScanTailor Advanced a lot and it has a number of annoyances that I have not figured out how to work around. Some of it is a clunky interface (not better than CZUR, just different). To use it you really have to go through all of its stages, though that is not clear until you try to make an output and it complains that something, such as the margin settings, has not been done.
On the topic of the margin settings, its defaults are never what I want but I don't see a way to alter them to improve my workflow. I don't want it to add space around the page images so I usually have to check two boxes and have this apply to all pages and then execute for the run of pages. Short of digging into the source code and compiling, there should be a way to set these defaults.
For output you have 1-bit TIFF for black and white or color TIFF for grayscale/color. The latter files are gigantic and hard to work with when building a PDF, so I usually convert them to PNG (when they are above 1 MB) with ImageMagick. I have some commands and scripts I use often for this sort of thing, using GNU Parallel to spread the work across all processor cores:
Code:
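# Convert every .tif over 1 MB to PNG in parallel; -print0 / -0 keep filenames with spaces intact.
find . -type f -name '*.tif' -size +1M -print0 | gtime -f "\n%E" parallel -0 --bar convert {} {.}.png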
On a text-only book, the 1-bit TIFFs are handy for making a rough small PDF.
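As a rough sketch of that PDF step, assuming ImageMagick's convert is installed and the page files sort into reading order by name, something like the following would bundle a folder of 1-bit TIFFs into one PDF. The function name tiffs_to_pdf and the DRY_RUN switch are made up for illustration and are not part of any tool mentioned here:

```shell
#!/bin/sh
# Hypothetical helper: bundle the 1-bit TIFFs in a directory into one PDF.
# Assumes ImageMagick's `convert` is on PATH and that the page files
# sort into reading order by name (p001.tif, p002.tif, ...).
# Set DRY_RUN=1 to print the command instead of running it.
tiffs_to_pdf() {
    dir=$1
    out=$2
    # Replace the function's arguments with the sorted list of TIFFs.
    set -- "$dir"/*.tif
    if [ ! -e "$1" ]; then
        echo "no .tif files in $dir" >&2
        return 1
    fi
    # Group4 (CCITT fax) compression is lossless and compact for bilevel pages.
    ${DRY_RUN:+echo} convert "$@" -compress Group4 "$out"
}

# Example (not run here):
#   tiffs_to_pdf out book.pdf
```

Loading every page in a single convert call can use a lot of memory on a long book, so this is only meant for the quick, rough PDF described above.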
As others have requested before, it would be nice if ScanTailor Advanced could ingest a PDF without having to use another tool to export the pages to files. Likewise it would be nice to have PNG as an output option.
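Until then, the export step can at least be scripted. A minimal sketch, assuming the pdfimages tool from Poppler is installed; the function name pdf_to_pages, the "page" file prefix, and the DRY_RUN switch are only illustrative:

```shell
#!/bin/sh
# Hypothetical helper: pull the embedded page images out of a scanned PDF
# so they can be loaded into ScanTailor Advanced.
# Assumes Poppler's `pdfimages` is on PATH.
# Set DRY_RUN=1 to print the command instead of running it.
pdf_to_pages() {
    pdf=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # -tiff writes each embedded image as a TIFF named <prefix>-NNN.tif
    ${DRY_RUN:+echo} pdfimages -tiff "$pdf" "$outdir/page"
}

# Example (not run here):
#   pdf_to_pages scan.pdf pages
```

pdfimages extracts every embedded image at its native resolution, so a PDF with more than one image per page will produce extra files that need sorting out by hand.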
Another aspect where I have to use another tool (Photoshop droplets) is to crop a bunch of page spreads before I can work with them in ScanTailor Advanced. The Photoshop droplet method is frustrating because in nearly 100% of the jobs I run there is one random file that does not get processed and I have to manually acknowledge the error and manually run the last image. Adobe products have other issues, including running OCR in a single thread (Acrobat "Pro") so only one processor core is used even though my computers have had multiple cores for 20 years or more.
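For what it is worth, a fixed crop like that could probably be done with ImageMagick as well. This is only a sketch, assuming every spread in the job takes the same crop rectangle; the function name crop_spreads, the geometry value in the example, and the DRY_RUN switch are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: apply one crop rectangle to every TIFF in a folder.
# Assumes ImageMagick's `convert` is on PATH.
# geometry is WIDTHxHEIGHT+XOFFSET+YOFFSET in pixels.
# Set DRY_RUN=1 to print the commands instead of running them.
crop_spreads() {
    geometry=$1
    src=$2
    dst=$3
    mkdir -p "$dst" || return 1
    failed=0
    for f in "$src"/*.tif; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # +repage drops the old canvas offset so the result is a plain image.
        if ! ${DRY_RUN:+echo} convert "$f" -crop "$geometry" +repage "$dst/$(basename "$f")"; then
            echo "failed: $f" >&2
            failed=$((failed + 1))
        fi
    done
    # Non-zero exit status if any file failed, so a skipped page is noticed.
    [ "$failed" -eq 0 ]
}

# Example (not run here; the numbers are made up):
#   crop_spreads 4000x3000+200+150 spreads cropped
```

Each failure is reported by name on standard error and the loop carries on with the remaining files.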
So if we want to nitpick about tools, there's plenty of room to do so for free and expensive programs and equipment. But, as the old saying goes, it is a poor craftsman who blames his tools.
_____
Within reasonable time limits, I can provide requested examples of files for certain kinds of scans. I expect that any image inserted in this forum will be resized somewhat unless it is a link to something on Dropbox or similar storage. Of course those items tend to be ephemeral, so someone looking at this thread a couple of years down the road will probably not have the source image in question, especially if it is large and the space is needed for something else.
When I say that the CZUR devices are "good", my usage is colored by my experience in the book trade as a collector and antiquarian bookseller. Good is about 6/10. Very Good 8/10. Fine 10/10. So I am not saying they are perfect, but they do impressive work for the price point and size of the devices.
You are free to use them or not. If you do use them, you may want to refine the technique to get the most out of them that they can offer.
As a book collector, I am not willing to guillotine the spines to run the sheets through a ScanSnap. I also want to be careful about how wide I open the bindings so that I am not adding a lot of damage in the process of scanning a book for the purpose of reading or making a searchable file.
James