I'm from Berlin, Germany. I came across this site by accident and I'm thrilled and relieved at the same time, thinking: OK, I don't have to start from scratch, some guys did it already, phew.
Your work on the book scanner development and instructions is just awesome. I'm impressed, really very good work. And what I read about ScanTailor also looks very promising; I absolutely have to test it soon. I was really close to starting work on such a program for OCR preprocessing (I even have some quick-and-dirty code for adaptive contrast correction), but I found your page and ta-da! Someone did it already, it's like a Christmas present ...
Actually, I started to improve my scanning situation today, because the pile of books to do keeps growing ... and my time for this keeps shrinking.
By way of introduction, I should say that I've been scanning books for years now, but only from time to time. Therefore, a flatbed scanner was sufficient for me. Some time ago I also tried digital camera images of double pages, but the quality was not very convincing compared with that of a real scanner. The minor problem was the poor lighting; the major one was the distortion of the images caused by the curvature of the pages when the book is opened to 180 degrees.
As mentioned, I started building a first 'sketch' of a digital-camera-based book scanner before I found your page today ... the frame is made of Styrofoam parts and I'll use a Canon A640 for image acquisition. For this camera there is software for Linux (I'm a 98% Linux user) that can control it remotely and take images periodically, e.g. every 5 seconds. So I can limit myself to page flipping, which will (hopefully) save a lot of time. But my final goal is to have the image acquisition done fully automatically, so I can do other things in the meantime ... Automated page flipping is probably the most difficult mechanical problem here (especially with different paper materials and air conditions in mind).
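The timed-capture idea can be sketched roughly like this. This is only an illustration, not the actual camera software: `capture_periodically`, `capture_page` and `fake_shutter` are hypothetical names, and the real camera trigger would have to be plugged in where the stand-in function is.

```python
import time
from typing import Callable, List


def capture_periodically(
    capture_page: Callable[[int], str],
    num_pages: int,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call capture_page(index) num_pages times, pausing between shots.

    capture_page is a hypothetical hook: it should trigger the camera and
    return the filename of the saved image.
    """
    if interval_s < 0:
        raise ValueError("interval_s must be non-negative")
    saved: List[str] = []
    for index in range(num_pages):
        saved.append(capture_page(index))
        # Pause so the operator can flip the page; no wait after the last shot.
        if index < num_pages - 1:
            sleep(interval_s)
    return saved


def fake_shutter(index: int) -> str:
    """Stand-in for the real camera trigger; it only builds a filename."""
    return f"page_{index:04d}.jpg"
```

Calling `capture_periodically(fake_shutter, num_pages=3, interval_s=5.0)` would "shoot" three pages with a 5-second pause after each of the first two, which is the window for flipping the page by hand.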
I have thought a lot about the whole procedure of document digitization in general, also with the idea of information freedom in mind. So, in my opinion too, your open project with free and complete documentation is absolutely the way to go. When a technology spreads and becomes widely available, it will force changes in regulations and society ...
Here are my thoughts on the mid- and long-term goals for solving the task of digitizing printed (hard-copy) documents:
On the hardware side, as mentioned, full automation with completely documented hardware in some kind of 'open-source project'. Afaik, this is (partly) already solved by commercial scan robots, but they are obviously not 'open', i.e. available for us/me. You probably already know these:
http://www.youtube.com/watch?v=hlOQuuLYavY
http://www.youtube.com/watch?v=-oOXXpxzETA
http://www.youtube.com/watch?v=UyB5c3S4vzc
and so on ...
On the software side: separate post: the future of ocr ?
I have a degree in computer engineering, with some experience in image processing and math. Atm, I study scientific computing (math and programming of supercomputers, aka 'number-crunchers'). Besides the OCR problem, I'm waiting for a good device for reading ebooks. 'Good' here means an acceptable display size (around 15x20 for a proper font size), low weight and a low price. Recent ebook readers (with eInk) are still too small, and tablets are just too heavy and expensive.
Keep up the good work!