DIY Book Scanner

Posted: **12 Jan 2011, 12:09**

Yeah, smaller-sensor cameras (anything that isn't full-frame) are actually better for book scanning in one sense: it's easier to get a deep depth of field. The T2i/550D is on my shortlist for a single-camera scanner.
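
To put rough numbers on the depth-of-field point, here is a minimal sketch using the standard thin-lens depth-of-field approximation. The specific values are assumptions for illustration, not measurements from anyone's rig: a 1.6x crop factor, a 50 mm lens on full frame versus the equivalent field of view on the crop sensor, f/8 on both, a 0.030 mm full-frame circle of confusion scaled by the crop factor, and a 500 mm camera-to-page distance.

```python
def depth_of_field_mm(focal_mm, f_number, coc_mm, subject_mm):
    """Total depth of field (far limit minus near limit), thin-lens approximation."""
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    if subject_mm >= hyperfocal:
        return float("inf")  # everything out to infinity is acceptably sharp
    near = subject_mm * (hyperfocal - focal_mm) / (hyperfocal + subject_mm - 2 * focal_mm)
    far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return far - near


CROP = 1.6  # assumed crop factor for a Canon APS-C body

# Same framing, same f-number, same page distance on both sensors.
full_frame = depth_of_field_mm(50.0, 8.0, 0.030, 500.0)
aps_c = depth_of_field_mm(50.0 / CROP, 8.0, 0.030 / CROP, 500.0)

print(f"full frame: {full_frame:.1f} mm, APS-C: {aps_c:.1f} mm")
```

Under these assumptions the crop sensor gets noticeably more usable depth at the same f-number, which matters when a curved page sits at varying distances from the lens.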

Posted: **12 Jan 2011, 12:13**

As that's all I have, do you have any recommendations for a single-camera build? I currently have it mounted on an overhanging table edge and cover the book with a single pane of plexiglass (so it's basically a high-speed flatbed scanner), but I'll have to scan some books from the early 1800s for a friend, so my current design won't work.

I was looking into that build with the rotating cradle above the camera, but I was just wondering if there are any more good builds for this.

Posted: **12 Jan 2011, 14:04**

Anonymous wrote: As that's all I have, do you have any recommendations for a single-camera build?

Dan did a single camera scanner earlier, and posted it on instructables: http://www.instructables.com/id/Bargain ... board-Box/ .

---
If we consider the light captured by the camera to be an "infinite weighted sum of an infinite number of sinusoidals" (Forsyth and Ponce, Computer Vision: A Modern Approach, sec 7.3.0) and, if we know a little about how the camera captures different frequencies of light, we may be able to do some heavy-lifting in software.

Consider the idea to use three lights (RGB) to cast shadows; if we choose these lights such that they are tuned to the sensitivity of the camera's imaging sensor, we may be able to get some high-fidelity information by inspecting the R, G, and B outputs of the camera. We may even be able to do the same (or similar) with a single overhead light (an "ideal" white light, not yellow etc.). I'm curious, can we extract the curvature from the change in distribution of frequency over an image?

As for casting a laser across the page, we may choose to attenuate only the G output of the camera's RGB output (with a high-pass filter, etc.), allowing us to use one picture instead of two.
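
A minimal sketch of the single-picture idea, assuming a green laser: the line is located wherever the G channel stands out from R and B, while R and B would still carry the page content. The function name, the `min_excess` threshold and the synthetic pixel values are all made up for illustration, and a real image would need noise handling.

```python
def laser_rows(image, min_excess=40):
    """For each column, return the row where green most exceeds red and blue.

    image is a list of rows, each a list of (r, g, b) tuples.
    Columns with no pixel above min_excess yield None.
    """
    height, width = len(image), len(image[0])
    rows = []
    for x in range(width):
        best_row, best_excess = None, min_excess
        for y in range(height):
            r, g, b = image[y][x]
            excess = g - max(r, b)  # how much green stands out at this pixel
            if excess > best_excess:
                best_row, best_excess = y, excess
        rows.append(best_row)
    return rows


# Synthetic 5x4 grey "page" with a laser trace in the first three columns.
PAGE = (180, 180, 180)
LASER = (190, 255, 185)
demo = [[PAGE for _ in range(4)] for _ in range(5)]
for x, y in enumerate([1, 2, 2]):
    demo[y][x] = LASER

print(laser_rows(demo))
```

The per-column row positions trace the laser line; how far that trace bends away from a straight line is what encodes the page shape.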

As an aside, we may consider shooting the laser towards the margins on the top and bottom to reduce the potential loss of fidelity from a one-camera method.

---
A separate tack from Fourier frequency analysis might involve some rudimentary machine learning. We are currently ignoring the wealth of information in our datasets: very little is done to take advantage of the fact that we are taking many pictures of the same book, with a single page turn being the only difference between consecutive shots.

Given a constant lighting scheme and camera placement, the full set of images taken from one book, and some criterion to judge whether a dewarped model is accurate for a given page, we may use one of the published methods (SKEL, SEG, CTM) to dewarp a single page, then refine that model (and the delta between models) using subsequent pages.

In other words, we would dewarp pages from front to back (or w/e), and simultaneously (or after-the-fact) check whether our dewarping model ("3D" model) is consistent between pages. If they're not, we can go back and tweak the dewarp model for "better" results.
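
The consistency check could be sketched roughly as follows. This assumes each page's dewarp model has been reduced to a single number (say, a curvature coefficient), which is a simplification of my own; `flag_inconsistent`, `window` and `tolerance` are hypothetical names and values, and the example curvatures are invented.

```python
from statistics import median


def flag_inconsistent(params, window=2, tolerance=0.25):
    """Return indices of pages whose parameter strays from its neighbours.

    params holds one shape parameter per page, in scan order. A page is
    flagged when it differs from the median of up to `window` pages on
    each side by more than `tolerance` (relative to that median).
    """
    flagged = []
    for i, value in enumerate(params):
        lo, hi = max(0, i - window), min(len(params), i + window + 1)
        neighbours = params[lo:i] + params[i + 1:hi]
        if not neighbours:
            continue
        reference = median(neighbours)
        if abs(value - reference) > tolerance * max(abs(reference), 1e-9):
            flagged.append(i)
    return flagged


# Curvature drifts slowly as pages move from one side to the other;
# page 3 looks like a bad fit.
curvatures = [0.50, 0.52, 0.54, 0.95, 0.58, 0.60, 0.62]
print(flag_inconsistent(curvatures))
```

Flagged pages are the ones to go back and re-fit, possibly starting from the neighbours' parameters.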

Posted: **12 Jan 2011, 14:12**

welcome, Matti.

Anonymous, off the top of my head, no, though I can certainly spend some time digging through my memory of what's been done so far. I don't think Antoha-SPB's design is optimal for old, fragile books like you're describing. Something like what Matti linked to -- a simple cradle and a piece of glass with improvised lighting -- may represent the most bang for the least effort.

Posted: **15 Jan 2011, 17:18**

Spamsickle and I just spent a couple hours trying out some of these ideas and getting images for everyone to play with. Some really promising results! I hope to be posting them by tonight. Big thanks to Spam for coming over and helping out. I could have delayed weeks without his help.

Posted: **16 Jan 2011, 15:18**

daniel_reetz wrote: Spamsickle and I just spent a couple hours trying out some of these ideas and getting images for everyone to play with. Some really promising results!

Fantastic!

It's really great to see all the effort, enthusiasm and good ideas here. I wish I could participate more, but right now school, work and family responsibilities take up about 150% of my time. Maybe in four or five months I'll be able to get involved in programming, which is something I do have experience in.

vitorio wrote: And whether you're using stereo pairs or feature extraction or a Kinect, you still end up with a point cloud that you have to turn into a mesh and then straighten out, right?

Sounds logical, though is a mesh really the right model for a page? (Maybe those working on dewarping in Scan Tailor can fill us in on this?) There are open source mesh programs that I think might work with point clouds. At least one of them, MeshLab, can be controlled from the command line: http://meshlab.sourceforge.net/wiki/ind ... hlabserver

Here's my shot at an algorithm for a setup with a single, good consumer camera snapping photos in sequence while someone turns the pages (I'd guess three or four images per second would be enough; can most cameras do that?):
- Use a combination of page edges, text lines and other lines in page content to de-warp three or four images of every page. Compare the results. If they're not similar enough, modify parameters until they are. Then use the picture of the page at its flattest for the final image.
- The page turner would just have to make sure he/she gets his/her hands out of the way for a moment so that at least some images would be complete, and the de-warping would have to take into account the presence of hands/fingers in most of the images.
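
A rough sketch of the "pick the page at its flattest" step, assuming some earlier stage has already extracted text baselines from each frame as lists of (x, y) points. That extraction is the hard part and is not shown. Flatness is scored here as how far each baseline deviates from a straight line; the function names and the synthetic data are my own.

```python
def line_fit_residual(points):
    """RMS vertical distance of points from their least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    residuals = [(y - mean_y) - slope * (x - mean_x) for x, y in points]
    return (sum(r * r for r in residuals) / n) ** 0.5


def flattest_frame(frames):
    """Index of the frame whose baselines are, on average, straightest.

    frames is a list of frames; each frame is a list of baselines;
    each baseline is a list of (x, y) points.
    """
    def score(frame):
        return sum(line_fit_residual(line) for line in frame) / len(frame)

    return min(range(len(frames)), key=lambda i: score(frames[i]))


def bent_baseline(bend):
    """Synthetic baseline: a parabola sagging by `bend` pixels at the ends."""
    return [(x, 100 + bend * (x - 50) ** 2 / 2500) for x in range(0, 101, 10)]


# Three shots of the same page with different amounts of curl.
frames = [[bent_baseline(8.0)], [bent_baseline(1.0)], [bent_baseline(4.0)]]
print(flattest_frame(frames))
```

Frames with hands or fingers in them would need to be rejected before scoring, for example by requiring a minimum number of detected baselines.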

I also think atarkri's idea of taking into account the data set for a whole book is worth considering.

Regarding software: considering the variety of methods being proposed, and the overlap of the algorithms involved, wouldn't it be a good idea to make the software as modular as possible? There could be one module for page modeling and another for image dewarping, and both could be re-used for practically any of the methods proposed here. (Is this already the way ST is being programmed?)
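
To make the modular split concrete, here is a minimal sketch of what the two interfaces might look like. None of this reflects how Scan Tailor or Decapod is actually structured; the type aliases and function names are placeholders, and the two implementations are deliberately trivial stand-ins.

```python
from typing import Callable, List

Image = List[List[int]]               # placeholder: rows of grayscale pixels
PageModel = Callable[[float], float]  # maps an x position to page height
PageModeler = Callable[[Image], PageModel]
Dewarper = Callable[[Image, PageModel], Image]


def flat_page_modeler(image: Image) -> PageModel:
    """Stand-in modeler: pretends every page is perfectly flat."""
    return lambda x: 0.0


def identity_dewarper(image: Image, model: PageModel) -> Image:
    """Stand-in dewarper: a flat model needs no correction, so copy the image."""
    return [row[:] for row in image]


def run_pipeline(images: List[Image], modeler: PageModeler, dewarper: Dewarper) -> List[Image]:
    """Model each page, then dewarp it; either stage can be swapped out."""
    return [dewarper(image, modeler(image)) for image in images]


pages = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
result = run_pipeline(pages, flat_page_modeler, identity_dewarper)
print(result == pages)
```

A laser-line modeler, a stereo modeler or a text-line modeler could then all feed the same dewarping stage, as long as they agree on what a page model is.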

Regarding stereo imaging:

daniel_reetz wrote: it's not clear that there is presently any implementation of this in Decapod, but it was the original idea

Hmmm... Though the images in http://wiki.fluidproject.org/display/fl ... User+Guide look dewarped, no? BTW, I tried to look at the source code, but something is wrong with their repository... Has anyone here actually installed and tried Decapod?

Posted: **16 Jan 2011, 21:32**

Hello andrewgreendf,

Yes, I have tried Decapod. So far it is the better scanning software, though the group is still working on their new release (version 0.5 to 1.0).
http://cobecoballes-linux.blogspot.com/ ... anner.html

So if you want to try it, you can download the code and change the script a little. The latest gphoto2 release is 2.4.10, and it has a "--capture -video " feature, which is quite interesting.

Thanks

E^3

Posted: **17 Jan 2011, 20:38**

Hey, Daniel, SparkFun has 32 individually-addressable RGB LEDs on a single 1 meter adhesive strip, might be fun for RGB differenced lighting: http://www.sparkfun.com/products/10312

I played with some software over this weekend and have a few failures to report.

First, in the 90's, Steve Mann came up with some neat software called Video Orbits. It turned frames of video into a single composite image, much like we'd do with Hugin or Photosynth or other panorama stitching software, but without feature extraction, using novel image processing techniques (comparametric equations and chirplet transforms). He also used his algorithms to do HDR before it was called HDR. His techniques should mean a fixed video camera could look over a book and you could turn that into a high-resolution composite image. I've gotten fantastic results from using it where traditional panorama software failed.

I took my HD camera and looked at a book with it, but I couldn't get a composite generated from frames dumped from the HD video. I'm not sure if it's the resolution of the images, or if the camera motion broke the software's expectations, but it just wasn't happening. There are more recent, GPU-accelerated versions of the code in the OpenVIDIA library, but I'd have to write code to access them; their OpenVIDIA workbench doesn't provide a direct UI for using them.

Second, Photosynth does things more traditionally: it does feature extraction, and then it spatially references those features as it figures out the 3D panorama and gives you a point cloud. I took the same extracted images and ran them through both the stock Microsoft Photosynth software and the Microsoft Image Composite Editor, and neither was able to stitch the images together into something coherent. The point cloud was a linear mess, and the composites were often completely wrong.

I'm pretty disappointed that Photosynth didn't "just work"; it would have been a cheap and easy way to get some sort of point cloud going.

If the arbitrary camera motion turns out to be the issue, a fixed camera panning over a platen might work better.

I talked with some folks here about point cloud -> mesh -> straightening; I'll summarize that and post on it next.

Posted: **17 Jan 2011, 22:01**

awesome. I'm trapped doing a SIGGRAPH paper at the moment, but I hope to report on the work I did with Spamsickle ASAP.

Posted: **19 Jan 2011, 09:13**

(First of all, sorry for the bad English in all my posts; I am not a native English speaker.)

Is the following another possible method for coping with dewarping? I don't know whether it is feasible or practical, or whether it has already been discussed on this forum.

Suppose the book lies open on a table, the table is the XY plane, and the center of the spine lies along the Y axis. Then I think it is a good approximation that the page curvature at any point on the two visible pages does not depend on that point's Y coordinate. Under this approximation we use two cameras. The first is the "normal" camera, placed above the open book, which captures both pages in each photo. The second is placed at a point on the Y axis, some distance away from the book (to reduce perspective distortion, although this can be corrected in software if the camera has to be close), with its optical axis aligned with the Y axis and pointing towards the book. The two cameras each take one photo at the same time.

Photos from the second camera would look more or less like this image, but with the pages open:
http://image.shutterstock.com/display_p ... 350721.jpg

Perhaps it is better if the background for the "secondary photos" is completely black, or white (with even illumination as well in that case?). I don't know, but the goal would be better contrast between the background and the cross-sections of the two visible pages in the "secondary photos". ("Visible pages" refers to the pages as seen in the first camera's photos.)

Then, could a program analyze each secondary photo, detect the two curves that the two visible pages "draw" (one curve per page), and use these curves to dewarp the corresponding main photo?
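
If the profile curve can be extracted, one way to use it is to re-space each row of the main photo by arc length along the curve. Below is a minimal sketch under simplifying assumptions that are mine, not part of the proposal above: the top camera is treated as orthographic (no perspective), the profile has already been extracted as (x, z) samples in increasing order with no repeated points, and pixel i of a row was photographed at profile sample i.

```python
import math


def arc_lengths(profile):
    """Cumulative length along the page cross-section at each (x, z) sample."""
    lengths = [0.0]
    for (x0, z0), (x1, z1) in zip(profile, profile[1:]):
        lengths.append(lengths[-1] + math.hypot(x1 - x0, z1 - z0))
    return lengths


def interpolate(xs, ys, x):
    """Linear interpolation of ys over strictly increasing xs, clamped at the end."""
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]


def dewarp_row(row, profile, out_width):
    """Resample one image row so output pixels are evenly spaced along the paper."""
    lengths = arc_lengths(profile)
    total = lengths[-1]
    return [
        interpolate(lengths, row, total * j / (out_width - 1))
        for j in range(out_width)
    ]


# The page rises steeply between the first two samples, then runs flat,
# so the first segment covers more paper (5 units) than it appears to (3 units).
profile = [(0, 0), (3, 4), (6, 4)]
row = [0, 50, 80]
print(dewarp_row(row, profile, 9))
```

Because the curvature is assumed independent of Y, the same arc-length mapping would be applied to every row of the main photo.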

I don't know whether my English makes the idea understandable, but I hope so.

Methods To Sense The 3D Surface/Structure Of A Book
