Scan Tailor

Scan Tailor specific announcements, releases, workflows, tips, etc. NO FEATURE REQUESTS IN THIS FORUM, please.

Moderator: peterZ

rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States
Contact:

Re: Scan Tailor

Post by rob »

I'm getting a little suspicious of the dewarping algorithm described by Bukhari et al. in "Dewarping of Document Image using Coupled Snakes". For one thing, it requires multi-scale Gaussian blurs, which seems to mean at least 10 Gaussian blurs at different scales per image. Then it needs to do a search in (x, y, t) space (t = Gaussian scale). I'm getting the sense that this will be very slow. I'm also not convinced that the method would work with graphics in the text.
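
To make the cost concern concrete, here is a rough sketch of what "ten blurs plus a search over (x, y, t)" involves. It is not the coupled-snakes method from the paper; it only builds a stack of Gaussian blurs and scans the resulting volume by brute force. The function names, the sigma values, and the synthetic page are assumptions made for illustration, and SciPy's `ndimage.gaussian_filter` stands in for whatever blur the paper uses.

```python
import numpy as np
from scipy import ndimage

def scale_space_stack(gray, sigmas):
    """Blur one grayscale image at every scale in `sigmas`; result has shape (t, y, x)."""
    gray = np.asarray(gray, dtype=float)
    return np.stack([ndimage.gaussian_filter(gray, sigma=s) for s in sigmas])

def strongest_response(stack):
    """Brute-force scan of the whole (t, y, x) volume for the largest value."""
    t, y, x = np.unravel_index(np.argmax(stack), stack.shape)
    return int(t), int(y), int(x)

# Ten scales, matching the estimate in the post; the sigma values are made up.
sigmas = np.linspace(1.0, 10.0, 10)
page = np.zeros((64, 64))
page[30:34, 8:56] = 1.0  # a synthetic 4-pixel-high "text line"
stack = scale_space_stack(page, sigmas)
```

Every page costs one full-image blur per scale, and the search volume is ten times the size of the image, which is where the slowness would come from.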

I'm looking at two other papers: Lu et al., "Perspective rectification of document images using fuzzy set and morphological operations", and Fu et al., "A Model-based Book Dewarping Method Using Text Line Detection" (also known as CTM, the Coordinate Transform Model).

CTM and CTM2 (CTM with pages preprocessed to remove graphics) seemed to perform very well in this contest.

The CTM paper also gives a detailed algorithm for each step, with the exception of the first step, which is text line enhancement.
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States
Contact:

Re: Scan Tailor

Post by daniel_reetz »

FWIW, my advisor's model of the early visual system (ODOG, Oriented Difference of Gaussians) does multiscale spatial filtering as you describe. It's excruciatingly slow in both Mathematica and Matlab, and I can't really imagine that other languages are going to improve that. We have a supercomputing cluster of 8 machines to run it on, and that is also slow.

Now, we are not programmers, really, so this is just a little anecdotal support in your favor.

Re: Scan Tailor

Post by rob »

Yeah, it is a bit of a pain. The whole point behind doing multiscale blurs is that you don't know, a priori, how big or how small the features on the page are. That is, how high a line is. I think there are easier ways to determine that! I'm very encouraged by my work so far on CTM, so we'll see how that goes.
StevePoling
Posts: 290
Joined: 20 Jun 2009, 12:19
E-book readers owned: SONY PRS-505, Kindle DX
Number of books owned: 9999
Location: Grand Rapids, MI
Contact:

Re: Scan Tailor

Post by StevePoling »

rob wrote: Yeah, it is a bit of a pain. The whole point behind doing multiscale blurs is that you don't know, a priori, how big or how small the features on the page are. That is, how high a line is. I think there are easier ways to determine that! I'm very encouraged by my work so far on CTM, so we'll see how that goes.
Seems to me that the typesetters are going to stick with the same line height within a single book. Open up War & Peace to any page and you'll see it typeset at the same face & size as page 1. So, the hard maths to figure out line height is just done once per book.

Only time this wouldn't hold is an IEEE conference proceedings book where each author's paper is typeset in his own face & size. I figure ICASSP '85 should be just about the worst at that. Earlier, everything would be typewritten; later, everything would be submitted electronically. Also, tech books like that will be heavy in equations and math notation. Even then, there'll be some heavy Markov statistics going on. The probability that line height is X will be heavily conditioned upon whether the last page's line height is X. In War & Peace that conditional probability is 1. In ICASSP '85 (ferinstance), it'll be less than 1 as I've described, but still greater than random.

One thing that could be fun, if you're a total math geek, is to build a statistical optimization that incorporates a page-layout grammar, a language model, and a letter glyph model into one massive hidden Markov model (or Kalman filter). Math PhDs only need apply. (I.e. I can imagine doing something like this, but it's way beyond my skill set.)

Re: Scan Tailor

Post by rob »

:) I've done that sort of thing before, and it's not a process I care to repeat!

Re: Scan Tailor

Post by rob »

So I fooled around with the first part of CTM, which is text line enhancement, and text line detection. So far things look extremely promising. Here's a view of some pages that I processed. Admittedly, some of the parameters I used are dependent on the page characteristics themselves (especially line height), but anyway...

Step 1 is text line enhancement, which means doing away with the small details of the letters and getting blobs. This is done with morphological operators on the binary image produced by the last step of Scan Tailor. It is fast. For those who are interested, I close the image with a brick 100 pixels wide by 2 pixels high, then open the result with a 100x16 brick.
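
A minimal sketch of the close-then-open step, assuming SciPy's `ndimage` as the morphology library (the post does not say which library was used) and assuming True pixels are ink. The function name `enhance_text_lines` and the synthetic page are made up for illustration; the brick sizes are the ones quoted above, written as (height, width).

```python
import numpy as np
from scipy import ndimage

def enhance_text_lines(binary, close_size=(2, 100), open_size=(16, 100)):
    """Close, then open, with solid rectangular bricks given as (height, width)."""
    binary = np.asarray(binary, dtype=bool)
    # Closing fills the gaps between letters and words along a line.
    closed = ndimage.binary_closing(binary, structure=np.ones(close_size, dtype=bool))
    # Opening removes anything too small to contain a whole 100x16 brick.
    return ndimage.binary_opening(closed, structure=np.ones(open_size, dtype=bool))

# Synthetic page: ten 10-pixel-wide "letters" with 6-pixel gaps, plus a small speck.
page = np.zeros((120, 400), dtype=bool)
for x0 in range(120, 280, 16):
    page[50:66, x0:x0 + 10] = True
page[100:103, 200:203] = True
blobs = enhance_text_lines(page)
```

On this toy page the letters merge into one solid blob and the speck disappears. Note that SciPy treats everything outside the image as background, so text close to the page edge may be eroded unless the image is padded first.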

Step 2 is text line detection. I use another morphological operator (hit-or-miss filter, 7x5 brick which has hits on the top half and middle, and misses on the bottom half) to find pixels in the image which form a horizontal line at the bottom of the text blob. Next, we find all the pixels that compose each line, and eliminate any line that isn't more than 1/2 a page wide. That might be modified to account for double-column pages. This part is somewhat faster than step 1.
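
One way Step 2 could look, again assuming SciPy's `ndimage`. The 7x5 brick is read here as 7 wide by 5 high, with hits on the top two rows and the middle row and misses on the bottom two rows; that reading, the function name `detect_baselines`, the 8-connected labelling, and the synthetic blobs are all assumptions.

```python
import numpy as np
from scipy import ndimage

def detect_baselines(blobs, min_width_fraction=0.5):
    """Return (xs, ys) pixel arrays for each bottom edge wider than the given page fraction."""
    blobs = np.asarray(blobs, dtype=bool)
    hits = np.zeros((5, 7), dtype=bool)
    hits[:3, :] = True      # top half and middle row must be ink
    misses = np.zeros((5, 7), dtype=bool)
    misses[3:, :] = True    # bottom half must be background
    bottoms = ndimage.binary_hit_or_miss(blobs, structure1=hits, structure2=misses)

    # Group the detected pixels into lines using 8-connected components.
    labels, _ = ndimage.label(bottoms, structure=np.ones((3, 3), dtype=int))
    lines = []
    for index, (rows, cols) in enumerate(ndimage.find_objects(labels), start=1):
        if cols.stop - cols.start > min_width_fraction * blobs.shape[1]:
            ys, xs = np.nonzero(labels[rows, cols] == index)
            lines.append((xs + cols.start, ys + rows.start))
    return lines

# Two synthetic blobs: one wide text line and one short fragment.
blobs = np.zeros((140, 400), dtype=bool)
blobs[50:66, 60:340] = True
blobs[90:106, 100:180] = True
lines = detect_baselines(blobs)
```

Only the wide blob's bottom edge survives the width filter. For double-column pages, `min_width_fraction` would need to be lowered, as the post suggests.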

Step 3 is text line approximation by polynomials. I use a 2nd-order polynomial y = a + bx + cx^2 to approximate each line, so each line is described by the three parameters a, b, and c. This tends to smooth out irregularities in the line while preserving the distortion. I tried higher order polynomials, but the results were worse. This step is extremely fast.
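
A short sketch of the quadratic fit using NumPy's `polyfit`, which is an assumed choice of fitting routine. The helper names and the synthetic baseline are illustrative only. `polyfit` returns coefficients with the highest power first, so they are reordered to match y = a + bx + cx^2.

```python
import numpy as np

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    c, b, a = np.polyfit(xs, ys, deg=2)  # polyfit lists the x**2 coefficient first
    return a, b, c

def evaluate_line(params, xs):
    """Evaluate the fitted curve at the given x positions."""
    a, b, c = params
    xs = np.asarray(xs, dtype=float)
    return a + b * xs + c * xs**2

# Synthetic baseline: a gentle curve snapped to whole pixel rows.
xs = np.arange(400)
ys = np.round(100 + 0.05 * xs - 1e-4 * xs**2)
params = fit_line(xs, ys)
```

The fitted curve smooths out the pixel-level staircase in `ys` while keeping the overall bend of the line.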

Step 4, which isn't part of CTM but I did it for fun, is parameter approximation by polynomials. Since the polynomials in the previous step describe a line going across the page, this step assumes that the parameters of the polynomials (a, b, c) vary from top to bottom of page with a second-order polynomial in y. That is, a = a(y), b = b(y), and c = c(y). This assumes that the distortion from top to bottom varies with y. It seemed to work, but I might not use it. This step is also extremely fast.
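
A sketch of how the per-line parameters might be fitted as quadratics in y. The post does not say which y value represents a line, so this assumes each line has one reference height (for example its mean row); the function names and the synthetic numbers are also assumptions. It relies on `np.polyfit` accepting a 2-D array and fitting each column independently.

```python
import numpy as np

def fit_parameter_fields(line_params, line_ys):
    """Fit a(y), b(y), c(y) as quadratics in y.

    line_params has shape (n_lines, 3) holding (a, b, c) per line.
    Returns a (3, 3) array: rows are the y**2, y, and constant coefficients,
    columns correspond to a, b, and c.
    """
    line_params = np.asarray(line_params, dtype=float)
    line_ys = np.asarray(line_ys, dtype=float)
    return np.polyfit(line_ys, line_params, deg=2)

def line_params_at(fields, y):
    """Predict (a, b, c) for a line at height y, even where no line was detected."""
    return np.array([y**2, y, 1.0]) @ fields

# Nine synthetic lines whose parameters drift smoothly down the page.
line_ys = np.arange(100, 1000, 100).astype(float)
line_params = np.column_stack([
    line_ys + 1e-5 * line_ys**2,
    0.02 - 4e-5 * line_ys,
    1e-4 * (1 - line_ys / 1000),
])
fields = fit_parameter_fields(line_params, line_ys)
a, b, c = line_params_at(fields, 450.0)
```

One appeal of this extra step is that it yields a curve for any y, including heights where no text line was found.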

The first set of images shows a page with all text. The second set shows a page with a graphic at the top.
[Attachments, all-text page: source.png, step1.png, step2.png, step3.png, step4.png]
[Attachments, page with a graphic at the top: 2source.png, 2step1.png, 2step2.png, 2step3.png, 2step4.png]

Re: Scan Tailor

Post by rob »

Another quick update: I completed the margin estimation part. I also found this step to be extremely fast. For the nongraphics and graphics page, here are the results (the approximated lines are cut off at the margins):
[Attachments: step5.png (page without graphics), 2step5.png (page with graphics)]

Re: Scan Tailor

Post by rob »

Sadly, I'm going to have to develop my own undistortion step, since the algorithm described in the CTM paper is highly inscrutable. It makes very little sense, and indeed it is not clear how it is supposed to handle areas where there are no detected lines (as in the great blank area in the 2nd image above). Further, the algorithm describes a forward transform from image coordinates (x, y) to undistorted coordinates (x', y'), which is usually the wrong direction for a transformation: it tends to leave gaps where some (x', y') coordinates are never mapped from any integer source (x, y).
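
The gap problem can be shown with a toy example. This is not the CTM transform; it is a plain horizontal stretch with nearest-pixel rounding, chosen only because the effect is easy to see. The function names and the scale factor are made up for illustration.

```python
import numpy as np

def forward_map(src, scale):
    """Push each source column to round(x * scale); untouched destinations stay -1."""
    h, w = src.shape
    out_w = int(round(w * scale))
    dst = np.full((h, out_w), -1, dtype=int)
    targets = np.minimum(np.round(np.arange(w) * scale).astype(int), out_w - 1)
    dst[:, targets] = src
    return dst

def inverse_map(src, scale):
    """Pull each destination column from the nearest source column at x' / scale."""
    h, w = src.shape
    out_w = int(round(w * scale))
    sources = np.minimum(np.round(np.arange(out_w) / scale).astype(int), w - 1)
    return src[:, sources]

src = np.arange(1, 41).reshape(4, 10)   # every pixel value is positive
pushed = forward_map(src, 1.5)          # contains -1 holes
pulled = inverse_map(src, 1.5)          # every destination pixel is filled
```

Because the stretched image has more columns than the source, the forward version cannot reach all of them. The inverse version asks, for each output pixel, where it came from, so nothing is left unfilled; a real implementation would interpolate rather than round.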

But, I have half a plan. I'm just waiting for inspiration to strike before I get the other half of the plan :)

Re: Scan Tailor

Post by rob »

So I scribbled lots and lots of equations on paper, and now I have steps 6 and 7 done.

Step 6 is to take the margins and straighten them, and then recompute all the lines and their approximations.
Step 7 is to uncurl the curliness.

So here are some sample images, again one without graphics, and one with, showing the original, the straightened margin, and uncurled. At this point, the uncurled version is probably good enough to use (the results are completely square), but I want to account for more severe curling. That means stretching out (in the y-direction) the areas that were curled up.
[Attachments, page without graphics: orig.png, step6.png, step7.png]
[Attachments, page with graphics: 2orig.png, 2step6.png, 2step7.png]
jrichards
Posts: 22
Joined: 04 Mar 2014, 00:52

Re: Scan Tailor

Post by jrichards »

How severe a curl will this correct? If you are familiar with Snapter, will this do the same thing Snapter does to fix the page curvature of a book?