Software Requests (Besides Dewarping)?

Anonymous1 · Post by **Anonymous1** » 16 Nov 2010, 15:27

Here you go: http://img602.imageshack.us/img602/9073/dscf9515a.jpg

Gerard · Post by **Gerard** » 16 Nov 2010, 17:42

Thanks. I am using Fiji (http://pacific.mpi-cbg.de/wiki/index.php/Fiji) for the test.

The content for this kind of scan is probably detectable, but I think it would fail for other kinds of scans. I would introduce classes of book scans (1-page scan, 2-page scan, warped scan).

Here is the macro (open Fiji, press Ctrl+Shift+N, copy and paste the macro, change the path of the image, then press Ctrl+R in the text window to run the macro):

Code:

run("Close All"); // close all open images
open("/home/gchoinka/Downloads/dscf9515a.jpg"); // change this to the path of your image
run("32-bit"); // convert the image to 32-bit float values
run("Median...", "radius=5"); // reduce the noise
run("Convolve...", "text1=[-1 0 1\n] normalize"); // edge detection in the x direction; all vertical edges get highlighted
run("Square"); // square each pixel so negative responses become positive
run("Select All");
run("Plot Profile"); // show the profile
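
For anyone who wants to try the same idea outside Fiji, here is a rough Python sketch of the convolve, square, and profile steps. It is not a port of the macro: the median filter is left out, the image is assumed to be a plain list of rows of grayscale values, border pixels are simply replicated (my assumption, not necessarily what Fiji does), and the function names `edge_energy` and `column_profile` are made up for illustration.

```python
def edge_energy(image):
    """Apply the [-1 0 1] kernel along x and square the response.

    `image` is a list of rows, each a list of grayscale values.
    The sign convention of the kernel does not matter here because
    the response is squared afterwards.
    """
    result = []
    for row in image:
        width = len(row)
        new_row = []
        for x in range(width):
            left = row[max(x - 1, 0)]           # replicate the border pixel (assumption)
            right = row[min(x + 1, width - 1)]
            diff = float(right - left)
            new_row.append(diff * diff)         # squaring makes every response non-negative
        result.append(new_row)
    return result


def column_profile(image):
    """Average each column over all rows, giving one value per x position."""
    height = len(image)
    width = len(image[0])
    return [sum(row[x] for row in image) / height for x in range(width)]


# Tiny made-up image: dark on the left, bright on the right.
tiny = [[10, 10, 10, 200, 200, 200] for _ in range(3)]
profile = column_profile(edge_energy(tiny))
```

In the tiny example only the two columns next to the dark-to-bright step get a non-zero value, which is the same effect as the vertical edges lighting up in the result image.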

So these are the strong edges; I think they can always be thrown out.

[Image: result image, strong edges]

This is the same image with the contrast adjusted for the text.

[Image: the result image with better contrast]

This is how the plot looks; the left and right edges are too high.

[Image: plot of the sum in y direction; file: Plot of dscf9515a-2.png]

Just the left edge of the book:

[Image: just the book beginning from left]

And the corresponding plot:

[Image: plot of the book beginning; file: Plot of dscf9515a-4.png]

The book page starts at ~190 pixels, followed by an area without any signal up to ~220 pixels; then the text starts.
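
Reading those positions off the plot by eye could probably be automated with a simple threshold. The Python sketch below is only an illustration under assumptions: the profile is already cropped to the left part of the book (so the very strong outer edges are gone), a single fixed threshold separates signal from no signal, and the function name `find_page_and_text_start` and the sample numbers are invented to mimic the plot, not measured from it.

```python
def find_page_and_text_start(profile, threshold):
    """Return (page_start, text_start) as column indices, or None.

    page_start: first column whose edge energy exceeds `threshold`.
    text_start: first column above `threshold` after the quiet gap
                that follows the page edge.
    """
    n = len(profile)
    x = 0
    # Skip the background in front of the page.
    while x < n and profile[x] <= threshold:
        x += 1
    if x == n:
        return None
    page_start = x
    # Skip the run of strong columns that belong to the page edge itself.
    while x < n and profile[x] > threshold:
        x += 1
    # Skip the quiet gap (the margin without any signal).
    while x < n and profile[x] <= threshold:
        x += 1
    if x == n:
        return None
    return page_start, x


# Synthetic profile: page edge at 190, quiet margin, text from 220 on.
sample = [0.0] * 190 + [50.0] * 3 + [0.0] * 27 + [20.0] * 10
positions = find_page_and_text_start(sample, threshold=5.0)
```

A fixed threshold will likely be too fragile for real scans with uneven lighting, so treat this as a starting point only.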

But for finding the text we may need another algorithm.

I hope this contributes something.

nifem · Post by **nifem** » 16 Nov 2010, 18:58

[request]
I think we should just get the output right enough for the actual OCR software used. In my case, I just need an auto-deskewer for heavily distorted images (dual pages at a 90° angle, photographed from the front with only 1 cam), or just some preprocessing, so the same deskewing parameters can be applied to all images. But just right enough for the OCR software to do its job. Maybe this can be achieved by auto-aligning the pages and then batch deskewing.

Anonymous1 · Post by **Anonymous1** » 16 Nov 2010, 19:10

Nice! The edge detection will make finding the seam a lot easier.

I'm still working on my algorithm, and it's going quite well. I'll post a fully functional demo soon.

Post by **daniel_reetz** » 16 Nov 2010, 21:38

Anonymous, you are on fire! Can't wait to see what you come up with.

Anonymous1 · Post by **Anonymous1** » 16 Nov 2010, 23:15

nifem wrote: [request]
I think we should just get the output right enough for the actual OCR software used. In my case, I just need an auto-deskewer for heavily distorted images (dual pages at a 90° angle, photographed from the front with only 1 cam), or just some preprocessing, so the same deskewing parameters can be applied to all images. But just right enough for the OCR software to do its job. Maybe this can be achieved by auto-aligning the pages and then batch deskewing.

I was hoping ST had this, but it didn't. I had a perfect dewarping mask ready, and I couldn't apply it to multiple images (I don't want to hack around that project XML). I did some research on the dewarping, and it seems pretty original stuff, so it's not really implemented anywhere but ST. I don't code in C/C++, so I have trouble navigating through ST's sources...

For now, I think I'll just finish what I've started.

Anonymous1 · Post by **Anonymous1** » 17 Nov 2010, 15:31

So, for a progress update, I've got it down even further. Here is the output from that original picture (I cropped the edges, as they were throwing off my averages):

The algorithm is having trouble with the details near the middle, but that's because I'm using a bell-shaped smoothing curve. I'll fix that soon.

JonEP · Post by **JonEP** » 23 Nov 2010, 15:04

Dear Anonymous,

Univershul replied to your original query with a comment about the desire among many of us to have a "fixed content box" option for Scan Tailor (the suggestion that Dan highlighted for being "very astute"). Here's the link to the thread that Univershul was referencing: http://www.diybookscanner.org/forum/vie ... ?f=8&t=466. A similar issue is discussed here: http://www.diybookscanner.org/forum/vie ... ?f=8&t=633. If creating a "fixed box" option for ST is one effect of your coding project, I'd be very grateful to you. The most daunting thing about scanning for me right now is all of the time it takes to do post-processing. ST is a great aid in that, but I still find that I am put off by the prospect of dragging content box edges for all non-standard pages.

BTW, one thing we haven't discussed on that front: what if the program were to find the mean distance between the assumed content and the edge of the page (or image) in all four directions, and assume that this marks the book's standard content area? It could then apply that distance to all the pages that did not fit the mean, and perhaps provide a sort option that puts non-standard pages at the top of the list (i.e., instead of sorting by height or width), allowing users to very quickly zoom through, accepting selected, all, none, etc...
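
To make the mean-margin idea a bit more concrete, here is a small Python sketch. Everything in it is an assumption for illustration: pages are given as `((width, height), (x0, y0, x1, y1))` pairs with the detected content box in pixels, the function names are invented, and none of it reflects how Scan Tailor stores or sorts pages.

```python
from statistics import mean


def margins(page_size, content_box):
    """Distances from the content box to the page edges: (left, top, right, bottom)."""
    width, height = page_size
    x0, y0, x1, y1 = content_box
    return (x0, y0, width - x1, height - y1)


def standard_margins(pages):
    """Mean margin in each of the four directions over all pages."""
    all_margins = [margins(size, box) for size, box in pages]
    return tuple(mean(m[i] for m in all_margins) for i in range(4))


def review_order(pages, tolerance):
    """Return (order, flagged): page indices with the largest deviation first,
    and the subset whose worst margin differs from the mean by more than `tolerance`."""
    std = standard_margins(pages)
    devs = [max(abs(m - s) for m, s in zip(margins(size, box), std))
            for size, box in pages]
    order = sorted(range(len(pages)), key=lambda i: devs[i], reverse=True)
    flagged = [i for i in order if devs[i] > tolerance]
    return order, flagged


def apply_standard_box(page_size, std):
    """Content box obtained by applying the mean margins to a page."""
    width, height = page_size
    left, top, right, bottom = std
    return (left, top, width - right, height - bottom)


# Made-up example: three ordinary pages and one chapter opening with a large top margin.
pages = [((1000, 1500), (100, 100, 900, 1400))] * 3 + [((1000, 1500), (100, 700, 900, 1400))]
std = standard_margins(pages)
order, flagged = review_order(pages, tolerance=200)
fixed_box = apply_standard_box((1000, 1500), std)
```

Note that in this example the single odd page pulls the mean top margin up to 250, so even the ordinary pages deviate by 150. Swapping in a median might be more robust, but that would be a variation on the suggestion, not the suggestion itself.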

Thanks,
jon

Anonymous1 · Post by **Anonymous1** » 29 Nov 2010, 20:33

Sorry @JonEP for the delayed reply; finals are painful...

I've never coded a single application in C++, other than a prime number generator. I work mainly with web design and Python. I could see if I could hack together a parser for the XML project file for Scan Tailor, as I finally got my content-recognition algorithm working (still refining it).
