Software Requests (Besides Dewarping)?

Discussions, questions, comments, ideas, and your projects having to do with DIY Book Scanner software. This includes the Stereo Data Maker software for the cameras, post-processing software, utilities, OCR packages, and so on.

Moderator: peterZ

Gerard
Posts: 154
Joined: 17 Oct 2010, 07:15
Number of books owned: 0
Location: Berlin (Germany)

Re: Software Requests (Besides Dewarping)?

Post by Gerard »

Thanks,
I am using Fiji (http://pacific.mpi-cbg.de/wiki/index.php/Fiji) for the test.

The content of this kind of scan is probably detectable, but I think it would fail for other kinds of scans. I would introduce classes of book scans (1-page scan, 2-page scan, warped scan).

Here is the macro (open Fiji, press Ctrl+Shift+N, copy and paste the macro, change the path of the image, then press Ctrl+R in the text window to run the macro):

Code:

run("Close All"); // close all open images
open("/home/gchoinka/Downloads/dscf9515a.jpg"); // change this path to your image
run("32-bit"); // convert the image to 32-bit float values
run("Median...", "radius=5"); // reduce the noise
run("Convolve...", "text1=[-1 0 1\n] normalize"); // edge detection in the x direction; vertical edges get highlighted
run("Square"); // square each pixel so negative edge responses become positive
run("Select All"); // select the whole image
run("Plot Profile"); // plot the profile of the selection
These are the strong edges; I think they can always be thrown out.
[Image: result image, strong edges]
This is the same image with the contrast adjusted to show the text.
[Image: the result image with better contrast]
This is how the plot looks; the left and right edges are too high.
[Image: plot of the sum in y direction (dscf9515a-2.png)]
Here is just the left edge of the book:
[Image: just the book beginning from left]
And the corresponding plot:
[Image: plot of the book beginning (dscf9515a-4.png)]
The book's page starts at ~190 pixels, followed by an area without any signal up to ~220 pixels, and then the text starts.
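For anyone who would rather experiment outside Fiji, here is a rough Python sketch of the same idea: a horizontal [-1 0 1] gradient, squared, averaged down each column, followed by a scan for the page edge, the quiet margin, and the start of the text. It is not the macro itself: the median filter is left out, the image is a tiny synthetic list of rows, and the function names and the threshold value are made up for illustration.

```python
def column_edge_profile(image):
    """Squared horizontal gradient (kernel [-1 0 1]), averaged down each column."""
    width = len(image[0])
    profile = [0.0] * width
    for row in image:
        # Border columns are skipped because the kernel needs both neighbours.
        for x in range(1, width - 1):
            gradient = row[x + 1] - row[x - 1]
            profile[x] += gradient * gradient
    return [value / len(image) for value in profile]


def find_page_and_text_start(profile, threshold):
    """Return (page_edge, text_start) column indices; either may be None."""
    n = len(profile)
    x = 0
    # Skip the background until the first strong edge (the page edge).
    while x < n and profile[x] <= threshold:
        x += 1
    if x == n:
        return None, None
    page_edge = x
    # Walk across the edge itself.
    while x < n and profile[x] > threshold:
        x += 1
    # Walk across the quiet margin where there is no signal.
    while x < n and profile[x] <= threshold:
        x += 1
    text_start = x if x < n else None
    return page_edge, text_start


# Synthetic example: dark background, bright page from column 10,
# a few dark "text" strokes from column 20. The threshold is an assumption.
row = [0] * 10 + [200] * 30
for stroke in (20, 22, 24):
    row[stroke] = 50
image = [list(row) for _ in range(4)]

profile = column_edge_profile(image)
page_edge, text_start = find_page_and_text_start(profile, threshold=1000.0)
print(page_edge, text_start)
```

On a real scan the threshold would have to be tuned, and the very strong outer edges mentioned above would need to be cropped or ignored first.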

But for finding the text itself we may need a different algorithm.

I hope this contributes something.
nifem
Posts: 11
Joined: 04 Mar 2014, 00:53

Re: Software Requests (Besides Dewarping)?

Post by nifem »

[request]
I think we should just get the output good enough for the OCR software actually being used. In my case, I just need an auto-deskewer for heavily distorted images (two pages at a 90° angle, photographed from the front with a single camera), or just some preprocessing so that the same deskewing parameters can be applied to all images. It only has to be good enough for the OCR software to do its job. Maybe this can be achieved by auto-aligning the pages and then batch deskewing.
Anonymous1

Re: Software Requests (Besides Dewarping)?

Post by Anonymous1 »

Nice! The edge detection will make finding the seam a lot easier.

I'm still working on my algorithm, and it's going quite well. I'll post a fully functional demo soon.
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Software Requests (Besides Dewarping)?

Post by daniel_reetz »

Anonymous, you are on fire! Can't wait to see what you come up with.
Anonymous1

Re: Software Requests (Besides Dewarping)?

Post by Anonymous1 »

nifem wrote: [request]
I think we should just get the output good enough for the OCR software actually being used. In my case, I just need an auto-deskewer for heavily distorted images (two pages at a 90° angle, photographed from the front with a single camera), or just some preprocessing so that the same deskewing parameters can be applied to all images. It only has to be good enough for the OCR software to do its job. Maybe this can be achieved by auto-aligning the pages and then batch deskewing.
I was hoping ST (Scan Tailor) had this, but it doesn't. I had a perfect dewarping mask ready, but I couldn't apply it to multiple images (I don't want to hack around in the project XML). I did some research on the dewarping, and it seems to be pretty original work, so it isn't really implemented anywhere but ST. I don't code in C/C++, so I have trouble navigating ST's sources...

For now, I think I'll just finish what I've started.
Anonymous1

Re: Software Requests (Besides Dewarping)?

Post by Anonymous1 »

So, for a progress update, I've got it down even further. Here is the output from that original picture (I cropped the edges, as they were throwing off my averages):
Image
The algorithm is having trouble with the details near the middle, but that's because I'm using a bell-shaped smoothing curve. I'll fix that soon.
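The code behind this isn't posted, so the following is only a generic illustration of why a bell-shaped (Gaussian) smoothing curve can wash out fine detail; it is not the algorithm described above. The profile, the kernel sizes, and the function names are all assumptions for the sake of the example.

```python
import math


def gaussian_kernel(sigma, radius):
    """Bell-shaped weights from -radius to +radius, normalised to sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(profile, kernel):
    """Weighted moving average; indices are clamped at the borders."""
    radius = len(kernel) // 2
    last = len(profile) - 1
    smoothed = []
    for x in range(len(profile)):
        acc = 0.0
        for offset, weight in enumerate(kernel):
            source = min(max(x + offset - radius, 0), last)
            acc += weight * profile[source]
        smoothed.append(acc)
    return smoothed


# A flat profile with a one-pixel dip standing in for a narrow detail.
profile = [10.0] * 21
profile[10] = 0.0

narrow = smooth(profile, gaussian_kernel(sigma=1.0, radius=3))
wide = smooth(profile, gaussian_kernel(sigma=4.0, radius=12))
print(round(narrow[10], 2), round(wide[10], 2))
```

The wider the bell, the closer the dip gets pulled back toward the surrounding level, so a narrow feature can nearly disappear after smoothing.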
JonEP
Posts: 81
Joined: 19 Apr 2010, 15:09

Re: Software Requests (Besides Dewarping)?

Post by JonEP »

Dear Anonymous,

Univershul replied to your original query with a comment about the desire among many of us to have a "fixed content box" option for Scan Tailor (the suggestion that Dan highlighted as "very astute"). Here's the link to the thread that Univershul was referencing: http://www.diybookscanner.org/forum/vie ... ?f=8&t=466. A similar issue is discussed here: http://www.diybookscanner.org/forum/vie ... ?f=8&t=633.

If creating a "fixed box" option for ST is one effect of your coding project, I'd be very grateful to you. The most daunting thing about scanning for me right now is all of the time it takes to do post-processing. ST is a great aid in that, but I still find that I am put off by the prospect of dragging content box edges for all non-standard pages.

BTW, one thing we haven't discussed on that front: what if the program were to find the mean distance between the assumed content and the edge of the page (or image) in all four directions, and assume that this marks the edge of the book's standard content area? It could then apply that distance to all the pages that do not fit the mean, and perhaps provide a sort option that puts non-standard pages at the top of the list (i.e., instead of sorting by height or width), allowing users to zoom through very quickly, accepting selected, all, none, etc.
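To make the mean-margin idea concrete, here is a minimal Python sketch. It assumes the four margins of each page (in pixels) have already been measured by some content detector; the page names, the numbers, the tolerance, and the function names are invented for illustration and have nothing to do with Scan Tailor's actual project format.

```python
from statistics import mean

SIDES = ("left", "top", "right", "bottom")


def mean_margins(pages):
    """Average margin per side over all pages."""
    return {side: mean(page[side] for page in pages.values()) for side in SIDES}


def deviation(margins, reference):
    """Largest per-side distance between a page's margins and the reference."""
    return max(abs(margins[side] - reference[side]) for side in SIDES)


def review_order(pages, tolerance):
    """Return (reference, names sorted worst-first, names over tolerance)."""
    reference = mean_margins(pages)
    ordered = sorted(pages, key=lambda name: deviation(pages[name], reference),
                     reverse=True)
    flagged = [name for name in ordered
               if deviation(pages[name], reference) > tolerance]
    return reference, ordered, flagged


def apply_standard_box(pages, reference, flagged):
    """Give flagged pages the reference margins; leave the others untouched."""
    return {name: dict(reference) if name in flagged else dict(margins)
            for name, margins in pages.items()}


# Hypothetical measurements; p003 stands in for a chapter-opening page.
pages = {
    "p001": {"left": 100, "top": 120, "right": 100, "bottom": 140},
    "p002": {"left": 102, "top": 118, "right": 98, "bottom": 142},
    "p003": {"left": 100, "top": 400, "right": 100, "bottom": 140},
    "p004": {"left": 98, "top": 122, "right": 102, "bottom": 138},
}

reference, ordered, flagged = review_order(pages, tolerance=100)
fixed = apply_standard_box(pages, reference, flagged)
print(ordered, flagged)
```

One thing the toy data shows: a single unusual page pulls the mean noticeably (the top margin here averages far above the typical value), so a median might be a safer reference, though that would be a departure from the mean proposed above.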

Thanks,
jon
Anonymous1

Re: Software Requests (Besides Dewarping)?

Post by Anonymous1 »

Sorry @JonEP for the delayed reply; finals are painful...

I've never coded a single application in C++, other than a prime number generator. I work mainly with web design and Python. I could see if I could hack together a parser for the XML project file for Scan Tailor, as I finally got my content-recognition algorithm working (still refining it).