Alternative Software Workflow

Share your software workflow. Write up your tips and tricks on how to scan, digitize, OCR, and bind ebooks.

Re: Alternative Software Workflow

Post by jakegaisser » 27 Mar 2010, 00:07

Well, what I am doing now is the renaming and rotating with the script + ImageMagick, and then using JPEGCrops.

Can Scan Tailor synchronize crops across pictures like JPEGCrops can? That is why I like JPEGCrops so much: it is so very fast to crop my images this way.
jakegaisser
 
Posts: 63
Joined: 01 Mar 2010, 17:09

Post by spamsickle » 27 Mar 2010, 11:41

No. The ability to synchronize content selection across a range of pictures, as JPEGCrops does, is a much-requested feature that Scan Tailor doesn't have yet. I've looked into implementing it, but still haven't gotten my head around enough of Scan Tailor's staging/paging/filtering system to be comfortable putting fingers to keyboard.

I have noticed that most of the time, when Scan Tailor's content selection fails, it's really a failure in the page splitting step. If I go back and manually split the page correctly, the content selection will usually crop what I want. The page-splitting failure is usually one of two things: either it doesn't split the image at all, treating the whole thing as a single page, or it splits it at the outer edge of the page, which has the same practical effect in the content selection phase -- the entire book image is treated as though it contains potential content.

In the cases where it splits the page at the outer edge, I've confirmed Tulon's observation that it finds the correct place to split as well as the incorrect one, and simply fails to choose the correct one. He has suggested that a Fourier transform to find high-frequency components (like text) near the split candidates would improve the choice, and this seems likely. In our images, the space between the outer-edge candidate and the edge of the image is usually a mass of darkness; in flatbed scanner images, it will probably be either a mass of darkness (if the book was scanned with the scanner cover open) or a mass of brightness (if the book was scanned with the scanner cover closed). The relative lack of high-frequency components on that side could eliminate this line as a candidate, and improve the page splitting performance.
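To make the idea concrete, here is a rough Python/NumPy sketch of how such a check might look. It is not Scan Tailor code: the function names, the strip width, and the frequency cutoff are all made up for illustration, and it assumes the candidate split columns have already been found by some other step. Each candidate is scored by the high-frequency energy in the strip on each side of it, and the weaker side counts, so a line with text on one side and blank margin on the other scores low.

```python
import numpy as np

def high_freq_energy(strip, cutoff=0.1):
    """Mean-square energy of `strip` above `cutoff` cycles/pixel.

    `cutoff` is an illustrative value, not a tuned one.
    """
    strip = strip.astype(float)
    strip = strip - strip.mean()              # drop the DC (flat brightness) term
    spectrum = np.abs(np.fft.fft2(strip)) ** 2
    fy = np.fft.fftfreq(strip.shape[0])[:, None]
    fx = np.fft.fftfreq(strip.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)       # radial frequency of each FFT bin
    return spectrum[radius > cutoff].sum() / strip.size ** 2

def pick_split(image, candidates, strip_width=64, cutoff=0.1):
    """Return (best_x, scores) for candidate split columns in a grayscale image.

    A candidate scores the *smaller* of the high-frequency energies found
    immediately to its left and right, so a uniform margin on either side
    pulls the score toward zero.
    """
    scores = {}
    for x in candidates:
        left = image[:, max(0, x - strip_width):x]
        right = image[:, x:x + strip_width]
        if left.shape[1] == 0 or right.shape[1] == 0:
            scores[x] = 0.0                   # candidate sits on the image border
            continue
        scores[x] = min(high_freq_energy(left, cutoff),
                        high_freq_energy(right, cutoff))
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `pick_split(gray, [outer_edge_x, gutter_x])` on a 2-D grayscale array should favor the gutter whenever the area beyond the outer edge is uniformly dark or uniformly bright, since a flat region contributes essentially no energy above the cutoff. Real scans have noise and lighting gradients, so the cutoff and strip width would need experimenting.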

It should be easier for me to figure out where to add such a Fourier transform routine than how to approach adding selection synchronization, so maybe I'll concentrate on that next, and hope the experience with the former will teach me enough to tackle the latter.
spamsickle
 
Posts: 577
Joined: 06 Jun 2009, 23:57
