
YAPP

Posted: 02 Sep 2009, 23:45
by spamsickle
That's right, it's yet another page plucker, or post-processor, or something.

It's quick and dirty, and it only does one thing: It extracts a reasonable approximation of the page image from a typical DIY-scanned image.

It doesn't rotate, collate, rename, deskew, binarize, OCR, push, file, stamp, brief, debrief, or number. It's pretty robust -- I tried it on the "problem" images that were uploaded here, and it handled them okay. At the same time, it may get a little "slop" along with the page -- a bit of the opposing page, or the colored cover behind the pages. That doesn't bother me, as long as I get the part of the page I'm interested in reading. If it bothers you, this is probably not the software you want.
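The post doesn't say how YAPP locates the page, and the source is only in the zip. As a rough illustration of the general idea, here is a hypothetical sketch that is not the author's code. It assumes the page is brighter than its surroundings and that the page fills much of the frame. It treats rows and columns where a large enough fraction of pixels are bright as "page," then crops to their bounding box. The names `page_bounds` and `crop` and the `threshold` and `min_fraction` parameters are all made up. A crude bright-region approach like this also picks up "slop," such as a bright sliver of the opposing page, which is the trade-off described above.

```python
from typing import List, Optional, Tuple

# Hypothetical image model: grayscale rows, 0 (black) .. 255 (white).
Image = List[List[int]]


def page_bounds(img: Image, threshold: int = 160,
                min_fraction: float = 0.3) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) of the bright region, with bottom and
    right exclusive, or None if no page-like region is found.

    Assumption: the page is lighter than the background (cover, cloth, etc.).
    """
    height = len(img)
    width = len(img[0]) if height else 0
    if width == 0:
        return None
    # Mark each pixel as "paper" if it is at least as bright as the threshold.
    bright = [[px >= threshold for px in row] for row in img]
    # A row counts as page if enough of its pixels are bright; dark text lines
    # inside the page don't matter because only the first and last rows are used.
    rows = [y for y in range(height) if sum(bright[y]) >= min_fraction * width]
    cols = [x for x in range(width)
            if sum(bright[y][x] for y in range(height)) >= min_fraction * height]
    if not rows or not cols:
        return None
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def crop(img: Image, bounds: Tuple[int, int, int, int]) -> Image:
    """Cut the (top, bottom, left, right) rectangle out of the image."""
    top, bottom, left, right = bounds
    return [row[left:right] for row in img[top:bottom]]


if __name__ == "__main__":
    # Dark 6x8 "scan" with a bright 4x4 "page" in rows 1-4, columns 2-5.
    scan = [[20] * 8 for _ in range(6)]
    for y in range(1, 5):
        for x in range(2, 6):
            scan[y][x] = 230
    box = page_bounds(scan)
    print(box)  # (1, 5, 2, 6)
    print(crop(scan, box))
```

A real scan would need a proper image library and more care with noise, but the shape of the problem is the same: find where the paper is, then cut it out.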

It's pretty fast. On my machine, it takes about one second per page. It does well on pages of text with proper margins. Pictures, usually not so well. I typically scan both the outer and inner covers of the books I process, and this software often screws both of those up. Even when it puts out an inferior "page," however, it usually keeps running.

I've included the source code, but it's not pretty. Like I said, it's quick and dirty. One of these days, I may go in and clean it up -- make some of the static variables local, take all those blocks of code that are duplicated with minor changes and massage them into proper subroutines with parameters, etc. -- but this isn't meant to be a commercial product or a showcase of my programming abilities. Feel free to modify it to suit your needs, and share your changes here; it pretty much suits my needs as-is.
YAPP.zip (9.15 KiB)

Re: YAPP

Posted: 07 Sep 2009, 19:25
by jradi
Nice job! Works like a charm. If it would only deskew, it would be perfect. I think it will help out with ABBYY, thanks!