BookScanWizard 2.0

Discussion about Steve DeVore's Book Scan Wizard, a power-user package to automate scan processing.

Moderator: peterZ

kempelen
Posts: 18
Joined: 07 Feb 2012, 18:19
E-book readers owned: -
Number of books owned: 1000
Contact:

Re: BookScanWizard 2.0

Post by kempelen »

Hi Steve!

First, thank you for the great tool! While doing my first tests, I wrote a quick intro in Hungarian: http://www.djvu.hu/konyv-digitalizalas/bookscanwizard

;)

My questions:
(I'm using Linux, Ubuntu, BSW 2.0.1a)

* "Show tips" does not accept that I uncheck it. It still shows every time.
* It would be excellent if File - Open remembered the last used folder and started there. Beginners open the same thing many times.
* A list of recent files in the File menu, to reopen them, would be very useful too. Beginners... ;)
* Faster previews, which you mention above: I agree. I tried the 0.25 preview size but didn't really notice a difference. Is there one?

Most important:

Is there any way to clean small "standalone" pixels from the page, other than playing with the "bw" threshold individually on each page? A built-in filter, or a PipePNG recommendation? I tried "Levels" and "AutoLevels", but they didn't really help, or I used the wrong parameters. (By the way, AutoLevels created a command with 3 numbers and then complained about it not being an even number of parameters!)

Is there any way to make letters better "filled"? If I just do a threshold, the letters are not as nice as in ScanTailor (ST), but I don't know what other filters I could try or what else ST does. I'm always amazed by its clean results. I attach a pic, and here you can see the scan and the ST and BSW output files. (Excuse me if I'm comparing with totally wrong settings! And I know, the original is also horrible, but it's an old book.) E.g. compare "e" and "t" on the second line.

Full size files for the attached example:
http://c64.rulez.org/~lion/bookscanner/bsw/
Attachments
Example image
steve1066d
Posts: 296
Joined: 27 Nov 2010, 02:26
E-book readers owned: PRS-505
Number of books owned: 1250
Location: Minneapolis, MN
Contact:

Re: BookScanWizard 2.0

Post by steve1066d »

Kempelen,

Thanks for your posting. I like hearing that people are using the software. And thanks for the Hungarian intro!

I think there's a general problem where preferences aren't being saved on Linux. I'll look into that. And I agree that having a list of recent files would be a good idea too.

As far as the speed of previews, sometimes it is rather slow depending on what processing you are doing. If you want to post your script I can see if there's something that could be done to help.

As for the standalone pixels, you really need a Despeckle or Denoise command. That's actually been on my list of things that I need to do. I'll see if I have time this week to work on that.

I've also been working on some filters that do a better job of converting to black and white, but from my testing I don't think they will be as good as ScanTailor. One option is to do the cropping and other transformations in BSW, then use ScanTailor to convert the result to black and white. I've also looked at integrating the command line version of ScanTailor, as with the PipePNG command, which could work well.

Give me a week or so and I'll try to address at least some of these issues.
Steve Devore
BookScanWizard, a flexible book post-processor.
kempelen
Posts: 18
Joined: 07 Feb 2012, 18:19
E-book readers owned: -
Number of books owned: 1000
Contact:

Re: BookScanWizard 2.0

Post by kempelen »

Hi Steve!

OK, thank you for the answers!

The program creates the following preference file on exit:

~/.java/.userPrefs/_!'4!~@"0!#4!cw"v!(`!cg"j!'`!~g"v!()!~w"l!#4!}g"v!'8!aw"z!':!}@"u!(c!a@"6!'%!cg"k/perfs.xml

Maybe in PrefsHelper, where you have

private static final Preferences prefs =
Preferences.userRoot().node(PrefsHelper.class.getPackage().getName());

you could try userNodeForPackage(..) instead?

http://docs.oracle.com/javase/7/docs/ap ... ang.Class)

(Just guessing, I don't know Java.) (However, even with that ugly name, I guess it could be loaded next time, because if I delete it, it is created under the same name again on exit. I hope that name is not normal.)
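
For readers following along, here is a minimal Java sketch of the difference between the two node styles. It is not BSW's code: the class name is hypothetical, and the package name is an assumption taken from the prefs path reported later in this thread. `Preferences.userNodeForPackage(Class)` is documented to use the package name with each period replaced by a slash, which the second helper imitates by hand so the sketch also works for a class without a package. On Linux the file-based backend appears to encode node names that contain characters such as "." into an opaque directory name, which would explain the path above. Note that creating nodes may leave empty entries in the user's preference store.

```java
import java.util.prefs.Preferences;

final class PrefsNodeDemo {
    // Assumed package name (hypothetical here; taken from the path reported later in the thread).
    static final String PACKAGE_NAME = "net.sourceforge.bookscanwizard";

    /** Node as in the quoted snippet: a single child of the root whose name contains dots. */
    static Preferences dottedNode() {
        return Preferences.userRoot().node(PACKAGE_NAME);
    }

    /** Nested node, following the path rule documented for userNodeForPackage: dots become slashes. */
    static Preferences nestedNode() {
        return Preferences.userRoot().node("/" + PACKAGE_NAME.replace('.', '/'));
    }

    public static void main(String[] args) {
        // Print both absolute paths to compare the two node layouts.
        System.out.println(dottedNode().absolutePath());
        System.out.println(nestedNode().absolutePath());
    }
}
```

For a class that lives in a named package, `Preferences.userNodeForPackage(ThatClass.class)` should give the same node as `nestedNode()` without any string handling.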

Thanks,
Ferenc
steve1066d
Posts: 296
Joined: 27 Nov 2010, 02:26
E-book readers owned: PRS-505
Number of books owned: 1250
Location: Minneapolis, MN
Contact:

Re: BookScanWizard 2.0

Post by steve1066d »

I just uploaded a new version of BSW (2.0.2), that addresses some of these issues.

It now keeps track of the last directory and defaults to that.

It should save preferences under Linux now.

I added a new command: MedianFilter. This will get rid of noise by scanning an image, and replacing each pixel with the median of the values surrounding it.
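
The idea can be sketched in plain Java as follows. This is only an illustration of a median filter on a grayscale array, not the code inside BSW; the class and method names are hypothetical, and it assumes the numeric argument is the side length of a square window and that pixels past the border are taken from the nearest edge.

```java
import java.util.Arrays;

final class MedianFilterSketch {
    /**
     * Replaces each pixel with the median of the square window centred on it.
     * img is a grayscale image as rows of 0..255 values; size is the window side (assumed odd).
     */
    static int[][] medianFilter(int[][] img, int size) {
        int h = img.length;
        int w = img[0].length;
        int r = size / 2;                       // window radius
        int side = 2 * r + 1;                   // effective window side
        int[] window = new int[side * side];
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int n = 0;
                for (int dy = -r; dy <= r; dy++) {
                    for (int dx = -r; dx <= r; dx++) {
                        // Clamp coordinates so border pixels reuse the nearest edge value.
                        int yy = Math.max(0, Math.min(h - 1, y + dy));
                        int xx = Math.max(0, Math.min(w - 1, x + dx));
                        window[n++] = img[yy][xx];
                    }
                }
                Arrays.sort(window);
                out[y][x] = window[window.length / 2];  // middle element is the median
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A single bright speck on a uniform background.
        int[][] img = {{10, 10, 10}, {10, 255, 10}, {10, 10, 10}};
        System.out.println(Arrays.deepToString(medianFilter(img, 3)));
    }
}
```

An isolated speck is outvoted by its neighbours, which is why this kind of filter suits "standalone" pixels. A larger window removes bigger specks but also erodes thin strokes.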

As I was experimenting I also found out that it is helpful to right click on the image and select "Auto RGB Levels". This is because there are some darker spots on the image that are yellow brown from the old paper. The AutoRGBLevels will sort of adjust the background to white, which seems to help.
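
To make the effect concrete, here is a rough Java sketch of a per-channel levels stretch. It assumes (this is not confirmed by the thread) that each pair of Levels values is a black point and a white point in percent for one colour channel, and that values in between are scaled linearly. The class and method names are hypothetical.

```java
final class LevelsStretchSketch {
    /**
     * Maps a 0..255 channel value so that lowPct becomes 0 and highPct becomes 255.
     * lowPct and highPct are percentages of full scale (assumed meaning of a Levels pair).
     */
    static int stretch(int value, double lowPct, double highPct) {
        double lo = lowPct / 100.0 * 255.0;
        double hi = highPct / 100.0 * 255.0;
        double scaled = (value - lo) / (hi - lo) * 255.0;
        // Clamp so everything at or above the white point becomes pure white.
        return (int) Math.round(Math.max(0.0, Math.min(255.0, scaled)));
    }

    public static void main(String[] args) {
        // A yellowish paper tone: each channel gets its own black/white points.
        System.out.println(stretch(200, 20.4, 73.3));
        System.out.println(stretch(190, 19.6, 70.6));
        System.out.println(stretch(150, 10.2, 54.5));
    }
}
```

Under these assumptions, a paper tone that sits above the white point in every channel comes out as plain white, which matches the "adjust the background to white" behaviour described above.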

Here's an example of some filter commands that seem to work well for that page.


#Calculated by using right click and choosing "Auto RGB Levels"
Levels = 20.4,73.3, 19.6,70.6, 10.2,54.5
Color = gray
MedianFilter = 5
Scale = 2
Color = bw 49
One trick to figure out the appropriate level (bw 49) is to use the filter dialog in the viewer. Move the black level to 100, then move the white level down to the right spot to show the page best. Note what the white level is, and use that in the color command.

Here are the results:
Attachments
24in.png (2.86 KiB)
Steve Devore
BookScanWizard, a flexible book post-processor.
kempelen
Posts: 18
Joined: 07 Feb 2012, 18:19
E-book readers owned: -
Number of books owned: 1000
Contact:

Re: BookScanWizard 2.0

Post by kempelen »

Hi Steve!

That image looks excellent, congratulations and many thanks. I tried your example and it gives wonderful results!

I can confirm it remembers the "tips" checkbox and last folder used. Config file is saved in net/sourceforge/bookscanwizard/prefs.xml, looks OK.

Problems:
When I try Auto Levels RGB, it gives:

Levels = 9,4,16,9, 9,4,17,3, 7,8,15,7
or
Levels = 58,69,4, 51,4,63,9, 32,9,45,1

(And then an error about the number of parameters not being even.)

I suspect some of those commas "," should be dots "."? Hungarian uses "," for decimals, so you need a kind of culture-invariant double-to-string conversion.

And "Color = gray" kills the picture for me (makes it fully white), so I didn't use that before either.

I uploaded the bsw file so you can check whether something else is causing "gray" to never work for me. Please ignore the many Persp fixes and Crops; it may look like low-quality scan work. :-D I would rescan if the book weren't so rare and expensive.

http://c64.rulez.org/~lion/bookscanner/bsw/

What's the relation between Rotate and Perspective? Does Perspective fix rotation too, because of its nature? Or should I do Rotate first in the script? (E.g. -88.5 would fix my left pages. But how does the two-coordinate form of the Rotate command work? I think if I give it a rectangle, it does not do a small-degree rotation, does it? Based on the diagonal.)

I think my scans are mainly just rotated, as I try to keep the camera square to the book. E.g. on a line about 2200 px (~15 cm) long, I measured a difference of 7 pixels between the first and last lines in the perspective direction, which is about half the height of the letter "M". Do you think that is worth perspective correction? (This was not the book in the example; I didn't check that one!)

Thank you for the new features and the help, I appreciate it a lot!
Ferenc
steve1066d
Posts: 296
Joined: 27 Nov 2010, 02:26
E-book readers owned: PRS-505
Number of books owned: 1250
Location: Minneapolis, MN
Contact:

Re: BookScanWizard 2.0

Post by steve1066d »

For now I changed the code to force a period (.) as the decimal separator. In the future, I'll change it to be smarter and use the locale specific way of separating a list of numbers (like a semicolon).
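
As an illustration of the problem and the fix, here is a small Java sketch (not BSW's actual code; the class and method names are hypothetical). It formats a list of level values with an explicit locale: with a Hungarian locale the decimal comma collides with the comma that separates the values, while `Locale.ROOT` always yields a period.

```java
import java.util.Locale;
import java.util.StringJoiner;

final class LevelsFormatSketch {
    /** Builds a "Levels = a,b,..." line, formatting each value with the given locale. */
    static String formatLevels(Locale locale, double... values) {
        StringJoiner joiner = new StringJoiner(",");
        for (double v : values) {
            joiner.add(String.format(locale, "%.1f", v));
        }
        return "Levels = " + joiner;
    }

    public static void main(String[] args) {
        // Locale-sensitive formatting: the decimal separator depends on the locale.
        System.out.println(formatLevels(Locale.forLanguageTag("hu-HU"), 20.4, 73.3));
        // Locale-neutral formatting: always uses a period.
        System.out.println(formatLevels(Locale.ROOT, 20.4, 73.3));
    }
}
```

On the reading side, `Double.parseDouble` is not locale-sensitive and expects a period, so writing with `Locale.ROOT` keeps the writer and the parser consistent.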

I uploaded version 2.0.2a

I tried your config with the Color = gray uncommented and it seemed to work ok for me. Maybe there is something specific with your source images. If you upload one of your source images I'll check it out.

Though actually, the only reason for the Color=gray is to speed up the rendering a bit. It shouldn't affect the quality.
Steve Devore
BookScanWizard, a flexible book post-processor.
steve1066d
Posts: 296
Joined: 27 Nov 2010, 02:26
E-book readers owned: PRS-505
Number of books owned: 1250
Location: Minneapolis, MN
Contact:

Re: BookScanWizard 2.0

Post by steve1066d »

As for your question on perspective and rotation, fixing the perspective will also fix the rotation, so there's no need to fix the rotation first.

As for whether perspective correction is worth it, that probably depends on what you are doing with the book afterwards. OCR software tends to need things very straight, though there's usually something built into the OCR software to keep the lines straight. I guess I'd take a page, carefully adjust the perspective, and see how the end result looks. Then try again, adjusting only the rotation. If there's no noticeable difference, just go with the rotation.
Steve Devore
BookScanWizard, a flexible book post-processor.
kempelen
Posts: 18
Joined: 07 Feb 2012, 18:19
E-book readers owned: -
Number of books owned: 1000
Contact:

Re: BookScanWizard 2.0

Post by kempelen »

Thank you for the detailed answer about rotation!

Something strange happened with the coordinates: when I copy my previous coordinates, the selection box goes below the image.
When I create a new box at the right place, it gives negative coordinates.

Original: Perspective = 943,703, 3139,703, 3095,3880, 964,3885
Now: Perspective = 1138,-3488, 3252,-3425, 3179,-246, 1066,-327

See: http://c64.rulez.org/~lion/bookscanner/ ... ndates.jpg

When the program is in this weird state, Auto RGB generates: Levels = 100,100, 100,100, 100,100

And suddenly it comes right after a while, but when I start again, it's broken again. With a new project file it works OK.
(I've uploaded the project configuration to the same dir above.)

Level values are still not good in 2.0.2a (not even when starting a new project): Levels = 8,2,74,9, 9,67,1, 7,5,47,1

I think a hardcoded English format would be safer than trying to support locale formats later.

And I think something made the program much slower. E.g. I click the scrollbar below the current position to scroll down, and when that is done I click above it to scroll back up, which takes about 6 seconds. I think this used to be faster. (Core i5.) It also takes 6 seconds to "cut" the coordinates. And I only placed the text cursor under the "Rotate" commands, nothing more.

For gray, I've uploaded the original image to same dir: (6MB) http://c64.rulez.org/~lion/bookscanner/bsw/24.JPG
Do you think that may be caused by the huge black area?

A small improvement: if the right-click menu supports disabled items, I think it would be better to disable unavailable items than to remove them, so users can see that they become available under some condition (e.g. Auto Levels RGB seems to require a rectangle selection).
steve1066d
Posts: 296
Joined: 27 Nov 2010, 02:26
E-book readers owned: PRS-505
Number of books owned: 1250
Location: Minneapolis, MN
Contact:

Re: BookScanWizard 2.0

Post by steve1066d »

Are you using the webstart version? Are you using the 32 bit or 64 bit version?
Steve Devore
BookScanWizard, a flexible book post-processor.
kempelen
Posts: 18
Joined: 07 Feb 2012, 18:19
E-book readers owned: -
Number of books owned: 1000
Contact:

Re: BookScanWizard 2.0

Post by kempelen »

I tried both webstart32 and webstart64, same results.

For easier access to the documentation, can you please add a link from the normal Wiki to your Mediawiki? It's difficult to find the right wiki.

http://sourceforge.net/p/bookscanwizard/wiki/Home/

Links to the BSW guides (in this forum) would also be very useful.
Or please grant me editor access, as you did for glsoop previously. :-) (I'm kempelen on SF too.)