My process using BookScanWizard

steve1066d
Posts: 296
Joined: 27 Nov 2010, 02:26
E-book readers owned: PRS-505
Number of books owned: 1250
Location: Minneapolis, MN

My process using BookScanWizard

Post by steve1066d »

I thought I'd publish my process, which I think is the most effective way to use BookScanWizard:

I've got a pretty much standard "New Standard" scanner, with the cameras running CHDK. I've got the two cameras connected to a powered USB hub, with a switch spliced into the 5 V power line to the hub. I've seen using a hub mentioned before on this site, but not too often. It's a lot easier to do that than messing with batteries and splicing USB cords together. Here's the CHDK script I use:

Code: Select all

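@title Remote button
rem Hold the shutter half-pressed so focus and exposure are set ahead of time
:loop
press "shoot_half"
rem Wait until the USB remote button is pressed
do
until (is_key "remote")
rem Take the picture, release the half-press, and start over
click "shoot_full"
release "shoot_half"
goto "loop"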

The speed of that script is equivalent to the SDM "Fast" mode. It takes the picture pretty much immediately when the remote is pressed. (I used CHDK because I was having problems getting SDM to override the focus.)

I've got a process where I can scan multiple books at a time and then use BSW to split the images into a set of books, using barcodes (actually QR-codes).

On the cradle of my scanner, I've got barcodes on both sides that indicate "end of book". That way, when I'm scanning along and come to the end of a book, I take a picture of the barcode as the last picture of that book. BSW then knows that it should start a new book.
Or if I know ahead of time the titles of the books I'm going to scan, I can create barcodes that encode the name of the book, which I scan before the start of the book.

I also have barcodes to do the following:
  • Redo the page set: This indicates that the previous 2 pages will be replaced with the set that follows.
  • Adjust perspective: This will rotate and fix keystone distortion, using the technique Rob pioneered with his qrpc utility. However, since it requires carefully lining the barcode up straight with the book, I usually skip this step and do the correction interactively in the program.
  • Change to black and white or grayscale. If I want to set the page to black and white or grayscale I can use these cards. While these can also be entered interactively, sometimes it is easier to handle this while the book is open.
  • Flag page: I use this to just put a note in the configuration file to indicate I should go back and check that scan. I might use it if I think I might have missed or duplicated a page.
  • Gray card: The next page that follows is a gray card, which I can use to adjust the exposure. (Normally I don't bother with that, unless I'm trying to scan an art book or something like that).
  • Skip this page: I can put this on a blank page or other page I don't want in my finished output.

After scanning, I copy the images to two directories, say c:\source\l for the left side of the pages, and c:\source\r for the right side. Next I run these two bsw commands:

Code: Select all

bsw -mergelr c:\source c:\merged
bsw -split -scale .25 c:\merged c:\sorted

The first command merges the left and right files together and saves them to the c:\merged directory. The second scans the files in c:\merged for barcodes and moves them to the c:\sorted directory, split into books. If I use the "Title" barcodes, the books are stored under the book name. If the break was just indicated by the "End book" barcode, the book gets a generic name based on the date and time of the scan. At this stage the barcode information is written to barcodes.csv, which is a standard comma-separated file that contains all of the found barcodes.
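
Since barcodes.csv is an ordinary comma-separated file, it can be checked with a few lines of script before the interactive step. What follows is a minimal Python sketch, not part of BSW. The column layout of the file is not described in this post, so the sketch only prints each row as-is. The function name `parse_barcode_rows` is made up for illustration, and the path to the file is passed on the command line rather than assumed.

```python
import csv
import sys


def parse_barcode_rows(lines):
    """Parse comma-separated lines into lists of fields, dropping blank rows."""
    return [row for row in csv.reader(lines) if row]


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path is supplied by the caller; where BSW writes the file is not assumed here.
    with open(sys.argv[1], newline="") as handle:
        for number, row in enumerate(parse_barcode_rows(handle), start=1):
            print(number, row)
```

Run it as `python show_barcodes.py <path to barcodes.csv>` (the script name is arbitrary) to list what the split step found.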

After this step I'm ready for the interactive portion of the processing. You can start by using the wizard within BSW to create a startup configuration. However, once you get used to it, you'll find it is easier to come up with your own template and use that as a starting point instead.

So I copy my template into the sorted book directory. Here's the template I use:

Code: Select all

# BSW Configuration Template
#
# The source directory
LoadImages = .

# The Destination directory
SetDestination = c:\abbyy\input

# Sets the final DPI and compression
SetTiffOptions = 300 NONE

# Estimate the source DPI from the focal length setting
EstimateDPI = 5.8,135, 23.2,482

# Configure the left pages
Pages = left
Rotate = -90

# Configure the right pages
Pages = right
Rotate = 90

Pages = all
Barcodes = 

########################################################################
### Insert commands to fix keystone, color, etc.
########################################################################


Pages=left


Pages = right


########################################################################

Pages=all
# This improves the contrast a bit more than the levels command above.
Levels = 0, 90

# Rescale the image to match the final DPI

ScaleToDPI=
PostCommand = c:\java\BookScanWizard\util\bsw.ahk
PostCommand = move c:\abbyy\output\output.pdf "%parentAsName%.pdf"
PostCommand = del /q c:\abbyy\input

Note: If you right-click the Barcodes configuration line, it will be replaced with the commands defined via barcodes.

Next I select the left page section in the template, then choose a left page in the middle of the book and fix the perspective. For this step you just need four corners that will be straightened into a rectangle. I save that, then define the crop by clicking two points to mark its corners, and save that as a crop operation.
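
For anyone curious what "four corners straightened into a rectangle" means numerically, here is a minimal Python sketch of the standard four-point projective transform (a homography). It is an illustration of the general technique only and makes no claim about how BSW implements it; the function names and the sample corner coordinates are made up.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix so row operations apply to both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("corners are degenerate (e.g. three in a line)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """Return the 8 coefficients mapping four src corners onto four dst corners."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (a*x + b*y + c) / (g*x + h*y + 1), rearranged to be linear in a..h
        rows.append([float(t) for t in (x, y, 1, 0, 0, 0, -x * u, -y * u)])
        rhs.append(float(u))
        # v = (d*x + e*y + f) / (g*x + h*y + 1)
        rows.append([float(t) for t in (0, 0, 0, x, y, 1, -x * v, -y * v)])
        rhs.append(float(v))
    return solve_linear(rows, rhs)


def apply_homography(h, x, y):
    """Map one source point through the transform."""
    a, b, c, d, e, f, g, k = h
    w = g * x + k * y + 1.0
    return ((a * x + b * y + c) / w, (d * x + e * y + f) / w)


# Hypothetical clicked corners of a keystoned page, clockwise from top left.
page_corners = [(120, 80), (900, 60), (940, 1300), (100, 1340)]
# The upright rectangle those corners should land on.
rectangle = [(0, 0), (800, 0), (800, 1200), (0, 1200)]
h = homography(page_corners, rectangle)
```

Every other pixel of the page is moved by the same eight coefficients, which is why four corners are enough to straighten the whole page.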

I do the same thing for the right side, but when I get to the crop, I cut and paste the configuration from the left page, then hold down the shift key and move it to the same spot on the right side. That way the crop sizes are the same for the right and left pages.

I then make sure that the checkboxes are turned off, and examine a few other pages to ensure that the perspective and cropping works for all the pages. I'll either adjust the crops, or maybe create a crop for a subset of the pages, if necessary.

Next I use the right click option in the viewer, "autolevels", which will figure out a level command to improve the contrast of the image. Usually I use a separate correction for the left & right images because in my scanner at least, my lights don't illuminate both pages quite the same.

Once everything is looking right, I'll click at the end of the script and examine a few pages. Once I'm satisfied everything is looking good, I'll save the configuration file. If I just have one book to do, I'll press the "submit" button to run the batch. Otherwise I'll continue with the following:

Repeat the above steps for the other books I've scanned.

I then use a script that runs this command for each book:

bsw -batch

That will run the configuration and save the TIFF files. I've set it up so that it will also call an AutoHotKey script to create a PDF file using ABBYY FineReader Professional.
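
The per-book loop itself is not shown in this post. As a rough idea, a minimal Python sketch could look like the one below. It assumes that each subdirectory of the sorted folder is one book with its saved configuration inside, that `bsw -batch` is meant to be run from within that directory (the template's `LoadImages = .` suggests this, but it is an assumption), and that the `bsw` launch script is on the PATH. The function names are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def book_directories(sorted_root):
    """Return the immediate subdirectories of sorted_root, one per book, in name order."""
    return sorted(p for p in Path(sorted_root).iterdir() if p.is_dir())


def run_batches(sorted_root, run=subprocess.run):
    """Run `bsw -batch` once per book directory and return the directories processed."""
    processed = []
    for book in book_directories(sorted_root):
        # shell=True lets Windows resolve a bsw.cmd launch script through cmd.exe.
        # Assumption: the command picks up the configuration in the working directory.
        run("bsw -batch", cwd=book, shell=True, check=True)
        processed.append(book)
    return processed


if __name__ == "__main__" and len(sys.argv) > 1:
    for done in run_batches(sys.argv[1]):
        print("processed", done)
```

Called as `python run_books.py c:\sorted` (the script name is arbitrary), it stops at the first book whose batch run fails, because `check=True` raises an error on a non-zero exit code.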

Here is a copy of the AutoHotKey script:

Code: Select all

SetTitleMatchMode 2 
run C:\Program Files\ABBYY FineReader 10\FineReader
WinWaitActive ABBYY FineReader
Send ^t
Send !n
WinWaitActive, A_Convert, Close
Send {Enter}
WinWaitNotActive, A_Convert, Close
Send !fx
WinWaitActive,, Do you Want to save
Send n
return

It all may seem a bit complicated at first, but for less than 5 minutes of my time per book I can go from raw images to a dekeystoned, cropped, color-adjusted, and OCR'd PDF file.
Steve Devore
BookScanWizard, a flexible book post-processor.
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: My process using BookScanWizard

Post by daniel_reetz »

I've stickied this thread as it is extraordinarily useful.
ibr4him
Posts: 102
Joined: 18 Oct 2010, 10:36

Re: My process using BookScanWizard

Post by ibr4him »

Great info, thanks. BookScanWizard seems a bit difficult to master, though; I would love a video tutorial if possible.
Koen
Posts: 1
Joined: 30 Aug 2011, 16:31
E-book readers owned: NEO
Number of books owned: 0

Re: My process using BookScanWizard

Post by Koen »

I can't figure out where to run the following commands:
bsw -mergelr c:\source c:\merged
bsw -split -scale .25 c:\merged c:\sorted

Thanks!
steve1066d

Re: My process using BookScanWizard

Post by steve1066d »

I've already replied to this via a private message, but here's the answer again if anyone else is having the same issue:


First, you need to create the launch script. If you are using the web start version, from the application choose Tools, then Create launch script. For a location I'd recommend c:\windows\bsw.cmd

Next, you need to get to the command prompt. Assuming you are running Windows, choose Start, then Run, and enter "cmd" (without the quotes).

From there you should be able to run the command line version.

Hope that helps. If you still have questions, let me know.

Steve
daniel_reetz

Re: My process using BookScanWizard

Post by daniel_reetz »

Just a little moderator's note here to everyone - not specifically about this thread - it is ALWAYS best to post your questions out in the open (unless they are about a moderation problem, or you're concerned about something on the forums). Some days I get 10 to 20 emails/PMs about DIY Book Scanner stuff. The thing is - if I answer them here in public, everyone can benefit. If I answer them in private, only one person can benefit. The choice is obvious. I almost always ask that people post their question in the main forum so we can all help and we can all benefit. The same goes for people like Steve or Tulon - it's almost always better to post the question in public. Ideally we can keep their time free for programming and other things they love, instead of less-fun things like tech support! Thanks!
stephenneujahr
Posts: 1
Joined: 27 Jan 2012, 14:40
Number of books owned: 0

Re: My process using BookScanWizard

Post by stephenneujahr »

Excuse my lack of computer knowledge.

When I type "bsw" (without quotes) in the command prompt, it says this:

C:\Users\nonserviam>bsw
Exception in thread "main" java.lang.NoClassDefFoundError: javax/media/jai/JAI
at net.sourceforge.bookscanwizard.BSW.<clinit>(BSW.java:92)
Caused by: java.lang.ClassNotFoundException: javax.media.jai.JAI
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 1 more
Could not find the main class: net.sourceforge.bookscanwizard.BSW. Program will
exit.

And my C:\merged directory is empty. I'm not really sure what else I can try.
droidguy
Posts: 2
Joined: 13 Sep 2012, 09:24
E-book readers owned: Archos 7 OC,Nook Color, Android G2
Number of books owned: 0
Country: Poland

Re: My process using BookScanWizard

Post by droidguy »

I have two questions regarding your excellent tutorial:

1. If I scan a book with just one camera, that is, all the odd pages first and then the even ones, can BSW help me merge the two sets of images into one? And what command should I use?

2. ABBYY FineReader does not seem to be free. What should I use instead if I'm not willing to spend money on it?
droidguy

Re: My process using BookScanWizard

Post by droidguy »

@stephenneujahr, it's been a couple of months since you posted your problem, but anyway: it seems that JAI (Java Advanced Imaging) has not been installed, or at least not installed correctly. When you were installing BSW, did you accept the installation of JAI as well? You can check by opening the Java Control Panel, General tab, Temp Internet Files -> View, then right-clicking BSW and choosing Show JNLP File. Among other things, there should be something like this:

Code: Select all

  <resources>
    <java java-vm-args="-Xmx1024M" version="1.6+"/>
    <jar href="http://bookscanwizard.sourceforge.net/run/BookScanWizard.jar" download="eager" main="true"/>
    <extension href="http://bookscanwizard.sourceforge.net/webstart/release/jai-1.1.3.jnlp"/>
    <extension href="http://bookscanwizard.sourceforge.net/webstart/release/jai-imageio-1.1.jnlp"/>
  </resources>

If you don't see the JAI extensions, delete BSW using the same panel and run BSW via Java Web Start again from the button on: http://bookscanwizard.sourceforge.net/run/