I have a large number of Japanese paperbacks, roughly 4x6 and 5x7 inches, that I want to OCR. I had everything I needed except a platen and a design, and after browsing these forums for a while, I came up with both. Because my camera has manual focus and exposure controls (and remote-control software!), I wanted a design that kept the platen at a constant distance from the camera. There's still some tinkering to do, but I think it's going to work nicely.
Here (3.6MB JPEG) is the raw output from the first test shot, done with just room lighting.
Materials:
- Minolta Dimage A2 8MP camera, Dimage Capture remote-control software, and AC-11 power adapter.
- Kaiser copystand.
- Speedotron Black Line 1205CX power pack, plus two each of 202VF color-corrected flash head with 7" reflector, 10-degree grid insert, and 8-foot light stand.
- Wein wireless flash trigger and slave.
- Abbyy Finereader Pro 9.0.
- 32x32-post Lego baseplate and a whole bunch of Legos in various sizes and shapes.
- 7.75x9.25-inch (roughly 25x30-post) sheet of 1/4-inch Acrylite GP for the platen.
- a bunch of pennies.
- several sheets of matte black construction paper.
- two sheets of cardstock.
- several sheets of packing foam.
- tape.
Back view, with a spacer column to position 4x6 books consistently:
In use. Construction paper is taped to the vertical riser to eliminate glare, and another sheet is placed behind the page being scanned to significantly reduce bleed-through. The pennies add enough weight to the assembly that I can press the book up into the platen:
Camera support (note the camera doesn't need to be in Macro mode):
Cardstock-and-foam cradle to press the hand-held book up into the platen consistently:
Cleaned-up detail from the test scan linked above:
Without any cleanup on the test scan, Finereader was able to OCR it with only two single-character errors, plus about a dozen garbage characters added to short lines where it interpreted the noise in the low-contrast JPEG as ASCII characters. The camera can shoot RAW, the flash rig will provide crisp, even lighting, and a little pre-processing before OCR will significantly improve the recognition rate and get rid of the noise. I know from experience that a 300dpi scan is adequate for Japanese in Finereader, and this setup will get ~600dpi for the 4x6 books and ~450 for the 5x7 ones, so it should work out nicely.
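For anyone wanting to sanity-check resolution numbers for their own camera and book sizes, here is a rough back-of-the-envelope sketch in Python. It assumes the 8MP frame is 3264x2448 pixels and that the page is framed as tightly as the sensor's aspect ratio allows. Both of those are my assumptions for illustration, and effective_dpi is just a made-up helper name. Real numbers depend on how the page is actually framed, so treat the result as a ballpark figure only.

```python
def effective_dpi(sensor_px, page_in):
    """Best-case DPI when the page fills as much of the frame as it can.

    sensor_px: (width, height) of the image in pixels
    page_in:   (width, height) of the page in inches
    """
    long_px, short_px = max(sensor_px), min(sensor_px)
    long_in, short_in = max(page_in), min(page_in)
    # Whichever axis runs out of pixels first limits how tightly
    # the page can be framed, so take the smaller of the two ratios.
    return min(long_px / long_in, short_px / short_in)


# Assumed full-resolution frame size for an 8MP camera (not measured).
SENSOR_PX = (3264, 2448)

for page in [(4, 6), (5, 7)]:
    print(page, round(effective_dpi(SENSOR_PX, page)), "dpi (best case)")
```

Either way, both page sizes come out comfortably above the 300dpi that Finereader needs.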
I tested the flash rig before I built the scanner, so I know it will fill the platen with crisp, even, glare-free light. The catch is that it's powerful enough to light a large room: even at its minimum setting (1/4 power, single channel, for an effective 37 watt-seconds per head), the reflectors have to be nearly six feet away from the platen to deliver only f/8 at ISO 64, with an exposure time of 1/500th of a second.
I'll be doing some real scans tonight. The target books are the murder mystery To Die in the Rain (4x6, page 5 was the test scan), Japanese Train Lunches (4x6, full-color photos), and My Youth with Ultraman (5x7, with B&W photos). I'll post some samples once they're done.
-j