by jgreely » 07 Apr 2011, 02:55
I'm J Greely, network/sysadmin/QA/janitor/whatever guy at a Silicon Valley startup and a serious amateur photographer, active on the Internet since the Eighties. The walls of my house are covered with well-stuffed bookshelves, but for the most part, those aren't the books I want to scan.
I've been studying Japanese for several years, and one of the most frustrating roadblocks is when you run out of "student editions" and "graded readers", and try to start reading things written for a native audience. Children's books and comics can keep you going for a while, and there are tools like Rikai to help with web sites, but eventually you find yourself with a book you really want to read, and resign yourself to carrying around printed and electronic dictionaries and struggling through each page.
Back when Foothill College still had a group reading class, I put together some simple scripts to add phonetic guides (furigana) to texts and create matching vocabulary lists, making nicely-formatted PDF ebooks for the class. I typed in a few short stories (painfully slow), used a flatbed scanner and Abbyy FineReader Pro to OCR a few more, and found some easy-to-read out-of-copyright texts on the Aozora Bunko site.
Two months ago, I dusted off those scripts and got serious about automating the process, gluing together a number of Open Source projects into a system that can convert a novel-length text file into a PDF ebook with a per-page vocabulary list that's precisely matched to my reading level, in about 30 seconds. I've read more real Japanese in the past two months than in the previous two years, and it's all material that I actually want to read.
Unfortunately, while I've been able to find "unofficial" ebooks of some of the novels I own, most of them are low-resolution scans that are iffy for OCR (200-250 DPI heavily-compressed JPEG), and the online community is heavily biased toward the most popular recent anime-style series. If I want to read any of the random histories, mysteries, biographies, travel books, SF, etc. that I've bought, I need to scan them myself.
I had looked at commercial book scanners in the past, and bookmarked the original Instructables page, but it wasn't until my vacation trip to Japan was canceled by recent events that I found myself with enough time and motivation to go through these forums in detail and work out a scanner design that I could build without much effort or expense. An $8 platen from TAP Plastics and an hour with a bucket of Legos was all it took to start making high-quality scans of paperbacks.
Well, that and the thousands of dollars of studio photography gear that I already owned, which included an excellent camera, a sturdy copystand, and a commercial flash system that's ridiculously overpowered for the job. It's always nice when one hobby subsidizes another.
My current scanner is only capable of handling paperbacks up to a bit over 5x7 inches, but I still have enough Legos to scale the design up for a 10x14 platen. The 8MP camera may not prove adequate at that size, but fortunately I have a few more dollars invested in a 24MP DSLR and lenses...
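For anyone wondering why the 8MP camera gets doubtful at the larger size, here is a rough back-of-the-envelope sketch of the arithmetic. It assumes the page fills the frame edge to edge (optimistic, since real shots need some margin), and the aspect ratios (4:3 for the 8MP camera, 3:2 for the DSLR) are my guesses for illustration, not measured specs. The function name is made up for this example.

```python
import math

def estimate_dpi(megapixels, aspect, page_short_in, page_long_in):
    """Rough upper bound on scan resolution for one page per shot.

    megapixels: nominal sensor resolution (e.g. 8 or 24)
    aspect: long side / short side of the sensor (assumed, e.g. 4/3 or 3/2)
    page_short_in, page_long_in: page dimensions in inches
    Assumes the page fills the frame exactly, which is optimistic.
    """
    total_pixels = megapixels * 1_000_000
    long_px = math.sqrt(total_pixels * aspect)   # pixels along the long side
    short_px = math.sqrt(total_pixels / aspect)  # pixels along the short side
    # The tighter of the two dimensions limits the usable resolution.
    return min(long_px / page_long_in, short_px / page_short_in)

if __name__ == "__main__":
    cases = [
        ("8MP, 5x7 page", 8, 4 / 3, 5, 7),
        ("8MP, 10x14 page", 8, 4 / 3, 10, 14),
        ("24MP, 10x14 page", 24, 3 / 2, 10, 14),
    ]
    for label, mp, aspect, short_in, long_in in cases:
        print(f"{label}: about {estimate_dpi(mp, aspect, short_in, long_in):.0f} DPI")
```

Under those assumptions the 8MP camera lands in the mid-400s DPI on a 5x7 page but only around 230 DPI on a 10x14 page, which is back in the "iffy for OCR" range, while 24MP stays at roughly 400 DPI at the larger size.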
And someday, if I get really ambitious, I have a nearly complete archive of the old tabloid-sized Upper & Lower Case typography magazine; decades of type-geekery!
-j