When enough of us have scanners up and running, we should figure out how to do book scanning races: timed events where you take a book, scan it, OCR it, then clean it up.
Start with a book list. (Ask someone from Project Gutenberg.) Everyone gets a stopwatch and a set of books to scan, drawn from this list. Each contestant scans the assigned books and submits a time for each one. Each book shall be scanned a total of 3 times by different contestants. Judges compare the dupes against each other to find errors and assess points.
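For the judging step, here is a rough sketch of how comparing the dupes might work, not a settled scoring rule. It assumes the three cleaned-up texts for a book have already been split into the same number of aligned lines (a big assumption, since real OCR output would need an alignment pass first). Wherever two of the three agree, the odd one out is charged with an error; lines where all three disagree are set aside for a human judge. The function name `score_scans` and the contestant names are made up for illustration.

```python
from collections import Counter


def score_scans(scans):
    """Count per-contestant errors by majority vote over aligned lines.

    scans maps a contestant name to that contestant's list of text lines.
    ASSUMPTION: every list has the same length and line i in one list
    corresponds to line i in the others.
    Returns (errors, unresolved): errors maps name -> error count, and
    unresolved lists the line indexes where no majority exists.
    """
    names = list(scans)
    n_lines = len(scans[names[0]])
    if any(len(scans[name]) != n_lines for name in names):
        raise ValueError("all scans must have the same number of lines")

    errors = {name: 0 for name in names}
    unresolved = []
    for i in range(n_lines):
        votes = Counter(scans[name][i] for name in names)
        majority_text, count = votes.most_common(1)[0]
        if count * 2 > len(names):
            # A strict majority agrees: anyone who differs gets an error.
            for name in names:
                if scans[name][i] != majority_text:
                    errors[name] += 1
        else:
            # No two-out-of-three agreement: leave it to a human judge.
            unresolved.append(i)
    return errors, unresolved


# Hypothetical example: carol's OCR read a capital I as a lowercase l.
example = {
    "alice": ["Call me Ishmael.", "Some years ago"],
    "bob": ["Call me Ishmael.", "Some years ago"],
    "carol": ["Call me lshmael.", "Some years ago"],
}
errors, unresolved = score_scans(example)
print(errors, unresolved)
```

How error counts and stopwatch times get combined into points is left open here; that is for the judges to decide.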
This exercise would provide a bottom-line evaluation of hardware, software, and operator workflow. We're pursuing different hardware and software options. Though we think we know what works better, self-deception tends to get squashed by race results. (Ask me about Pinewood Derby racing someday.)
Winners would get bragging rights and (upon full disclosure of their winning formulas) a nice prize. And the participants (and only participants) would each get copies of the etexts. If we can get a judge to agree this is a fair use, we can include copyrighted works. And we could donate etexts to whatever charities provide books for the blind.
But the big win for everyone would be that we'd learn (and confirm) the relative merits of everything we've been discussing here.
