I don't know what the Internet Archive is doing. My first guess was that they're building the PDF from an existing text file, but then you wouldn't need to open such a file in OCR software (the text would already be there), so that guess is probably wrong.
I use Adobe Acrobat version 9 or higher when I want compression. It has a "ClearScan" option, which creates a custom font and vectorizes the text with it while it does the OCR. For my purposes, the OCR is acceptable -- I'm not using a text-to-speech application to read to me, just doing an occasional search on the generated text. ClearScan also produces smoother-looking characters than the original Scan Tailor output, and it works better with the Scan Tailor output than it does with the original JPEGs.