antwoorden wrote (15 Jun 2018, 03:20):
"But there's one functionality I'd advise to exclude: moving the dewarping from the final to the third stage. At least there should be an ability to keep it in stage 3. The reason for this, is that the dewarping is often crashing, with many of my scanned books. When this happens in the 3rd stage, it crashes my whole project."

I noticed that the dewarping problem in Scan Tailor Experimental appears when the project contains blank pages: STE crashes when it tries to apply dewarping to them. So if blank pages are excluded from dewarping (no geometric distortion correction applied, or only deskew), there should be no crash.
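To make the proposed workaround concrete, here is a minimal, hypothetical sketch of such a guard. It assumes each page is available as rows of grayscale pixels and treats a page as blank when almost none of its pixels are dark. The names, the dark-pixel test and the 0.1% threshold are illustrative assumptions only; this is not how Scan Tailor Experimental or Scan Tailor Advanced actually detect blank pages or choose corrections.

```python
# Hypothetical sketch: none of these names come from Scan Tailor's code base.
from typing import List

Page = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def dark_fraction(page: Page, threshold: int = 128) -> float:
    """Return the fraction of pixels darker than `threshold`."""
    total = sum(len(row) for row in page)
    if total == 0:
        return 0.0
    dark = sum(1 for row in page for px in row if px < threshold)
    return dark / total


def is_blank(page: Page, max_dark_fraction: float = 0.001) -> bool:
    """Assumed blank-page test: at most 0.1% of pixels are dark."""
    return dark_fraction(page) <= max_dark_fraction


def choose_correction(page: Page) -> str:
    """Pick the geometric correction for one page.

    Blank pages have no text lines for a dewarping model to fit,
    so they fall back to deskew only instead of being dewarped.
    """
    return "deskew" if is_blank(page) else "dewarp"
```

For example, an all-white 100×100 page gets `"deskew"`, while the same page with one black row (1% dark pixels) gets `"dewarp"`. A real implementation would also need to tolerate scanner noise and page borders, which this sketch ignores.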
Thus I do not think the problem you encountered with Scan Tailor Experimental is actually an argument against moving the dewarping function to the third stage. I am strongly in favor of this change in Scan Tailor Advanced.
The unique strength of Scan Tailor is its ability to prepare a set of scans or photos for subsequent OCR, although after processing they are also ready to be combined into a PDF or DjVu file. However, additional functions such as extracting pictures from PDFs or binding the output images into a PDF would not make Scan Tailor any better at this main function.
By the way, when it comes to extracting images from PDF files, the best tool I know is pdfimages from the Poppler suite. It extracts the embedded images themselves, whereas Adobe Acrobat (at least the Pro DC 2015 version I have access to) re-renders them.
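As a rough illustration, the sketch below wraps a pdfimages call from Python. It assumes poppler-utils is installed and on the PATH; `-all` asks pdfimages to write each image in its native format (JPEG, JPEG 2000, JBIG2, etc.) rather than converting it. The function names and file paths are hypothetical.

```python
import subprocess
from typing import List


def pdfimages_command(pdf_path: str, out_prefix: str) -> List[str]:
    """Build a Poppler `pdfimages` call that writes every embedded image
    in its native format (-all) instead of re-rendering the page."""
    return ["pdfimages", "-all", pdf_path, out_prefix]


def extract_images(pdf_path: str, out_prefix: str) -> None:
    # Requires poppler-utils on PATH; raises CalledProcessError on failure.
    subprocess.run(pdfimages_command(pdf_path, out_prefix), check=True)
```

Calling `extract_images("book.pdf", "img/page")` would write files such as `img/page-000.jpg` next to each other; running `pdfimages -list book.pdf` first shows what is embedded without extracting anything.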
Images can be combined into a PDF file using the jbig2 encoder (for bi-tonal images) or img2pdf (for color ones). The text layer from Tesseract can easily be added using pdftk. So it is quite easy to produce a two-layer PDF file with efficiently compressed images (JBIG2 for bi-tonal images of text and JPEG 2000 for color images) and a text layer underneath. This method takes more time and effort than using FineReader or Acrobat, but the results are comparable or even better, because you have more control over each processing step.
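The workflow above can be sketched as a set of command builders. This is a minimal sketch under several assumptions: the function names and file names are hypothetical; jbig2enc's `-s -p` output (`basename.sym` plus `basename.0000`, ...) still has to be wrapped into a PDF by the helper script that ships with jbig2enc, which is not shown; and the text layer only lines up if the image PDF and Tesseract's text-only PDF have the same page count and page sizes (both tools derive page size from the image resolution, so the images need correct DPI metadata).

```python
import subprocess
from typing import List, Sequence


def jbig2_command(bitonal_pages: Sequence[str], basename: str = "output") -> List[str]:
    # jbig2enc: -s = symbol coding, -p = PDF-ready output files, -b = output basename.
    return ["jbig2", "-s", "-p", "-b", basename, *bitonal_pages]


def img2pdf_command(color_pages: Sequence[str], out_pdf: str) -> List[str]:
    # img2pdf embeds JPEG / JPEG 2000 files as-is, without re-encoding them.
    return ["img2pdf", *color_pages, "-o", out_pdf]


def tesseract_textonly_command(image_or_list: str, out_base: str) -> List[str]:
    # Writes out_base.pdf containing only the invisible text layer.
    # The input may be one image or a text file listing one image per line.
    return ["tesseract", image_or_list, out_base, "-c", "textonly_pdf=1", "pdf"]


def pdftk_merge_command(image_pdf: str, text_pdf: str, out_pdf: str) -> List[str]:
    # multibackground puts page N of text_pdf underneath page N of image_pdf.
    return ["pdftk", image_pdf, "multibackground", text_pdf, "output", out_pdf]


def run(cmd: List[str]) -> None:
    # Each tool must be installed and on PATH.
    subprocess.run(cmd, check=True)
```

A possible sequence is `run(tesseract_textonly_command("pages.txt", "text"))` followed by `run(pdftk_merge_command("images.pdf", "text.pdf", "book.pdf"))`, where `images.pdf` came from img2pdf or the jbig2enc helper. Building the argument lists separately from running them makes the commands easy to inspect before anything is executed.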