Most Efficient Workflow / Process Available Currently

Share your software workflow. Write up your tips and tricks on how to scan, digitize, OCR, and bind ebooks.

Re: Most Efficient Workflow / Process Available Currently

Postby Leolopez » 29 Oct 2010, 22:52

Hi. To reduce processing time, you can open multiple Scan Tailor sessions at once. This takes advantage of a dual- or quad-core processor and cuts the total processing time for each book.
The books I process average 1000 pages, so I divide each book into 3 parts and run one session per part.
I have attached screenshots of the PC doing the job.
I hope this helps in your work.
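
The split into parts described above could be sketched roughly like this in Python. This is only an illustration, not part of the original workflow: the function name `split_into_parts` and the `0001.jpg`-style file names are assumptions, and each resulting part would still be opened in its own Scan Tailor session by hand.

```python
def split_into_parts(pages, parts=3):
    """Split a sorted list of pages into `parts` contiguous, near-equal chunks."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    pages = list(pages)
    base, extra = divmod(len(pages), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional page each.
        size = base + (1 if i < extra else 0)
        chunks.append(pages[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    # Hypothetical file names for a 1000-page book.
    pages = [f"{n:04d}.jpg" for n in range(1, 1001)]
    for i, chunk in enumerate(split_into_parts(pages, 3), start=1):
        print(f"part{i}: {chunk[0]} .. {chunk[-1]} ({len(chunk)} pages)")
```

Keeping the chunks contiguous means the parts can simply be concatenated in order once every session has finished.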

Bye
Attachments
output - 3 at time.JPG (153.81 KiB)
content in one.JPG (166.45 KiB)
part1-2-3.JPG (154.68 KiB)
Leolopez
 
Posts: 2
Joined: 08 Oct 2010, 09:40

Re: Most Efficient Workflow / Process Available Currently

Postby emmerick » 11 Jan 2011, 07:01

mellow-yellow wrote: Since writing my post above, I have experimented, corrected, and improved this proposal substantially. Feedback welcome! :)

NOTE: A=Attended ("your" time), U=Unattended ("CPU" time)

Fastest (300 pg book)
1. Scan with SDM using S_FAST* (8 min A)
2. Transfer L and R images to PC (2 min A)
3. Rename L (001.jpg, 003.jpg, etc.) and R (002.jpg, 004.jpg) with IrfanView in Batch (1 min A)
4. Combine results into a single folder, move to ABBYY Hot Folder** and convert to PDFs (1 min A, 20 min. U)
5. Acrobat Standard - Combine Files - to create a single PDF (1 min A, 2 min U)
Total: 13 minutes (A) or 35 minutes (A+U)
Advantages: Speed (a 300-page, OCR'd book in 13 min!), Less time waiting for and returning to the PC (#4 to #5)
Disadvantages: Poor contrast (JJM's correct), no cropping*** (rig visible, IrfanView can crop but you'll add 1 min. A and 6 min. U)
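
The post does the renaming in step 3 with IrfanView's batch mode. For readers who prefer a script, the same odd/even interleaving could be sketched in Python as below. This is a swapped-in alternative, not the author's method: the function names, the folder layout, and the choice to copy rather than rename in place are all assumptions.

```python
import shutil
from pathlib import Path


def interleave_plan(left_files, right_files, ext=".jpg"):
    """Pair each source file with its new name: left pages odd, right pages even."""
    numbered = [(2 * i + 1, src) for i, src in enumerate(sorted(left_files))]
    numbered += [(2 * i + 2, src) for i, src in enumerate(sorted(right_files))]
    numbered.sort(key=lambda pair: pair[0])  # order by page number, not by name
    return [(src, f"{n:03d}{ext}") for n, src in numbered]


def copy_interleaved(left_dir, right_dir, out_dir):
    """Copy both camera folders into one folder using the interleaved names.

    Assumes lowercase .jpg extensions; adjust the glob pattern if the
    cameras write .JPG instead.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    left = Path(left_dir).glob("*.jpg")
    right = Path(right_dir).glob("*.jpg")
    for src, new_name in interleave_plan(left, right):
        shutil.copy2(src, out / new_name)
```

Copying into a fresh folder leaves the camera originals untouched, which makes it easy to redo the step if a page was skipped during scanning.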


Better Quality (300 pg book)
1. Scan with SDM using S_FAST* (8 min A)
2. Transfer L and R images to PC (2 min A)
3. Rename L (001.jpg, 003.jpg, etc.) and R (002.jpg, 004.jpg) with IrfanView in Batch (1 min A)
4. ScanTailor L then ScanTailor R: steps #1-#4 (5 min A, 3 min U)
5. ScanTailor Cropping*** Fix (http://diybookscanner.org/forum/viewtop ... =466#p4791) (2 min. A)
6. ScanTailor L then ScanTailor R: steps #4-#6 with Mixed selected (5 min A, 7 min U)
7. Copy L and R "out" folder to ABBYY Hot Folder** for conversion to PDFs (1 min A, 20 min. U)
8. Acrobat Standard - Combine Files - to create a single PDF (1 min A, 2 min U)
Total: 25 min (A), 57 min (A+U)
Advantages: Black text on white backgrounds, good colors and contrast, cropped images
Disadvantages: Cumbersome, about 63% slower ((57-35)/35 * 100 = 62.857), more time waiting for and returning to the PC (#4 to #5, #6 to #7, #7 to #8)
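
The totals and the slowdown figure can be rechecked with a few lines of Python. The per-step minutes are copied from the two lists above as (attended, unattended) pairs; the names `FASTEST`, `BETTER`, `totals`, and `percent_slower` are made up for this sketch.

```python
# (attended minutes, unattended minutes) per step, copied from the two lists.
FASTEST = [(8, 0), (2, 0), (1, 0), (1, 20), (1, 2)]
BETTER = [(8, 0), (2, 0), (1, 0), (5, 3), (2, 0), (5, 7), (1, 20), (1, 2)]


def totals(steps):
    """Return (attended total, attended + unattended total) in minutes."""
    attended = sum(a for a, _ in steps)
    unattended = sum(u for _, u in steps)
    return attended, attended + unattended


def percent_slower(slow, fast):
    """Relative slowdown of `slow` versus `fast`, in percent."""
    return (slow - fast) / fast * 100


if __name__ == "__main__":
    fast_a, fast_all = totals(FASTEST)
    better_a, better_all = totals(BETTER)
    print(f"Fastest: {fast_a} min (A), {fast_all} min (A+U)")
    print(f"Better:  {better_a} min (A), {better_all} min (A+U)")
    print(f"Slowdown (A+U): {percent_slower(better_all, fast_all):.1f}%")
```

The same comparison on attended time alone, `percent_slower(25, 13)`, comes out at roughly 92%, so the quality workflow costs proportionally more of "your" time than of wall-clock time.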

* S_FAST: viewtopic.php?f=3&t=358&p=5528#p5528
** ABBYY Hot Folder Settings included: PDF/A, Mixed Raster Content (MRC), text under image
*** JPEGCrops was unstable (crashes, slow, probably due to the hundreds of 12MP color images) for me on both Windows 7 and XP SP3.



The secret to keeping JPEGCrops from crashing is to load at most 100 images at a time. It really is much easier, though, to do this with Scan Tailor's Select Content step.
Sorry for my English, I am using Google Translate. :)
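
The 100-images-at-a-time tip above amounts to fixed-size batching. A minimal Python sketch of that idea follows; the function name `batches` and the file names are hypothetical, and each batch would still be loaded into JPEGCrops manually.

```python
def batches(items, size=100):
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Hypothetical file names for a 300-page book.
    images = [f"{n:03d}.jpg" for n in range(1, 301)]
    for i, batch in enumerate(batches(images), start=1):
        print(f"batch {i}: {batch[0]} .. {batch[-1]} ({len(batch)} images)")
```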
emmerick
 
Posts: 30
Joined: 06 Jan 2011, 14:35
Location: Rio de Janeiro/Brazil

Re: Most Efficient Workflow / Process Available Currently

Postby Anonymous1 » 12 Jan 2011, 01:19

I would discourage running so many Scan Tailor instances. The output size for each batch will be different, and I doubt you'll get much of a boost in performance (I think ST utilizes multiple QThreads, but I'm not entirely sure).
Anonymous1
 

Re: Most Efficient Workflow / Process Available Currently

Postby dtic » 12 Jan 2012, 18:19

Anonymous1: ST only uses one CPU core. I think my script for using two ST instances (only) for the final processing step avoids the problem you describe. I at least haven't noticed differences in page size for the two output halves. Check it out here:
viewtopic.php?f=8&t=1249
dtic
 
Posts: 204
Joined: 06 Mar 2010, 18:03

Re: Most Efficient Workflow / Process Available Currently

Postby mellow-yellow » 25 Mar 2012, 18:50

Since I haven't updated this post in a while, and because I've been working on the software side of this workflow a *lot* lately, it now seems to me that the fastest and easiest way to convert your images to e-books (i.e. PDFs, using Scan Tailor and ABBYY FineReader) is ... (sorry, shameless plug follows) ... to use my open-source DIY E-book Creator, especially if you have Microsoft Windows XP, Windows Vista, or Windows 7.

http://diybookscanner.org/forum/viewforum.php?f=23

Please consider helping us improve that software, though, as it needs bug fixes and flexibility improvements (e.g. more output formats, an open-source OCR engine like Tesseract).
mellow-yellow
 
Posts: 46
Joined: 28 Jun 2010, 13:33
Location: Portland, OR, USA

Re: Most Efficient Workflow / Process Available Currently

Postby mellow-yellow » 21 Feb 2013, 19:18

Just a quick update on our workflow: I've posted a roughly 7-minute "book to e-book" video showing my current (fastest) workflow:



Enjoy!
mellow-yellow
 
Posts: 46
Joined: 28 Jun 2010, 13:33
Location: Portland, OR, USA
