Alternative Software Workflow

Share your software workflow. Write up your tips and tricks on how to scan, digitize, OCR, and bind ebooks.

Re: Alternative Software Workflow

Postby jradi » 26 Jul 2009, 16:13

I still got the error. It's strange: it doesn't happen to all photos; sometimes it only happens to 2 or 3 in a batch of 600. The most annoying thing about my workflow is that I have to wait until ABBYY starts OCR'ing (which can take quite a long time) before I can leave the process alone. Until then, there's a chance that several of the pages won't be accepted and I'll be forced to recrop the photos.

The strange thing is that once a jpg is corrupted, if that's what happens, then no amount of tweaking and resaving the image will recover it. The only solution is to go back to the original jpg and crop/save from there.

Maybe PageBuilder will be better...

Re: Alternative Software Workflow

Postby xylon » 26 Jul 2009, 20:53

I was approaching the problem from the perspective that ABBYY was the problem, not JPEGcrops.

Re: Alternative Software Workflow

Postby jakegaisser » 22 Mar 2010, 19:49

daniel_reetz wrote: Surya is employing your method now (I hear from him via email much more often than the forum).

I know you've got a method worked out and everything, but PageBuilder will do your first two steps -- crop and rotate. From those JPGs you could use ABBYY. All you need is PageBuilder 2 and the Matlab Component Runtime. Apologies if you've already tried it. Just be sure to check the JPG output radio button.


The second link is dead; here is another one:

Matlab Component Runtime installer: http://www.sccn.ucsd.edu/~arno/download ... taller.exe

Re: Alternative Software Workflow

Postby daniel_reetz » 22 Mar 2010, 22:04

PageBuilder is presently unmaintained; I wouldn't waste time on it.

Re: Alternative Software Workflow

Postby jakegaisser » 23 Mar 2010, 00:20

What are you currently using then? I have the Left and Right folders full of pictures for a book I scanned, and I am not sure what to do from here.

I need to get my images rotated, cropped, and into PDF format with the least amount of effort possible. (It does not need to be OCR'd; I am fine with just having the images in a PDF.)

Edit: Metamorphose is a very powerful file renamer, and it is also open source. I have put in a request for a feature so that you could easily rename all the files in one go, instead of having to do the left pages and then the right.

Re: Alternative Software Workflow

Postby daniel_reetz » 23 Mar 2010, 07:18

I am using Scan Tailor.

Re: Alternative Software Workflow

Postby spamsickle » 23 Mar 2010, 08:35

jakegaisser wrote: Metamorphose is a very powerful file renamer, and it is also open source. I have put in a request for a feature so that you could easily rename all the files in one go, instead of having to do the left pages and then the right.

If you have Perl installed on Windows, you can do what I do. I start each new book in its own directory, within which I create subdirectories L, R, and Both.

Then, I have a little script called DIYmerge.cmd:

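Code:
rem Left pages: number them 0000, 0002, 0004, ... and move them to Both
cd L
perl f:\Scripts\Perl\DIYrename0.plx 0 > DIYrename.cmd
call DIYrename.cmd
move *.jpg ..\Both
rem Right pages: number them 0001, 0003, 0005, ... and move them to Both
cd ..\R
perl f:\Scripts\Perl\DIYrename0.plx 1 > DIYrename.cmd
call DIYrename.cmd
move *.jpg ..\Both
rem Return to the book directory
cd ..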


The Perl routine itself is pretty simple too:

Code:
use strict;
use warnings;

# glob an array of all the JPG files in the current directory
my @files = <*.jpg>;

# get starting page number from command line (0 for left pages, 1 for right)
my $page = $ARGV[0];

# print "ren file.jpg NNNN.jpg" for each file; quotes protect names with spaces
foreach my $file (@files) {
      printf("ren \"%s\" %04d.jpg\n", $file, $page);
      $page += 2;
}
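
For anyone without Perl, the same numbering could be sketched in Python. This is only an illustration under the layout described above (sub-directories L and R, left pages numbered from 0000 and right pages from 0001, in steps of two). The function names plan_renames and merge_into are made up for this example and are not part of any tool mentioned in this thread.

```python
import os
import shutil


def plan_renames(filenames, start):
    """Pair each name (in sorted order) with a 4-digit page name: start, start+2, ..."""
    plan = []
    page = start
    for name in sorted(filenames):
        plan.append((name, "%04d.jpg" % page))
        page += 2
    return plan


def merge_into(src_dir, dest_dir, start):
    """Move every JPG from src_dir into dest_dir under its planned page name."""
    os.makedirs(dest_dir, exist_ok=True)
    jpgs = [f for f in os.listdir(src_dir) if f.lower().endswith(".jpg")]
    for old, new in plan_renames(jpgs, start):
        shutil.move(os.path.join(src_dir, old), os.path.join(dest_dir, new))
```

Calling merge_into("L", "Both", 0) and then merge_into("R", "Both", 1) from the book directory would interleave the two sets. Because each file is moved straight to its new name, no intermediate DIYrename.cmd is needed.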


The only complication arises when you "wrap" the page numbers in your camera. In my Canon, when I pass another 10,000 picture mark, it creates a new directory on the SD card, and starts the new image numbers with 0000, so my L and R directories end up with 9999-names which should come before 0000-names. In this circumstance, I just do my own rename before running DIYmerge:

ren 0* 5*
ren 9* 4*
DIYmerge
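
The manual rename works because it forces the pre-wrap 9xxx names to sort ahead of the post-wrap 0xxx names. The same idea can be written as a sort key; here is a rough Python sketch. It assumes the first run of digits in each file name is the camera's counter and that the batch wrapped at most once. wrap_key, shooting_order and the 5000 threshold are hypothetical choices for illustration only.

```python
import re


def wrap_key(name, threshold=5000):
    """Sort key that places high (pre-wrap) counters before low (post-wrap) ones."""
    match = re.search(r"\d+", name)
    if match is None:
        raise ValueError("no counter found in %r" % name)
    counter = int(match.group())
    # Counters at or above the threshold were shot before the wrap, so they
    # sort first; counters below it were shot after the wrap.
    return (0 if counter >= threshold else 1, counter)


def shooting_order(names, threshold=5000):
    """Return names in the order they were shot, assuming at most one wrap."""
    return sorted(names, key=lambda n: wrap_key(n, threshold))
```

Like the manual rename, this should only be applied to a batch that actually wrapped; a batch running from, say, 4990 to 5010 without a wrap would be put in the wrong order.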

Re: Alternative Software Workflow

Postby jakegaisser » 26 Mar 2010, 21:33

I wrote a script to rename, merge, and rotate; it requires ImageMagick to be installed.

I have:
D:\Book\L
D:\Book\R

I place this Windows batch script into D:\Book

makebook.bat:
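Code:
rem Number the L pages 0001, 0003, ... and the R pages 0002, 0004, ...,
rem move them up into the book folder, and rotate them with ImageMagick.
rem Run from the book folder. ImageMagick's convert must come before the
rem Windows convert.exe on the PATH.
set count=0
FOR /r %%A IN (*.jpg) DO CALL :NUMBER "%%A"
goto :EOF

:NUMBER
rem The first file starts at 1; when the folder changes, switch parity
IF "%count%"=="0" (
   set count=1
   set odd=1
) ELSE (
   IF NOT "%~p1"=="%PREVDIR%" (
      IF "%odd%"=="1" (
         set count=2
         set odd=0
      ) ELSE (
         set count=1
         set odd=1
      )
   )
)
set NUM=000%count%
set NUM=%NUM:~-4%
rem ren cannot move a file into another folder, so use move
move "%~1" "%~dp1..\%NUM%.JPG"
IF "%odd%"=="1" (
   convert %NUM%.JPG -rotate 90 %NUM%.JPG
) ELSE (
   convert %NUM%.JPG -rotate 270 %NUM%.JPG
)
set /a count+=2
set PREVDIR=%~p1
goto :EOF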

Re: Alternative Software Workflow

Postby jakegaisser » 26 Mar 2010, 21:35

I am now trying to see if I can automate cropping... so far it does not look like I can. Maybe I can find a way to crop all images as a whole for both the L and R folders before renaming and rotating.

Re: Alternative Software Workflow

Postby spamsickle » 27 Mar 2010, 00:02

I think your way of doing the rename/merge is better than mine, because it doesn't require Perl. Old habits...

I wouldn't bother doing the rotates in pre-processing though, if you're using Scan Tailor. Scan Tailor will rotate the images faster than ImageMagick, with less wear and tear on your hard drive. Before I became aware of Scan Tailor, I was doing exactly what you're trying to do, using ImageMagick to rotate and crop. The "cropping" was hit and miss, because there was a bit of jitter in the scanning process, so I'd typically have to include a bit of slop in the dimensions to make sure I wasn't cropping content. For a while, I was using JPEGcrops to do the cropping without the slop, but now I'm doing all of my content-selection tweaking in Scan Tailor. Cropping as a pre-processing step doesn't really guarantee that Scan Tailor's content selection won't still need to be adjusted, and I'd rather do something once than twice.

I am still using ImageMagick on the back end, to convert Scan Tailor's TIFFs to PDFs:

mogrify -format PDF *.TIFF

then pdftk to merge the individual pages into the final book:

pdftk 0*.pdf cat output finalbookname.pdf
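
If you'd rather drive both steps from one script, something like the following Python sketch might do. It assumes mogrify and pdftk are on the PATH and that Scan Tailor's output files match the given pattern; build_commands, make_book and the default pattern are names I made up for the example, so adjust them to your setup.

```python
import glob
import subprocess


def build_commands(tiffs, book_name):
    """Return the mogrify and pdftk argument lists for the given TIFF pages."""
    pages = sorted(tiffs)
    # mogrify -format pdf writes each page next to its source with a .pdf extension
    pdfs = [p.rsplit(".", 1)[0] + ".pdf" for p in pages]
    mogrify = ["mogrify", "-format", "pdf"] + pages
    pdftk = ["pdftk"] + pdfs + ["cat", "output", book_name]
    return mogrify, pdftk


def make_book(pattern="*.tif", book_name="finalbookname.pdf"):
    """Convert every matching TIFF to PDF, then merge the pages into one file."""
    tiffs = glob.glob(pattern)
    if not tiffs:
        raise FileNotFoundError("no files match %r" % pattern)
    mogrify, pdftk = build_commands(tiffs, book_name)
    subprocess.run(mogrify, check=True)
    subprocess.run(pdftk, check=True)
```

Passing the sorted file list explicitly keeps the page order independent of how the shell expands wildcards, and check=True stops the merge if the conversion fails.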
