PDFMaker 0.3 - help beta test!

Scan Tailor specific announcements, releases, workflows, tips, etc. NO FEATURE REQUESTS IN THIS FORUM, please.

Moderator: peterZ

bnorling

Re: PDFMaker 0.3 - help beta test!

Post by bnorling »

When I run pdfmaker I get this output:

D:\ocrstuff\source\out>pdfmaker -d pdfout
done
Running jbig2enc... done
Running ImageMagick... done
Running pdftk... done!
All done! Finished processing files.

It runs in less than a second, and then there's no output in any of the subdirectories that pdfmaker creates.

Any ideas?
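
Because the run finishes in under a second and writes nothing, one thing worth ruling out is an input directory with no page images in it. The pdfmaker help text later in this thread says -d should name the Scan Tailor out directory. Below is a minimal Python sketch of such a pre-flight check. It assumes the pages are .tif/.tiff files, and `find_page_images` is a hypothetical helper, not part of PDFMaker.

```python
from pathlib import Path

# Assumption: Scan Tailor output pages are TIFF files; adjust if yours differ.
IMAGE_SUFFIXES = {".tif", ".tiff"}


def find_page_images(directory: str) -> list[Path]:
    """Return the TIFF pages in `directory`, sorted by file name.

    Raises FileNotFoundError if the directory does not exist, so a typo in
    the -d argument is reported instead of silently producing nothing.
    """
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {root}")
    # Keep regular files only, matching the suffix case-insensitively.
    return sorted(p for p in root.iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)


if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    pages = find_page_images(target)
    print(f"{len(pages)} page image(s) found in {target}")
```

Saved as, say, check_pages.py (a hypothetical name) and run as `python check_pages.py pdfout` from the same folder, it would show whether `pdfout` actually contains any pages before pdfmaker is pointed at it.
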
JonEP
Posts: 81
Joined: 19 Apr 2010, 15:09

Re: PDFMaker 0.3 - help beta test!

Post by JonEP »

I saw Misty was participating in the forum again after a hiatus -- any interest in pursuing this again, Misty?

:)
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: PDFMaker 0.3 - help beta test!

Post by Misty »

Hehe, yes, I am back! Changing jobs is always time-consuming.

Is there still a demand? PDFBeads seems to provide a lot of the functionality that I was designing into PDFMaker. However, if you think there's a reason to pick up PDFMaker again, I'll definitely look at doing so.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
JonEP

Re: PDFMaker 0.3 - help beta test!

Post by JonEP »

I'll check out PDFBeads -- didn't know it existed! I'll report back on perceived needs after that. And thanks.
Misty

Re: PDFMaker 0.3 - help beta test!

Post by Misty »

Definitely give it a try. It looks great, though I haven't used it myself. If I were going to tackle this need again I'd probably be writing in Ruby anyway, so if there is missing functionality and there isn't a fatal workflow incompatibility, I'll probably look at adding that into PDFBeads rather than writing a new tool.
JonEP

Re: PDFMaker 0.3 - help beta test!

Post by JonEP »

Hmmmm.
I don't think I'm ready to head down the path of trying to get Ruby working on my Windows machine. Frankly, I don't know what Ruby and Gems are -- another coding language, I am guessing. Who knew the DIY Scanner was a portal onto multiple universes!
Misty

Re: PDFMaker 0.3 - help beta test!

Post by Misty »

Yes, Ruby is a programming language. Admittedly I haven't tried it on Windows, but it is incredibly easy to get working on Mac.

Basically, there are two parts you need to worry about - the Ruby interpreter, and RubyGems. The interpreter just handles running the Ruby programs, kind of like runtime libraries for Windows apps - once you have it installed, you're set. RubyGems is a package manager for programs written in Ruby. It takes care of downloading and installing programs for you. So, for example, all you need to type is 'gem install X' (where X is the name of the program you want to use), and RubyGems will download the program and install it for you so it's ready to be used. For PDFBeads, you need to install three gems: pdfbeads, hpricot and rmagick (which has its own Windows FAQ and one-click installer here: http://rmagick.rubyforge.org/install-faq.html#win).

The official ruby-lang.org site links to a single installer for Windows that sets up a Ruby and RubyGems environment for you: http://rubyinstaller.org/
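
Once Ruby and RubyGems are in place, it is easy to check which of the three gems are still missing. Below is a minimal Python sketch, assuming `gem list` prints one gem per line in its usual `name (versions)` form; `installed_gems` and `install_command` are hypothetical helper names.

```python
import re

# The three gems PDFBeads needs, per the post above.
REQUIRED_GEMS = ("pdfbeads", "hpricot", "rmagick")

# Assumed `gem list` line format, e.g. "rmagick (2.13.1 x86-mingw32)".
GEM_LINE = re.compile(r"^([A-Za-z0-9_.-]+) \(")


def installed_gems(gem_list_output: str) -> set[str]:
    """Extract gem names from the text printed by `gem list`."""
    names = set()
    for line in gem_list_output.splitlines():
        match = GEM_LINE.match(line.strip())
        if match:  # header lines such as "*** LOCAL GEMS ***" do not match
            names.add(match.group(1))
    return names


def install_command(gem_list_output: str) -> list[str]:
    """Return the `gem install` argv for whatever is still missing, or []."""
    have = installed_gems(gem_list_output)
    missing = [g for g in REQUIRED_GEMS if g not in have]
    return ["gem", "install", *missing] if missing else []
```

Feed it the captured output of `gem list`, for example `subprocess.run(["gem", "list"], capture_output=True, text=True).stdout`. On Windows the `gem` command is a batch wrapper, so `shell=True` may be needed there.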

Sorry this isn't more convenient. Microsoft should include Ruby like Apple does. ;)
lupocos

Re: PDFMaker 0.3 - help beta test!

Post by lupocos »

I managed to install pdfbeads on Windows, with the jbig2 encoder enabled. It took me a long time to work out how to install all the required software (from Ruby itself to ImageMagick, RMagick and the jbig2 encoder), but now at least pdfbeads works on Windows too: monochrome TIFF files are compressed with JBIG2, and a 100-page PDF takes on average only 1.2 MB.
The only problem is this strange error, which doesn't seem to affect the output:

Code:

TIFFReadDirectory: TIFFstream: Failed to read directory at offset 0.
Error in findTiffCompression: tif not opened
Do I simply ignore it?

And another question for @Misty: has PDFBeads replaced PDFMaker? I haven't tried the latter, but as far as I understand it is similar to PDFBeads. I wonder whether PDFMaker would be easier for a newbie to install.

Thanks!

Cosimo
Misty

Re: PDFMaker 0.3 - help beta test!

Post by Misty »

I believe that's a bug in RubyPDF's build of jbig2enc. It's harmless: it complains, but it reads the file anyway. It just spams your terminal.
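
If the spam gets in the way, the two messages quoted above can be filtered out. Below is a minimal Python sketch, assuming the messages arrive on the standard error stream of the pdfbeads run; `filter_noise` and `run_quietly` are hypothetical names, not part of pdfbeads or jbig2enc.

```python
import subprocess
import sys

# The two harmless messages quoted in the previous post.
HARMLESS = (
    "TIFFReadDirectory: TIFFstream: Failed to read directory at offset 0.",
    "Error in findTiffCompression: tif not opened",
)


def filter_noise(lines):
    """Yield every line except the known harmless jbig2enc messages."""
    for line in lines:
        if not line.rstrip("\r\n").startswith(HARMLESS):
            yield line


def run_quietly(argv, out_path):
    """Run a command with stdout saved to out_path and filtered stderr.

    Assumes the noise is written to stderr. On Windows, pdfbeads is
    installed as a batch wrapper, so shell=True may be needed there.
    """
    with open(out_path, "wb") as out:
        result = subprocess.run(argv, stdout=out, stderr=subprocess.PIPE, text=True)
    for line in filter_noise(result.stderr.splitlines(keepends=True)):
        sys.stderr.write(line)
    return result.returncode
```

For example, `run_quietly(["pdfbeads", "0001.tif", "0002.tif"], "out.pdf")` mirrors the `> out.pdf` redirect in pdfbeads' own usage line while hiding only those two messages; any other errors still get through.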

Yes, pdfbeads has replaced PDFMaker. It's possible that PDFMaker might be easier to install for a newbie, but it's also much more primitive. I don't plan to develop it anymore unless there are deficiencies that really need to be addressed in pdfbeads. If there are, I would either pursue a Ruby rewrite of PDFMaker, or contribute to pdfbeads.
loyukfai
Posts: 43
Joined: 24 Jan 2011, 02:37

Re: PDFMaker 0.3 - help beta test!

Post by loyukfai »

I wonder what the major and minor differences between PDFMaker and PDFBeads are.

Edit: to answer my own question, I've captured the help output of both tools below:

Code:

>pdfbeads --help
Warning: the hpricot extension is not available. I'll not be able
        to create hidden text layer from hOCR files.
Usage: pdfbeads [options] [files to process] > out.pdf

PDF file properties:
  -C, --toc TOCFILE        Build PDF outline dictionary from a text file
  -L, --labels LSPEC       Specify page labels for user-friendly page numbering
  -M, --meta METAFILE      Take metadata for the PDF file from a text file
  -P, --pagelayout LAYOUT  Specify the default page layout for PDF viewer, where
                           LAYOUT is `SinglePage', `OneColumn', `TwoColumnLeft'
                           `TwoColumnRight', `TwoPageLeft', or `TwoPageRight'

Image encoding and compression options:
  -f, --force-update       Always write subsidiary image files even if a file
                           with the same name is already found on the disk
  -m FORMAT                Compression method for foreground text mask in PDF
      --mask-compression   pages (JBIG2 or G4). JBIG2 is used by default, unless
                           the encoder is not available
  -p, --pages-per-dict NUM Generate one shared JBIG2 dictionary per NUM pages.
                           This option is only applied when JBIG2 compression
                           is used. Default value is 15
  -r DPI                   Set resolution for foreground mask images to the
      --force-resolution   specified value (in pixels per inch). Note that the
                           image is not actually resampled.
  -t, --threshold VAL      Set binarization threshold for mixed images. Valid
                           values are between 1 and 255. 1 is used by default,
                           as the input files are assumed to be preprocessed
                           with ScanTailor (http://scantailor.sourceforge.net)
  -x, --max-colors NUM     If pdfbeads finds an indexed file with NUM or
                           less colors, then it will attempt to split it into
                           several bitonal images and encode them all into the
                           PDF page mask. Otherwise the file is treated just
                           like a normal greyscale or color image. Default
                           value is 4

The following options are only applied when pdfbeads attempts to split
a mixed source image into text mask and background layer:
  -b FORMAT                Compression method for background images. Acceptable
      --bg-compression     values are JP2|JPX|JPEG2000, JPG|JPEG or LOSSLESS.
                           JP2 is used by default, unless this format is not
                           supported by the available version of ImageMagick
  -B, --bg-resolution DPI  Set resolution for background images (300dpi default)

  -g, --grayscale          When separating text from background, always convert
                           background images to grayscale

General options:
  -o, --output FILE        Print output to a file instead of STDERR
  -h, --help               Show this message

Code:

pdfmaker

Usage: pdfmaker [-picdpi] [-gamma] [-ps] -d [DIRECTORY]
Creates multiple-layer PDFs from Scan Tailor output. Specify the Scan Tailor out
directory with -d [DIRECTORY].

Optional arguments:
-picdpi X       Specify a DPI for images in the final PDF. Default is 100
-quality        Specify JPEG compression quality for images. Default is 85
-ps             Use Photoshop instead of ImageMagick to produce image layers

PDFMaker 0.1.1 is (c) 2010 Misty De Meo, released under the GPLv3. See license.t
for more information. PDFMaker was created at the County of Brant Public
Library to produce PDFs for http://images.ourontario.ca/brant/
This script incorporates code by Terry Moore and Tim Fehlman.

PDFMaker relies on the following software:
jbig2enc (c) Adam Langley, binary by Steven Lee of RubyPDF
ImageMagick (c) ImageMagick Studio LLC
pdftk (c) Sid Stewart done
Running jbig2enc... done
Running ImageMagick... done
Running pdftk... done!
All done! Finished processing files.
The major difference, besides more granular controls offered by PDFBeads, is that PDFMaker has no OCR support (yet).
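
To make the pdfbeads options above concrete, here is a minimal Python sketch that assembles a pdfbeads command line using only flags listed in its --help output, with defaults mirroring that help text. It also shows the arithmetic behind -p: one shared JBIG2 dictionary per 15 pages means a 100-page book gets ceil(100 / 15) = 7 dictionaries, assuming the final, partial group gets a dictionary of its own. `pdfbeads_command` and `shared_dictionaries` are hypothetical names.

```python
import math

# Mask formats accepted by -m, per the pdfbeads --help output above.
MASK_FORMATS = {"JBIG2", "G4"}


def pdfbeads_command(pages, output="out.pdf", mask="JBIG2",
                     pages_per_dict=15, bg="JP2", bg_dpi=300):
    """Return an argv list for pdfbeads; pass it to subprocess.run()."""
    if mask not in MASK_FORMATS:
        raise ValueError(f"mask must be one of {sorted(MASK_FORMATS)}")
    return [
        "pdfbeads",
        "-m", mask,                  # text mask compression
        "-p", str(pages_per_dict),   # pages sharing one JBIG2 dictionary
        "-b", bg,                    # background compression (mixed pages)
        "-B", str(bg_dpi),           # background resolution in dpi
        "-o", output,                # write the PDF to this file
        *map(str, pages),
    ]


def shared_dictionaries(n_pages, pages_per_dict=15):
    """Count shared JBIG2 dictionaries, assuming a partial last group gets one."""
    return math.ceil(n_pages / pages_per_dict)
```

For example, `pdfbeads_command(sorted(Path("out").glob("*.tif")), output="book.pdf")` (with `from pathlib import Path`) builds the command for a Scan Tailor out directory, and `shared_dictionaries(100)` gives 7 for a 100-page book like the one mentioned earlier in the thread. Per the help warning, hOCR text layers additionally need hpricot.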

Cheers.