PDFMaker 0.3 - help beta test!
Re: PDFMaker 0.3 - help beta test!
When I run pdfmaker I get this output:
D:\ocrstuff\source\out>pdfmaker -d pdfout
done
Running jbig2enc... done
Running ImageMagick... done
Running pdftk... done!
All done! Finished processing files.
It runs in less than a second, and afterwards there's no output in any of the subdirectories that pdfmaker creates.
Any ideas?
Re: PDFMaker 0.3 - help beta test!
I saw Misty was participating in the forum again after a hiatus -- any interest in pursuing this again, Misty?


Re: PDFMaker 0.3 - help beta test!
Hehe, yes, I am back! Changing jobs is always time-consuming.
Is there still a demand? PDFBeads seems to provide a lot of the functionality that I was designing into PDFMaker. However, if you think there's a reason to pick up PDFMaker again, I'll definitely look at doing so.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
Re: PDFMaker 0.3 - help beta test!
I'll check out PDFBeads -- didn't know it existed! I'll report back on perceived needs after that. And thanks.
Re: PDFMaker 0.3 - help beta test!
Definitely give it a try. It looks great, though I haven't used it myself. If I were going to tackle this need again I'd probably be writing in Ruby anyway, so if there is missing functionality and there isn't a fatal workflow incompatibility, I'll probably look at adding that into PDFBeads rather than writing a new tool.
Re: PDFMaker 0.3 - help beta test!
Hmmmm.
I don't think I'm ready to head down the path of trying to get Ruby working on my Windows machine. Frankly, I don't know what Ruby and Gems are -- another coding language, I am guessing. Who knew the DIY Scanner was a portal onto multiple universes!
Re: PDFMaker 0.3 - help beta test!
Yes, Ruby is a programming language. Admittedly I haven't tried it on Windows, but it is incredibly easy to get working on Mac.
Basically, there are two parts you need to worry about - the Ruby interpreter, and RubyGems. The interpreter just handles running the Ruby programs, kind of like runtime libraries for Windows apps - once you have it installed, you're set. RubyGems is a package manager for programs written in Ruby. It takes care of downloading and installing programs for you. So, for example, all you need to type is 'gem install X' (where X is the name of the program you want to use), and RubyGems will download the program and install it for you so it's ready to be used. For PDFBeads, you need to install three gems: pdfbeads, hpricot and rmagick (which has its own Windows FAQ and one-click installer here: http://rmagick.rubyforge.org/install-faq.html#win).
The official ruby-lang.org site links to a single installer for Windows that sets up a Ruby and RubyGems environment for you: http://rubyinstaller.org/
Sorry this isn't more convenient. Microsoft should include Ruby like Apple does.
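For reference, the steps above boil down to something like the sketch below. It assumes Ruby and RubyGems are already installed and on your PATH, and that you are in a POSIX-style shell (the macOS terminal, or something like Git Bash on Windows). The gem names are the three listed above; `have_tool` and `install_pdfbeads_gems` are just illustrative helper names, not part of RubyGems. On Windows, rmagick may still need the separate installer linked above rather than a plain `gem install`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: installs the three gems named in the post,
# but only if the gem command is actually available.
install_pdfbeads_gems() {
    if ! have_tool gem; then
        echo "gem not found; install Ruby and RubyGems first" >&2
        return 1
    fi
    gem install pdfbeads hpricot rmagick
}
```

Nothing is installed until you call `install_pdfbeads_gems` yourself, so it is safe to paste the functions in first and check `have_tool gem` by hand.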
Re: PDFMaker 0.3 - help beta test!
I managed to install pdfbeads on Windows, with the jbig2 encoder enabled. It took me a long time to work out how to install all the required software (Ruby itself, ImageMagick, rmagick, and the jbig2 encoder), but now at least pdfbeads works on Windows too: monochrome TIFF files are compressed with jbig2, and a 100-page PDF takes only 1.2 MB on average.
The only problem is this strange error, which doesn't seem to affect the output:
Code:
TIFFReadDirectory: TIFFstream: Failed to read directory at offset 0.
Error in findTiffCompression: tif not opened
Do I simply ignore it?
And another question for Misty: has pdfbeads replaced PDFMaker? I haven't tried the latter, and as far as I understand it should be similar to pdfbeads. I wonder if PDFMaker would be easier to install for a newbie?
Thanks!
Cosimo
Re: PDFMaker 0.3 - help beta test!
I believe that's a bug in RubyPDF's compile of jbig2enc. It means nothing - it complains, but it reads the file anyway. It just spams your terminal.
Yes, pdfbeads has replaced PDFMaker. It's possible that PDFMaker might be easier to install for a newbie, but it's also much more primitive. I don't plan to develop it anymore unless there are deficiencies that really need to be addressed in pdfbeads. If there are, I would either pursue a Ruby rewrite of PDFMaker, or contribute to pdfbeads.
Re: PDFMaker 0.3 - help beta test!
I wonder what the major and minor differences between PDFMaker and PDFBeads are?
Edit: To answer my own question, I captured the help output of both tools below.
The major difference, besides the more granular controls offered by PDFBeads, is that PDFMaker has no OCR support (yet).
Code:
>pdfbeads --help
Warning: the hpricot extension is not available. I'll not be able
to create hidden text layer from hOCR files.
Usage: pdfbeads [options] [files to process] > out.pdf

PDF file properties:
    -C, --toc TOCFILE                Build PDF outline dictionary from a text file
    -L, --labels LSPEC               Specify page labels for user-friendly page numbering
    -M, --meta METAFILE              Take metadata for the PDF file from a text file
    -P, --pagelayout LAYOUT          Specify the default page layout for PDF viewer, where
                                     LAYOUT is `SinglePage', `OneColumn', `TwoColumnLeft'
                                     `TwoColumnRight', `TwoPageLeft', or `TwoPageRight'

Image encoding and compression options:
    -f, --force-update               Always write subsidiary image files even if a file
                                     with the same name is already found on the disk
    -m FORMAT                        Compression method for foreground text mask in PDF
        --mask-compression           pages (JBIG2 or G4). JBIG2 is used by default, unless
                                     the encoder is not available
    -p, --pages-per-dict NUM         Generate one shared JBIG2 dictionary per NUM pages.
                                     This option is only applied when JBIG2 compression
                                     is used. Default value is 15
    -r DPI                           Set resolution for foreground mask images to the
        --force-resolution           specified value (in pixels per inch). Note that the
                                     image is not actually resampled.
    -t, --threshold VAL              Set binarization threshold for mixed images. Valid
                                     values are between 1 and 255. 1 is used by default,
                                     as the input files are assumed to be preprocessed
                                     with ScanTailor (http://scantailor.sourceforge.net)
    -x, --max-colors NUM             If pdfbeads finds an indexed file with NUM or
                                     less colors, then it will attempt to split it into
                                     several bitonal images and encode them all into the
                                     PDF page mask. Otherwise the file is treated just
                                     like a normal greyscale or color image. Default
                                     value is 4

The following options are only applied when pdfbeads attempts to split
a mixed source image into text mask and background layer:
    -b FORMAT                        Compression method for background images. Acceptable
        --bg-compression             values are JP2|JPX|JPEG2000, JPG|JPEG or LOSSLESS.
                                     JP2 is used by default, unless this format is not
                                     supported by the available version of ImageMagick
    -B, --bg-resolution DPI          Set resolution for background images (300dpi default)
    -g, --grayscale                  When separating text from background, always convert
                                     background images to grayscale

General options:
    -o, --output FILE                Print output to a file instead of STDERR
    -h, --help                       Show this message
Code:
pdfmaker
Usage: pdfmaker [-picdpi] [-gamma] [-ps] -d [DIRECTORY]
Creates multiple-layer PDFs from Scan Tailor output. Specify the Scan Tailor out
directory with -d [DIRECTORY].
Optional arguments:
-picdpi X Specify a DPI for images in the final PDF. Default is 100
-quality Specify JPEG compression quality for images. Default is 85
-ps Use Photoshop instead of ImageMagick to produce image layers
PDFMaker 0.1.1 is (c) 2010 Misty De Meo, released under the GPLv3. See license.t
for more information. PDFMaker was created at the County of Brant Public
Library to produce PDFs for http://images.ourontario.ca/brant/
This script incorporates code by Terry Moore and Tim Fehlman.
PDFMaker relies on the following software:
jbig2enc (c) Adam Langley, binary by Steven Lee of RubyPDF
ImageMagick (c) ImageMagick Studio LLC
pdftk (c) Sid Stewart done
Running jbig2enc... done
Running ImageMagick... done
Running pdftk... done!
All done! Finished processing files.
Cheers.
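Going by the usage line in the pdfbeads help above (`pdfbeads [options] [files to process] > out.pdf`), a typical run might look like the sketch below. Treat it as a rough illustration rather than a tested recipe: `make_pdf` is a made-up wrapper name, the file names are placeholders, and it assumes a POSIX-style shell with pdfbeads already on the PATH. The only options used are ones that appear in the help text.

```shell
#!/bin/sh
# make_pdf is a hypothetical wrapper name, not part of pdfbeads.
# Usage: make_pdf OUTPUT.pdf PAGE1.tif [PAGE2.tif ...]
make_pdf() {
    if [ "$#" -lt 2 ]; then
        echo "usage: make_pdf OUTPUT.pdf PAGE.tif [PAGE.tif ...]" >&2
        return 1
    fi
    out_pdf="$1"
    shift
    # -m JBIG2 and -p 15 are the defaults listed in the help text; they are
    # spelled out here only to show where options go on the command line.
    pdfbeads -m JBIG2 -p 15 "$@" > "$out_pdf"
}
```

For example, `make_pdf book.pdf page001.tif page002.tif` would write the PDF to `book.pdf`. Swapping `-m JBIG2` for `-m G4` should select the other mask compression method listed in the help, for cases where the JBIG2 encoder is not available.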