Misty wrote:I was going to suggest JBIG2enc too. It's much, much more efficient than Group4 or similar compression. My text-only pages are typically in the region of 6-20KB each, not 400KB. It's not quite as efficient as DjVu, which uses a very similar compression scheme, but it's definitely what I'd call "good enough."
However, despite the publicly available benchmarks, I have personally found certain cases where JBIG2 compression outperforms DjVu compression.
Misty wrote:Unfortunately, the current release version of JBIG2enc discards all DPI information from the input TIFF. That's my only complaint right now. That's been fixed in the Git version, but I've been having trouble getting it to compile in Windows.
I compiled the Linux version (including the patch for the resolution fix), and I prefer using jbig2enc on Linux.
I burned an ISO of the Puppy Linux live CD
- http://dokupuppylinux.co.cc/
then installed Python 2.5 (PET package)
- http://dokupuppylinux.co.cc/programs:python
and jbig2enc (if I remember correctly, the version available here is one I had already compiled with the patch)
- http://dokupuppylinux.co.cc/programs:encoders
Anyway, I will soon add, on the same page, a further patched version of jbig2enc. This new patch (by akryukov) adds a new switch, -P, which sets the number of pages per dictionary (for long books, a single dictionary can make browsing pages very slow). A modified version of pdf.py is also needed (I commented out line 27 so that it works without PIL).
-d --duplicate-line-removal: use TPGD in generic region coder
-p --pdf: produce PDF ready data
-P <number> --pages-per-dict <number>: pages per dictionary (default 15)
-s --symbol-mode: use text region, not generic coder
-t <threshold>: set classification threshold for symbol coder (def: 0.85)
-T <bw threshold>: set 1 bpp threshold (def: 188)
-r --refine: use refinement (requires -s: lossless)
-O <outfile>: dump thresholded image as PNG
-2: upsample 2x before thresholding
-4: upsample 4x before thresholding
-S: remove images from mixed input and save separately
-j --jpeg-output: write images from mixed input as JPEG
-v: be verbose
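
To make the -P switch a bit more concrete, here is a rough Python sketch. It only uses switches from the list above (-s, -p, -v, -P). The helper names group_pages and build_jbig2_command are my own, the page file names are made up, and I am assuming the patched binary is on the PATH as "jbig2". The grouping function only illustrates the idea of "N pages per dictionary"; it is not how the patch itself is implemented.

```python
from typing import List, Sequence


def group_pages(pages: Sequence[str], pages_per_dict: int = 15) -> List[List[str]]:
    """Illustration only: split pages into consecutive groups,
    each of which would share one symbol dictionary."""
    if pages_per_dict < 1:
        raise ValueError("pages_per_dict must be at least 1")
    return [list(pages[i:i + pages_per_dict])
            for i in range(0, len(pages), pages_per_dict)]


def build_jbig2_command(pages: Sequence[str], pages_per_dict: int = 15) -> List[str]:
    """Build the argument list for the patched encoder.
    Assumes the binary is called 'jbig2' and accepts -P as listed above."""
    return ["jbig2", "-s", "-p", "-v", "-P", str(pages_per_dict)] + list(pages)


if __name__ == "__main__":
    # Hypothetical input files, one TIFF per page.
    pages = ["page%03d.tif" % n for n in range(1, 41)]
    print([len(group) for group in group_pages(pages)])
    print(" ".join(build_jbig2_command(pages[:3])))
    # To actually encode, hand the list to subprocess.run(cmd, check=True).
```

A smaller -P value should mean more dictionaries and (as far as I understand) somewhat larger files, but each page then depends on a smaller dictionary, which is the point for long books.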