Introducing djvubind for djvu file creation

Re: Introducing djvubind for djvu file creation

Postby strider1551 » 14 Aug 2010, 22:29

Tulon wrote:Even though that's not optimal, the result should still look fine.

Hmm... yes. My concern was how it might affect compression efficiency, but the more I think about it the less I think it would even make a difference. I still plan to test things out on a full color image, just to double check.

Thank you for mentioning csepdjvu... probably would have spent half the night wondering how to merge two separate encodings into one page without that hint. I've worked out a script that works on a test image (attached). Long story short is a decrease from 43.7 kB (tif -> ppm -> cpaldjvu) to 11.6 kB for the test image.
Code:
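#! /bin/bash

# Graphics layer: replace the pure-black (text) pixels with the current fill color.
convert -opaque black sample.tif sample_graphics.tif
# Text layer: replace everything that is not pure black with the current fill color.
convert +opaque black sample.tif sample_textual.tif
# Encode the text layer with JB2, then dump it as RLE for csepdjvu.
cjb2 sample_textual.tif temp_textual.djvu
ddjvu -format=rle -v temp_textual.djvu temp_textual.rle
# csepdjvu reads the RLE foreground followed by the PPM background from one file.
convert sample_graphics.tif temp_graphics.ppm
cat temp_textual.rle temp_graphics.ppm > temp.mix
csepdjvu -vv temp.mix out.djvu
# Remove the intermediate files (sample.tif itself does not match sample_*).
rm sample_* temp*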


So it looks like I can get that feature done in the very near future. Just gotta relearn how to make debian packages before I go back to working on the code.
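
To put a number on the size figures quoted above, here is a small sketch. The helper name percent_saved is made up for illustration; it only does the arithmetic and assumes awk is installed.

```shell
#!/bin/bash
# Hypothetical helper: percentage saved when going from size $1 to size $2.
# Units do not matter as long as both arguments use the same one.
percent_saved() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Sizes quoted in the post (kB): tif -> ppm -> cpaldjvu vs. the csepdjvu route.
percent_saved 43.7 11.6
```

For the test image that works out to a reduction of roughly 73%. A single image is of course not a benchmark, so treat the figure as indicative only.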
Attachments
sample.tif
(112.43 KiB) Downloaded 378 times
strider1551
 
Posts: 126
Joined: 01 Mar 2010, 11:39
Location: Ohio, USA

Re: Introducing djvubind for djvu file creation

Postby Misty » 16 Aug 2010, 09:33

strider1551 wrote:If we're talking a file created from scanned images, a djvu file will be significantly smaller in size. Djvu was made specifically for scanned images and makes use of jb2 compression, whereas the best compression for a pdf is Group4.

...

Edit:
I forgot to mention that several months ago I heard of a new compression for pdf's that is very similar to jb2 and would produce similar file sizes. For the life of me I can't remember the name of it. I do remember that there was a big issue of it being encumbered by patents. If it does take off, it would probably be a few years before it makes it into pdf reader software, and who knows when it would be accessible in the open source world if there are patent questions.


PDF actually does support JBIG2 since PDF format 1.4. DjVu's JB2 is based on JBIG2; I believe AT&T based it on a pre-standard version of JBIG2, before it was standardized in 2000. There's just a lack of open-source PDF utilities that support encoding JBIG2 PDFs. The only good one I've found is a Python script based on jbig2enc. If patents are an issue, I would assume that DjVu's JB2 is also patent-encumbered because they are based on the same format.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
Misty
 
Posts: 481
Joined: 06 Nov 2009, 12:20
Location: Frozen Wasteland

Re: Introducing djvubind for djvu file creation

Postby Tulon » 16 Aug 2010, 12:38

It turns out jbig2enc does support lossy encoding. The heavy lifting is provided by the Leptonica library in this case. Here is a relevant link: http://www.leptonica.com/jbig2.html
I wouldn't worry about patents. Some relevant ones are listed at the end of the JBIG2 specification, but many of them are old and some can be worked around. Yet another reason I wouldn't care about patents is that I believe only Hello World type programs don't violate any patents.

Anyway, assembling PDFs with JBIG2 compression seems more complex than doing the same with DjVu, as there doesn't seem to be an equivalent to csepdjvu.
When Scan Tailor asks you to enter DPIs manually, never enter arbitrary values. The video tutorial shows how to estimate the real DPI.
Tulon
 
Posts: 536
Joined: 03 Oct 2009, 06:13
Location: London, UK

Re: Introducing djvubind for djvu file creation

Postby dingodog » 16 Aug 2010, 12:55

on my system (Puppy Linux)

I use

- Scantailor in order to clean and split scanned pages
- jbig2enc to further reduce the size of the scanned images, then pdf.py (with Python) to join all the JBIG2-encoded images together

it's very simple
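
A rough sketch of that jbig2enc step, for anyone who wants to try it. It assumes the jbig2 binary and the pdf.py helper that ship with jbig2enc are both on PATH; the function name build_jbig2_pdf and the DRY_RUN switch are invented here for illustration and are not part of jbig2enc.

```shell
#!/bin/bash
# Sketch: bundle bitonal page images into one JBIG2-compressed PDF.
# Assumes jbig2 and pdf.py (both from jbig2enc) are on PATH.
# build_jbig2_pdf and DRY_RUN are hypothetical names used only in this example.

build_jbig2_pdf() {
    local out_pdf=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be run.
        printf 'jbig2 -s -p -v %s\n' "$*"
        printf 'pdf.py output > %s\n' "$out_pdf"
        return 0
    fi
    # -s: symbol coding, -p: emit PDF-ready fragments, -v: verbose.
    # With the default basename the fragments are written as output.*
    jbig2 -s -p -v "$@" || return 1
    # pdf.py wraps the fragments into a PDF and writes it to stdout.
    pdf.py output > "$out_pdf"
}

# Example: ./make_pdf.sh book.pdf page1.tif page2.tif
if [ "$#" -ge 2 ]; then
    build_jbig2_pdf "$@"
fi
```

Running it with DRY_RUN=1 first prints the two commands without touching any files, which is a cheap way to check the argument order.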
dingodog
 
Posts: 81
Joined: 22 Jul 2010, 18:19
Location: on the net

Re: Introducing djvubind for djvu file creation

Postby strider1551 » 19 Aug 2010, 22:20

The next version is available for download (both source and ebuild; the Debian package should be up by the weekend). I make note of it here because this version supports the mixed mode offered by Scan Tailor. If the mixed mode image contains only black and white, the image goes through minidjvu; otherwise the text version goes to cjb2 before being combined with the graphical version by csepdjvu. Better compression than before in either case. Oh, and the stupid problem of calling python3 is gone, so you can just use "djvubind".
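
The branching described above could be sketched in shell roughly as follows. This is not djvubind's actual code: the function names are invented, and treating "at most two unique colors" as "black and white only" is an assumption made for the example. It expects single-page images and needs ImageMagick, minidjvu, and DjVuLibre installed.

```shell
#!/bin/bash
# Sketch of the encoder choice; all function names here are hypothetical.

choose_encoder() {
    # $1 = number of unique colors in the page image.
    # Assumption: two colors or fewer means a purely black/white page.
    if [ "$1" -le 2 ]; then echo minidjvu; else echo csepdjvu; fi
}

count_colors() {
    # ImageMagick's %k escape reports the number of unique colors.
    identify -format '%k' "$1"
}

encode_mixed() {
    # Same steps as the script posted earlier in the thread, in a temp dir.
    local in=$1 out=$2 tmp
    tmp=$(mktemp -d) || return 1
    convert -opaque black "$in" "$tmp/graphics.tif" &&
    convert +opaque black "$in" "$tmp/textual.tif" &&
    cjb2 "$tmp/textual.tif" "$tmp/textual.djvu" &&
    ddjvu -format=rle "$tmp/textual.djvu" "$tmp/textual.rle" &&
    convert "$tmp/graphics.tif" "$tmp/graphics.ppm" &&
    cat "$tmp/textual.rle" "$tmp/graphics.ppm" > "$tmp/page.mix" &&
    csepdjvu "$tmp/page.mix" "$out"
    local status=$?
    rm -rf "$tmp"
    return "$status"
}

encode_page() {
    local in=$1 out=$2
    case "$(choose_encoder "$(count_colors "$in")")" in
        minidjvu) minidjvu "$in" "$out" ;;
        csepdjvu) encode_mixed "$in" "$out" ;;
    esac
}

# Example: ./encode_page.sh page.tif page.djvu
if [ "$#" -eq 2 ]; then
    encode_page "$1" "$2"
fi
```

Keeping the decision in its own small function makes it easy to swap in a stricter check later, for example one that also verifies the two colors really are black and white.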

The next few days are somewhat busy, and then I leave for a week long silent retreat. Feedback/bugs/complaints still welcomed, but any response will obviously be delayed until I get back if I don't get to it before the weekend.
strider1551
 
Posts: 126
Joined: 01 Mar 2010, 11:39
Location: Ohio, USA

Re: Introducing djvubind for djvu file creation

Postby daniel_reetz » 20 Aug 2010, 08:43

blogged! thanks for the cool utility.
daniel_reetz
 
Posts: 2487
Joined: 03 Jun 2009, 13:56

Re: Introducing djvubind for djvu file creation

Postby bkrpr » 26 Aug 2010, 13:45

also blogged and, for what it is worth, I'm in love with scantailor+djvubind. This is some fine work.
bkrpr
 
Posts: 8
Joined: 29 Sep 2009, 14:40

Re: Introducing djvubind for djvu file creation

Postby daniel_reetz » 26 Aug 2010, 17:16

BKRPR, are you still working on this?
daniel_reetz
 
Posts: 2487
Joined: 03 Jun 2009, 13:56

Re: Introducing djvubind for djvu file creation

Postby bkrpr » 26 Aug 2010, 17:49

daniel_reetz wrote:BKRPR, are you still working on this?


No longer actively under development. The lead developer had twins right after the last version released. Oddly, development slowed :)

At this point, Scantailor is what we were moving towards anyway so there is not much incentive to pull developer time our way. If anyone else is interested in re-using the auto crop algorithms from our codebase, they might convert well into phatch actions since both codebases are python. I can see a use in that, but I can't see much of a use to having two Scantailors out there right now.
bkrpr
 
Posts: 8
Joined: 29 Sep 2009, 14:40

Re: Introducing djvubind for djvu file creation

Postby daniel_reetz » 26 Aug 2010, 17:51

but I can't see much of a use to having two Scantailors out there right now.


Agreed, but as noted here Tulon is still pretty much a one-man show. It would be great to get him some development support, especially as the Scan Tailor userbase continues to expand.
daniel_reetz
 
Posts: 2487
Joined: 03 Jun 2009, 13:56
