Bindery

Moderator: peterZ

Anonymous2
Posts: 97
Joined: 18 Oct 2011, 16:05

Re: Bindery

Post by Anonymous2 »

Sadly, I cannot get PDF binding to work after two weeks of tinkering. I can generate the JBIG2 and JPEG2000 components of the individual pages just fine, but I can't figure out how to merge them into layers. If only pdfbeads were written in Python...
Lazy_Kent
Posts: 37
Joined: 26 Oct 2010, 10:06
Number of books owned: 0
Location: Moscow

Re: Bindery

Post by Lazy_Kent »

Have you looked at the jbig2enc project? Maybe it can help.
https://github.com/agl/jbig2enc/blob/master/pdf.py
Anonymous2
Posts: 97
Joined: 18 Oct 2011, 16:05

Re: Bindery

Post by Anonymous2 »

I rewrote that file a few months ago and incorporated it into Bindery (it was a bit messy and I wanted to see how it worked). The PDFs were tiny, but they were all completely bitonal (even the images).

PDFs support image layers and I was able to separate the text from the images and convert the text to JBIG2 and the images to JPEG2000. This resulted in really tiny source images, but I just can't figure out how to make a layered PDF containing them. I was able to make a PDF with just the text regions, but I can't stick the colored images in as well.
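
To make the layering idea above concrete, here is a minimal sketch of the page content stream that stacks the two images. It is not taken from pdfbeads or Bindery; it only assumes the general PDF approach of painting a colour image XObject first and then a 1-bit stencil mask on top. The function names and the resource names `Bg` and `Fg` are hypothetical.

```python
def layered_page_content(width_pt, height_pt, background="Bg", foreground="Fg"):
    """Return a PDF content stream that paints two image XObjects on one page.

    `Bg` and `Fg` are hypothetical resource names; they must match the
    entries in the page's /Resources /XObject dictionary.
    """
    # An image XObject fills the unit square, so scale it to the page size.
    place = f"{width_pt:g} 0 0 {height_pt:g} 0 0 cm"
    lines = [
        "q",                  # save graphics state
        place,
        f"/{background} Do",  # colour layer, e.g. a /JPXDecode image
        "Q",                  # restore graphics state
        "q",
        "0 g",                # black fill colour, used to paint the mask
        place,
        f"/{foreground} Do",  # bitonal text layer drawn as a stencil mask
        "Q",
    ]
    return "\n".join(lines) + "\n"


def stencil_mask_dict(width_px, height_px, length):
    """Return the stream dictionary for a JBIG2 text layer used as a mask."""
    return (
        f"<< /Type /XObject /Subtype /Image /Width {width_px} /Height {height_px} "
        f"/ImageMask true /BitsPerComponent 1 /Filter /JBIG2Decode /Length {length} >>"
    )
```

Because the text layer is a stencil mask rather than an opaque image, only its "ink" pixels are painted, so the colour image underneath stays visible. Writing the surrounding objects (page tree, cross-reference table, the image streams themselves) is still needed to get a complete file.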

I'm looking at the source of pdfbeads for help (as it does exactly what I need), but it's a bit complicated to translate into Python. When I have more free time I might give it another go.

If you're interested, my PDF binding module is located here: https://github.com/Blender3D/Bindery/bl ... ers/pdf.py
Lazy_Kent
Posts: 37
Joined: 26 Oct 2010, 10:06
Number of books owned: 0
Location: Moscow

Re: Bindery

Post by Lazy_Kent »

I suggest contacting Alexey Kryukov (amkryukov _at_ gmail _dot_ com), the author of PDFBeads.
Anonymous2
Posts: 97
Joined: 18 Oct 2011, 16:05

Re: Bindery

Post by Anonymous2 »

Sorry for the delay. I contacted Alexey and he gave me some very good information. Speaking Russian is a bonus after all!

I'm in the process of converting Bindery to Python 3 at the moment and will restart work on PDF binding when I have more time.
Anonymous2
Posts: 97
Joined: 18 Oct 2011, 16:05

Re: Bindery

Post by Anonymous2 »

As there aren't any applications that allow editing of OCR data for DjVu, I thought I'd implement that in Bindery as well.

For now, only the bounding boxes of the words are visualized. Implementing more complex interactions (for now just in-place editing) will be easy:
[Image: Bindery_023.png, "OCR editing, in-the-works"]
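
As a rough illustration of the word bounding boxes mentioned above, here is a small sketch that pulls them out of DjVu hidden text in the s-expression form printed by `djvused` (`(word xmin ymin xmax ymax "text")`). It is not Bindery's or djvubind's code; `WordBox`, `word_boxes`, and `word_at` are hypothetical names.

```python
import re
from typing import NamedTuple


class WordBox(NamedTuple):
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    text: str


# Matches one (word x0 y0 x1 y1 "text") entry. Backslash escapes inside the
# quoted text are kept as written, not decoded.
_WORD = re.compile(
    r'\(word\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+"((?:[^"\\]|\\.)*)"\s*\)'
)


def word_boxes(hidden_text):
    """Return every word-level box found in a hidden-text s-expression."""
    boxes = []
    for match in _WORD.finditer(hidden_text):
        x0, y0, x1, y1 = (int(value) for value in match.group(1, 2, 3, 4))
        boxes.append(WordBox(x0, y0, x1, y1, match.group(5)))
    return boxes


def word_at(boxes, x, y):
    """Return the first box containing the point (x, y), or None."""
    for box in boxes:
        if box.xmin <= x <= box.xmax and box.ymin <= y <= box.ymax:
            return box
    return None
```

DjVu measures these coordinates from the bottom-left corner of the page, so a viewer that draws from the top-left has to flip the y values before drawing the rectangles. A lookup like `word_at` is the kind of hit test that in-place editing would need when the user clicks on a word.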
A huge thanks to strider1551 for making djvubind so modular.
Lazy_Kent
Posts: 37
Joined: 26 Oct 2010, 10:06
Number of books owned: 0
Location: Moscow

Re: Bindery

Post by Lazy_Kent »

Anonymous2 wrote: As there aren't any applications that allow editing of OCR data for DjVu.
Take a look at djvusmooth.
http://jwilk.net/software/djvusmooth
Anonymous2
Posts: 97
Joined: 18 Oct 2011, 16:05

Re: Bindery

Post by Anonymous2 »

It took a while to set up, but djvusmooth does have the ability to edit boxing information and metadata.

I looked over this person's other software and the collection is quite impressive. I'm definitely going to look at those too.

Thanks for the link.
Anonymous2
Posts: 97
Joined: 18 Oct 2011, 16:05

Re: Bindery

Post by Anonymous2 »

I've given up on PDF binding and will just create a wrapper around pdfbeads, as it takes care of PDF binding very well.
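
A wrapper of this kind could look roughly like the sketch below. It is not Bindery's actual code; it only assumes the basic pdfbeads usage of passing page images as arguments and reading the finished PDF from standard output. `pdfbeads_command` and `bind_pdf` are hypothetical names.

```python
import shutil
import subprocess


def pdfbeads_command(pages, executable="pdfbeads"):
    """Build the argument list: the executable followed by the page images."""
    if not pages:
        raise ValueError("no page images given")
    return [executable, *map(str, pages)]


def bind_pdf(pages, output_path, executable="pdfbeads"):
    """Run pdfbeads on the given pages and save its standard output as a PDF."""
    # Resolve the full path first so a missing install fails with a clear error.
    resolved = shutil.which(executable)
    if resolved is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    with open(output_path, "wb") as out:
        # check=True raises CalledProcessError if pdfbeads exits with an error.
        subprocess.run(pdfbeads_command(pages, resolved), stdout=out, check=True)
```

Keeping the command construction separate from the call makes it possible to check the arguments without having pdfbeads installed.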

Hopefully I'll release a somewhat-final version of Bindery within the next few weeks.
Anonymous2
Posts: 97
Joined: 18 Oct 2011, 16:05

Re: Bindery

Post by Anonymous2 »

PDF binding works! I'll probably release a Windows version of Bindery some time this week with a bundled executable of pdfbeads. The only thing you'd need to install is Tesseract, but that's optional.