
Re: Bindery

Posted: 05 Jan 2012, 02:51
by Anonymous2
Sadly, after two weeks of tinkering I still can't get PDF binding to work. I can generate the JBIG2 and JPEG2000 components of the individual pages just fine, but I can't figure out how to merge them into layers. If only pdfbeads were written in Python...

Re: Bindery

Posted: 11 Jan 2012, 10:01
by Lazy_Kent
Have you looked at the jbig2enc project? Maybe it can help.
https://github.com/agl/jbig2enc/blob/master/pdf.py

Re: Bindery

Posted: 12 Jan 2012, 13:05
by Anonymous2
I rewrote that file a few months ago and incorporated it into Bindery (it was a bit messy and I wanted to see how it worked). The PDFs were tiny, but they were all completely bitonal (even the images).

PDFs support image layers and I was able to separate the text from the images and convert the text to JBIG2 and the images to JPEG2000. This resulted in really tiny source images, but I just can't figure out how to make a layered PDF containing them. I was able to make a PDF with just the text regions, but I can't stick the colored images in as well.

I'm looking at the source of pdfbeads for help (as it does exactly what I need), but it's a bit complicated to translate into Python. When I have more free time I might give it another go.

If you're interested, my PDF binding module is located here: https://github.com/Blender3D/Bindery/bl ... ers/pdf.py

Re: Bindery

Posted: 12 Jan 2012, 16:30
by Lazy_Kent
I suggest contacting Alexey Kryukov (amkryukov _at_ gmail _dot_ com), the author of PDFBeads.

Re: Bindery

Posted: 21 Jan 2012, 00:36
by Anonymous2
Sorry for the delay. I contacted Alexey and he gave me some very good information. Speaking Russian is a bonus after all!

I'm in the process of converting Bindery to Python 3 at the moment and will restart work on PDF binding when I have more time.

Re: Bindery

Posted: 26 Jan 2012, 04:33
by Anonymous2
As there aren't any applications that allow editing of OCR data for DjVu, I thought I'd implement that in Bindery as well.

So far, only the bounding boxes of the words are visualized. Adding more complex interactions (initially just in-place editing) will be easy:
[Attachment: Bindery_023.png (OCR editing, in-the-works)]
A huge thanks to strider1551 for making djvubind so modular.
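
For anyone curious how word boxes can be pulled out of DjVu OCR data, here is a minimal sketch, not Bindery's actual code. It assumes the hidden text is available in the S-expression form that `djvused` prints (for example `(word 100 200 200 250 "Hello")`), where coordinates are xmin, ymin, xmax, ymax with the origin at the bottom-left of the page. The names `word_boxes` and `to_top_left` are made up for illustration.

```python
import re

# Matches one word zone in djvused-style hidden text, e.g.
#   (word 100 200 200 250 "Hello")
# The quoted text may contain backslash-escaped characters.
WORD_RE = re.compile(
    r'\(word\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"((?:[^"\\]|\\.)*)"\)'
)


def word_boxes(sexpr_text):
    """Return a list of (xmin, ymin, xmax, ymax, text) tuples.

    The text is returned exactly as written in the S-expression;
    escape sequences are not decoded in this sketch.
    """
    boxes = []
    for match in WORD_RE.finditer(sexpr_text):
        xmin, ymin, xmax, ymax = (int(g) for g in match.groups()[:4])
        boxes.append((xmin, ymin, xmax, ymax, match.group(5)))
    return boxes


def to_top_left(box, page_height):
    """Convert a bottom-left-origin box to (x, y, width, height, text)
    with a top-left origin, which is what most GUI toolkits expect."""
    xmin, ymin, xmax, ymax, text = box
    return (xmin, page_height - ymax, xmax - xmin, ymax - ymin, text)
```

Drawing the rectangles is then just a matter of passing each converted box to whatever toolkit renders the page, scaled to the zoom level.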

Re: Bindery

Posted: 26 Jan 2012, 09:11
by Lazy_Kent
Anonymous2 wrote: As there aren't any applications that allow editing of OCR data for DjVu.
Take a look at djvusmooth.
http://jwilk.net/software/djvusmooth

Re: Bindery

Posted: 26 Jan 2012, 15:57
by Anonymous2
It took a while to set up, but djvusmooth does have the ability to edit boxing information and metadata.

I looked over this person's other software and the collection is quite impressive. I'm definitely going to look at those too.

Thanks for the link.

Re: Bindery

Posted: 29 Feb 2012, 16:02
by Anonymous2
I've given up on PDF binding and will just create a wrapper around pdfbeads, as it takes care of PDF binding very well.
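
A wrapper of this kind can be quite small. The following is only a rough sketch, not the code that ended up in Bindery. It assumes `pdfbeads` is on the PATH and that, when called with just a list of page images, it writes the finished PDF to standard output (the usual `pdfbeads *.tif > out.pdf` usage). The function names are hypothetical, and no pdfbeads options are passed.

```python
import subprocess


def build_pdfbeads_command(image_paths, executable="pdfbeads"):
    """Build the argument list for a pdfbeads run.

    Assumption: pdfbeads takes the page images as positional
    arguments. No command-line options are added here.
    """
    if not image_paths:
        raise ValueError("at least one page image is required")
    return [executable] + list(image_paths)


def bind_pdf(image_paths, output_path, executable="pdfbeads"):
    """Run pdfbeads and redirect its standard output into output_path.

    Raises subprocess.CalledProcessError if pdfbeads exits with a
    non-zero status; a partial output file may be left behind then.
    """
    command = build_pdfbeads_command(image_paths, executable)
    with open(output_path, "wb") as out:
        subprocess.run(command, stdout=out, check=True)
```

Something like `bind_pdf(["page_001.tif", "page_002.tif"], "book.pdf")` would then produce the bound file, with the page file names here being placeholders.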

Hopefully I'll release a somewhat-final version of Bindery within the next few weeks.

Re: Bindery

Posted: 02 Mar 2012, 16:32
by Anonymous2
PDF binding works! I'll probably release a Windows version of Bindery some time this week with a bundled executable of pdfbeads. The only thing you'd need to install is Tesseract, but that's optional.