Bindery
- strider1551
Re: Bindery
Well... djvubind needs to run tesseract twice. The first run produces just a normal plain-text file. The second run produces a file where each line contains a character and its positional information (the corners of the box it's in). Unfortunately, the second file does not contain spaces or line breaks, which is why I need the first file to know where one word ends and another begins.
Sometimes, though, these two files don't match. For example, the textual data sees "the" but the positional data sees "tho". This is more likely when the image is very poor or when there are a lot of non-character drawings and whatnot. Djvubind tolerates a small number of differences and does its best to sync the two files, but if the disparity is too great it will throw that warning. (Side note: I believe tesseract-3.0.0 added an hOCR output file that would solve this whole mess completely, but that release only came out within the last couple of months.)
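For anyone curious, the syncing idea described above can be sketched roughly as follows. This is not djvubind's actual code: the function names, the `max_mismatch` threshold, and the assumed box line layout (`char x1 y1 x2 y2`) are all made up for illustration, and the alignment uses Python's standard `difflib` rather than whatever djvubind really does.

```python
from difflib import SequenceMatcher


def parse_boxes(box_text):
    """Parse lines such as 't 10 20 18 34' into (char, (x1, y1, x2, y2)).

    The line layout is an assumption; extra fields after the first four
    numbers are ignored.
    """
    boxes = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        coords = tuple(int(p) for p in parts[1:5])
        boxes.append((parts[0], coords))
    return boxes


def group_into_words(plain_text, boxes, max_mismatch=0.1):
    """Use the plain text to decide which character boxes form each word.

    Raises ValueError when the two character streams differ too much
    (max_mismatch is a hypothetical tolerance, not a djvubind setting).
    """
    text_chars = [c for c in plain_text if not c.isspace()]
    box_chars = [c for c, _ in boxes]
    matcher = SequenceMatcher(None, text_chars, box_chars, autojunk=False)
    if 1.0 - matcher.ratio() > max_mismatch:
        raise ValueError("plain text and box data differ too much")

    # Map each text character index to a box index where the streams line up.
    mapping = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        if tag == "equal" or (tag == "replace" and same_length):
            for k in range(i2 - i1):
                mapping[i1 + k] = j1 + k

    words = []
    index = 0
    for word in plain_text.split():
        coords = [boxes[mapping[i]][1]
                  for i in range(index, index + len(word)) if i in mapping]
        index += len(word)
        if coords:
            # Merge the character boxes into one bounding box for the word.
            bbox = (min(c[0] for c in coords), min(c[1] for c in coords),
                    max(c[2] for c in coords), max(c[3] for c in coords))
            words.append((word, bbox))
    return words
```

With this sketch, a "the"/"tho" disagreement is treated as a one-character substitution, so the word still gets a bounding box from the positional data while its spelling comes from the plain-text file.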
In the past there have been cases where the disparity was djvubind's fault because of various bugs - so don't think I'm blaming everything on tesseract. In fact, I haven't seen that warning in a long time, so maybe tesseract is free of blame. UTF-8 was a good place to start thinking, especially since you're working with python2 whereas I'm working with python3.
Does djvubind also produce this warning, or only bindery?
Re: Bindery
Thanks, it was the encoding. I just had to import the encoding module and encode the boxing information, but it now works. Did you get Bindery to work? I updated it yet again, and now I'm adding a renaming feature (I'm postponing PDF until someone actually needs it). Here's an updated screenshot:
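On the encoding fix mentioned above: the post doesn't show the actual change, so the following is only a guess at one way to avoid this kind of python2/python3 mismatch. It reads the box file with an explicit UTF-8 encoding through `io.open`, which behaves the same on Python 2.6+ and Python 3. The function name `read_box_lines` is hypothetical.

```python
import io


def read_box_lines(path, encoding="utf-8"):
    """Return the lines of a box file as text (unicode) strings.

    Decoding explicitly avoids depending on the platform default encoding,
    which differs between systems and between Python 2 and Python 3.
    """
    with io.open(path, "r", encoding=encoding) as handle:
        return [line.rstrip("\r\n") for line in handle]
```

The point of the design is that both the plain-text file and the box file get decoded the same way before they are compared, so a non-ASCII character counts as one character in both.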
Re: Bindery
Just throwing this out for anyone with a Mac. Could someone confirm/deny that this script works for Mac OS? I feel a bit in the dark saying that it's "Cross Platform", yet I can't really get it to work on anything but Windows and Linux (but Mac isn't too far off).
Re: Bindery
I've slowly been working on Bindery. Mainly UI changes, but I've been cleaning up the code quite a lot, so no major improvements. Maybe I will include PDF binding later this week (or today), but I'm still playing with it.
Has anybody made binaries out of scripts with py2exe or cx_freeze for any Python application? The ones I make never work on any machine except for mine...
Re: Bindery
This is a small request--when someone has time and patience, and is willing to explain how to get this up and running and doing its thing on Windows, I'd be grateful.
Re: Bindery
Sorry for the neglect.
I rarely use Windows, but I can see what I can do. Here's a list of preliminary software packages to install (preferably in this order):
- Python: x32 x64. I would change the install directory to "C:\Program Files\Python 2.7\", as it installs to "C:\Python27\" by default. Yuck.
- PIL: architecture independent
- PyQt: x32 x64
- DjVuLibre: architecture independent
- ImageMagick: x32 x64
Lastly (this is from memory, so correct me if I'm wrong), create a shortcut on your desktop (Right Click -> New -> Application Shortcut) and make it run this command: "C:\Program Files\Python 2.7\python.exe" "C:\Program Files\Bindery\main.py"
If this fails, just open up Command Prompt (Windows Key + R and then enter "cmd") and run the above command.
Sorry for the lack of instructions. I'll update the site to include step-by-step installation instructions, and when I have access to a Windows computer, I might create an all-in-one downloader and installer executable with VB.net.
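Before making the shortcut, it may help to confirm that the Python packages from the list above are importable. Here is a minimal sketch of such a check. The import names `PIL` and `PyQt4` are assumptions (the thread doesn't say which PyQt version is used), and `missing_modules` is a hypothetical helper, not part of Bindery.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for the PIL and PyQt installers listed above.
    for name in missing_modules(["PIL", "PyQt4"]):
        print("Missing Python package: " + name)
```

Save it to a file and run it with the same python.exe used in the shortcut; if it prints nothing, both packages were found by that interpreter.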
Last edited by Anonymous1 on 12 Feb 2011, 22:12, edited 3 times in total.
Re: Bindery
Thanks Anonymous. I think you forgot to mention this:
http://effbot.org/downloads/PIL-1.1.7.win32-py2.7.exe
Thank you very much also for your involvement in ST.
Re: Bindery
Thanks, I'll add it in. Did you get it working okay (as some other Windows people might have had some problems with the setup)?
Re: Bindery
In fact I hadn't really tried it. But when I did after your message, I got this:
Code:
err: utils.execute(): command exited with bad status.
cmd = identify -ping "D:\karamazov1\karamaz0010.tif"
exit status = 1
Exception KeyError: KeyError(1804,) in <module 'threading' from 'C:\Program Files\Python 2.7\lib\threading.pyc'> ignored
QObject::killTimer: timers cannot be stopped from another thread
Also, you have to add the DjVuLibre folder to the PATH environment variable for cjb2 to work.
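Since the external tools have to be reachable through PATH, a quick check like the one below can show which ones the system can actually find. It is only a sketch: `find_on_path` is a hypothetical helper written by hand so it runs on both Python 2.7 and Python 3, and it only reports whether a program is findable. It does not explain why `identify` exited with status 1 in the log above.

```python
import os


def find_on_path(program, path=None, pathext=None):
    """Return the full path of `program` if it is found on PATH, else None.

    `path` and `pathext` default to the PATH and PATHEXT environment
    variables; PATHEXT is what lets Windows find 'cjb2.exe' from 'cjb2'.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    if pathext is None:
        pathext = os.environ.get("PATHEXT", "")
    extensions = [""] + [ext for ext in pathext.split(os.pathsep) if ext]
    for folder in path.split(os.pathsep):
        if not folder:
            continue
        for ext in extensions:
            candidate = os.path.join(folder, program + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    # Tool names taken from the thread: cjb2 (DjVuLibre), identify (ImageMagick).
    for tool in ("cjb2", "identify"):
        location = find_on_path(tool)
        print("%s: %s" % (tool, location or "NOT FOUND on PATH"))
```

If `cjb2` shows up as not found, adding the DjVuLibre install folder to PATH and opening a fresh Command Prompt should change the result.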