Well... djvubind needs to run tesseract twice. The first run produces a normal plain-text file. The second run produces a file where each line contains a single character and its positional information (the corners of the box it's in). Unfortunately, that second file contains no spaces or line breaks, which is why I need the first file to know where one word ends and the next begins.
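A minimal sketch of how the two outputs fit together, assuming the usual tesseract box-file layout of `<char> <left> <bottom> <right> <top>` per line (newer versions append a page number), and assuming both outputs agree character for character. The names `Box`, `parse_boxes`, `group_into_words` and `word_bbox` are hypothetical and are not djvubind's actual code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """One line of a tesseract box file: a character and its box corners."""
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_boxes(box_text: str) -> List[Box]:
    """Parse box-file text; any trailing page-number column is ignored."""
    boxes = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(v) for v in parts[1:5])
        boxes.append(Box(parts[0], left, bottom, right, top))
    return boxes


def group_into_words(boxes: List[Box], plain_text: str) -> List[List[Box]]:
    """Split the box list (no spaces or line breaks) into words, using the
    word boundaries from the plain-text output. Assumes both outputs match."""
    words = []
    index = 0
    for word in plain_text.split():
        words.append(boxes[index:index + len(word)])
        index += len(word)
    return words


def word_bbox(word: List[Box]) -> Tuple[int, int, int, int]:
    """Bounding box of a whole word: the union of its character boxes."""
    return (min(b.left for b in word), min(b.bottom for b in word),
            max(b.right for b in word), max(b.top for b in word))


box_output = "t 10 10 20 30 0\nh 21 10 30 30 0\ne 31 10 40 30 0\nc 50 10 60 30 0\na 61 10 70 30 0\nt 71 10 80 30 0\n"
words = group_into_words(parse_boxes(box_output), "the\ncat\n")
print(["".join(b.char for b in w) for w in words])  # ['the', 'cat']
```

The whole scheme hinges on the character counts lining up; a single missing or extra box shifts every word after it, which is exactly why a mismatch between the two files matters.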
Sometimes, though, these two files don't match. For example, the plain-text output sees "the" but the positional data sees "tho". This is more likely when the image quality is very poor or when there are a lot of non-character drawings and whatnot. Djvubind tolerates a small number of differences and does its best to sync the two files, but if the disparity is too great it throws that warning. (Side note: I believe tesseract-3.0.0 added an hOCR output file that would solve this whole mess completely, but that release only came out within the last couple of months.)
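One way to picture "tolerate small differences, warn on large ones" is a sketch with Python's standard `difflib`. This is an illustration of the general idea, not djvubind's actual sync logic, and the `MIN_RATIO` threshold of 0.9 is a hypothetical value. Same-length substitutions such as "e" vs "o" keep the text's character and borrow the box's position.

```python
import difflib
import warnings
from typing import List, Tuple

MIN_RATIO = 0.9  # hypothetical tolerance, not djvubind's real setting


def check_sync(plain_text: str, box_chars: str, min_ratio: float = MIN_RATIO) -> float:
    """Return the similarity of the two outputs, warning if it is too low."""
    text_chars = "".join(plain_text.split())  # box data has no whitespace
    ratio = difflib.SequenceMatcher(None, text_chars, box_chars,
                                    autojunk=False).ratio()
    if ratio < min_ratio:
        warnings.warn(f"OCR text and box data disagree (similarity {ratio:.2f})")
    return ratio


def align(text_chars: str, box_chars: str) -> List[Tuple[int, int]]:
    """Pair text-character indices with box indices where the runs line up."""
    matcher = difflib.SequenceMatcher(None, text_chars, box_chars, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Equal runs, or substitutions of the same length ("e" vs "o"):
        # trust the text for the character and the box for its position.
        if tag == "equal" or (tag == "replace" and i2 - i1 == j2 - j1):
            pairs.extend(zip(range(i1, i2), range(j1, j2)))
    return pairs


print(align("the", "tho"))  # [(0, 0), (1, 1), (2, 2)]
```

With "the" vs "tho" the similarity is 2/3, so this sketch would warn even though every character can still be paired; a real tolerance would have to be tuned against actual scans.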
In the past there have been cases where the disparity was djvubind's fault because of various bugs, so don't think I'm blaming everything on tesseract. In fact, I haven't seen that warning in a long time, so tesseract may well be free of blame. Suspecting UTF-8 was a good place to start, especially since you're working with python2 whereas I'm working with python3.
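To show why encoding could cause exactly this kind of disparity, here is a small sketch, assuming both tesseract outputs are UTF-8. A Python 2 `str` read from a file holds raw bytes, so an accented letter counts as two "characters", while Python 3 text decoded as UTF-8 counts it as one, matching the single line it gets in the box file. The helper names are hypothetical.

```python
import os
import tempfile
from typing import Tuple


def char_counts(raw: bytes) -> Tuple[int, int]:
    """Length of the same data as raw bytes (roughly a Python 2 str)
    and as decoded UTF-8 text (a Python 3 str)."""
    return len(raw), len(raw.decode("utf-8"))


def read_ocr_text(path: str) -> str:
    """Read tesseract output with an explicit encoding so each accented
    letter is one character, as it is one line in the box file."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


print(char_counts("café".encode("utf-8")))  # (5, 4)

# Round-trip through a temporary file to show the decoded length.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("café".encode("utf-8"))
print(len(read_ocr_text(tmp.name)))  # 4
os.remove(tmp.name)
```

If one file is compared as bytes and the other as decoded text, every non-ASCII letter pushes the two counts one further apart, which would look like the files "don't match".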
Does djvubind also produce this warning, or only bindery?
