How to crop and deskew pages using only free software?
Moderator: peterZ
-
- Posts: 6
- Joined: 12 Jun 2019, 11:55
- Number of books owned: 6000
- Country: Germany
How to crop and deskew pages using only free software?
My book scanner (see here) creates images like these:
Is there any way to cut them out automatically and possibly even deskew them, so that the OCR (tesseract) works better?
My current method is simple and crude but kind of works: I compute the average RGB value of the first 20 or so lines of the image (assuming there is a border around the book itself), then go down line by line, taking each line's average. Once a line's average is at least about 5% higher than the border average (assuming paper is mostly white), everything above that line gets cut.
I then do the same from left to right, right to left, and bottom to top.
The results are mostly OK (especially with images from a better camera than the one used for the example image here), but they're not perfect.
How can I get good (or, preferably, very good) results without any manual work once the scanning process has started, using only free software?
Is there anything like that available?
It would be of great help for me!
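The border-averaging approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the image is already available as a 2-D grayscale NumPy array, the border is darker than the page, and the helper names `first_bright_line` and `find_page_box` are made up for this example rather than taken from any existing tool.

```python
import numpy as np


def first_bright_line(line_means, border_lines=20, rel_increase=0.05):
    """Index of the first line noticeably brighter than the border, or 0 if none."""
    # Average brightness of the first few lines serves as the border reference.
    reference = line_means[:border_lines].mean()
    threshold = reference * (1.0 + rel_increase)
    brighter = np.nonzero(line_means[border_lines:] > threshold)[0]
    return border_lines + int(brighter[0]) if brighter.size else 0


def find_page_box(gray, border_lines=20, rel_increase=0.05):
    """Return (left, top, right, bottom) of the bright page in a 2-D grayscale array."""
    row_means = gray.mean(axis=1)  # one average per image row
    col_means = gray.mean(axis=0)  # one average per image column
    top = first_bright_line(row_means, border_lines, rel_increase)
    left = first_bright_line(col_means, border_lines, rel_increase)
    # Scanning the reversed means finds the distance from the opposite edge.
    bottom = gray.shape[0] - first_bright_line(row_means[::-1], border_lines, rel_increase)
    right = gray.shape[1] - first_bright_line(col_means[::-1], border_lines, rel_increase)
    return left, top, right, bottom
```

With Pillow, the array could be obtained via `np.asarray(Image.open(path).convert("L"))`, and the returned tuple has the (left, upper, right, lower) order that `Image.crop` expects. Because each line is reduced to a single average, a skewed page or an uneven border will shift the detected edges, which may explain the imperfect results.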
-
- Posts: 139
- Joined: 18 Dec 2016, 17:13
- E-book readers owned: Calibre, FBReader
- Number of books owned: 7000
- Country: USA
Re: How to crop and deskew pages using only free software?
I don't know about software to do it, but could you zoom in tighter so that you mostly get the pages, and not so much of the surrounding mechanism? I think my OCR (ABBYY FineReader 14) could handle that.
Bill Gill
-
- Posts: 63
- Joined: 22 Dec 2016, 06:07
- E-book readers owned: Tolino, Kindle
- Number of books owned: 600
- Country: Poland
Re: How to crop and deskew pages using only free software?
Scan Tailor Advanced has a 'crop' function. After the 'Splitting' and 'Deskew' stages, there is a 'Page Box' option under 'Select content'. It is a rectangular selection (orange lines) that should cover the area to crop to. Set 'Page Box' to manual mode and 'Content Box' to auto. Apply it separately to even and odd pages by selecting 'this page and the following every other page'. Do not forget to tick the 'Apply page box' checkbox in the 'Select content' submenu.
-
- Posts: 63
- Joined: 22 Dec 2016, 06:07
- E-book readers owned: Tolino, Kindle
- Number of books owned: 600
- Country: Poland
Re: How to crop and deskew pages using only free software?
Here are some screenshots to make my comments above clearer.
Of course, the book must be in the same position in each picture. In particular, the gutter (the line where the pages are split) should not move. Otherwise the crop function will not be effective.
Moreover, it may be useful to apply 'Dewarping' in the final stage of the Scan Tailor workflow in order to straighten the lines of text.
-
- Posts: 6
- Joined: 12 Jun 2019, 11:55
- Number of books owned: 6000
- Country: Germany
Re: How to crop and deskew pages using only free software?
Thanks very much, BillGill and zbgns.
Of course, zooming into the picture helps a lot, but I just forgot about that, sorry and thanks. But the problem is that the machine is not absolutely stationary and moves a bit while scanning (the machine itself and also the book). So this is only a partial solution.
And to zbgns:
I've already looked into scantailor and scantailor-cli, but I never got them to run fully automatically. Scantailor's content detection would be good enough, but I never got it to work without some manual labour in the process. That is exactly what I want to avoid: putting manual work into it.
I believe it should be possible, since the white page should really be "easy" to detect (not really easy to code, but easy enough for sufficiently sophisticated algorithms).
Do you know if there's any way to do that, e.g. with Scantailor-Cli, via the command line/a script?
Thanks again!
-
- Posts: 63
- Joined: 22 Dec 2016, 06:07
- E-book readers owned: Tolino, Kindle
- Number of books owned: 600
- Country: Poland
Re: How to crop and deskew pages using only free software?
I do not expect Scan Tailor to provide perfect or even very good results without some manual corrections. It is a universal tool, but it seems you need something more customized and tailor-made. Maybe something based on OpenCV?
-
- Posts: 6
- Joined: 12 Jun 2019, 11:55
- Number of books owned: 6000
- Country: Germany
Re: How to crop and deskew pages using only free software?
I'm not a C developer, and I have severe problems getting the OpenCV examples I find for similar issues to work, starting with how to include the headers, such as:
Code: Select all
#include "opencv2/imgproc/imgproc_c.h"
Is there an "idiot-proof" way of doing this? Or did anybody here do something similar? It would be of great help to me.
-
- Posts: 6
- Joined: 12 Jun 2019, 11:55
- Number of books owned: 6000
- Country: Germany
Re: How to crop and deskew pages using only free software?
Update: I've gotten very good results with this: https://towardsdatascience.com/object-d ... cb4d86f606
This works nearly perfectly.
Code: Select all
import os

from PIL import Image
from imageai.Detection import ObjectDetection


def crop_image(img, imgnew, box_points):
    # box_points: x left, y top, x right, y bottom
    print(", ".join(str(p) for p in box_points[:4]))
    imageObject = Image.open(img)  # open the file passed in as img
    cropped = imageObject.crop(tuple(box_points[:4]))
    cropped.save(imgnew)


image = "image.jpg"
outputimg = "imagefinal.jpg"

execution_path = os.getcwd()
detector = ObjectDetection()
detector.setModelTypeAsRetinaNet()
detector.setModelPath(os.path.join(execution_path, "resnet50_coco_best_v2.0.1.h5"))
detector.loadModel()
detections = detector.detectObjectsFromImage(
    input_image=os.path.join(execution_path, image),
    output_image_path=os.path.join(execution_path, "imagenew.jpg"),
)

# Crop to the bounding box of every detection labelled "book".
for eachObject in detections:
    name = eachObject["name"]
    if name == "book":
        box_points = eachObject["box_points"]
        percentage_probability = eachObject["percentage_probability"]
        print(name + ": " + str(percentage_probability))
        crop_image(image, outputimg, box_points)
-
- Posts: 1
- Joined: 15 Sep 2020, 04:52
- E-book readers owned: Kindle PW 1st Gen
- Number of books owned: 86
- Country: USA
Re: How to crop and deskew pages using only free software?
NormanBuchScanner wrote: ↑28 Jul 2019, 07:19 Update: I've gotten very good results with this: https://towardsdatascience.com/object-d ... cb4d86f606
This works nearly perfectly.
Very interesting, thank you! I'm attempting to fix some very poorly formatted PDFs, and I'm going for an automated approach as well. The blog post was very useful, and I am currently training a model on a set of ~200 images I quickly put together, but even that seems to be taking a very long time on my machine. I'm going to attempt it on Google Colab and see if I can finish before my GPU privileges get revoked. I'll update with my model if I can get it to work. Cheers.