Hi all,
I'm in the process of building my scanner, so I haven't completed any projects yet, but I have made a couple of PDFs using photos from the Internet Archive.
My question is about how one would modify the text layer of a PDF. Google doesn't seem to be very helpful about it.
I'm using Tesseract with hOCR via ocrmypdf and the results are good, but they need a few corrections here and there.
It'd be nice if there were a tool that let you see the text layer of a PDF and edit it on the fly. Anyone know of a program (that runs on Linux, preferably free) that can do something like this?
How to edit the text layer of a PDF?
Moderator: peterZ
Re: How to edit the text layer of a PDF?
Hi
I have even run some experiments on how the OCR in some software recognizes the words in some pictures. It is not 100% successful. Sorry.
- Posts: 134
- Joined: 21 Sep 2016, 10:51
- E-book readers owned: Tolino Shine
- Country: Germany
- Location: Frankfurt/Main, Germany
Re: How to edit the text layer of a PDF?
I know that version 14 of ABBYY FineReader can do this, and of course Adobe Acrobat Pro in versions later than 8. I have Acrobat Pro 8, which can OCR an image PDF but provides no means to edit the recognized text. I know that later versions can do that, but I don't know from which version onward that capability is provided.
Re: How to edit the text layer of a PDF?
I may be sending you on a wild goose chase but this may have some leads:
https://github.com/manisandro/gImageReader/
I haven't looked at it for some time, but I recall that a fairly recent version allowed an input image PDF to be output with a text layer instead of OCR text only, and (maybe) the recognized text could also be edited. Whether that also applies to a text layer, as opposed to text-only output, I don't know. Happy exploring!
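Since the workflow in the original question goes through hOCR, one possible route is to correct the words in the hOCR file itself before a PDF is built from it. The following is only a minimal sketch under some assumptions: the hOCR marks each word as a span with class ocrx_word containing plain text (no nested tags), and the function name correct_hocr_words and the corrections dictionary are hypothetical names made up for illustration. Rebuilding the PDF from the corrected hOCR is a separate step that is not shown here.

```python
import html
import re

# Matches a word-level hOCR span whose content is plain text (no nested tags).
# Assumption: words are marked with class "ocrx_word".
WORD_SPAN = re.compile(
    r"(<span\b[^>]*\bclass=['\"]ocrx_word['\"][^>]*>)([^<]*)(</span>)"
)


def correct_hocr_words(hocr, corrections):
    """Return the hOCR text with misrecognized words replaced.

    `corrections` maps a wrong word to its replacement. Bounding boxes and
    other attributes in the opening tag are left untouched.
    """
    def fix(match):
        open_tag, text, close_tag = match.groups()
        word = html.unescape(text)
        if word not in corrections:
            return match.group(0)  # leave unchanged words byte-for-byte intact
        fixed = html.escape(corrections[word], quote=False)
        return open_tag + fixed + close_tag

    return WORD_SPAN.sub(fix, hocr)


# Hypothetical two-word sample in the style of word-level hOCR output.
sample = (
    "<span class='ocrx_word' id='word_1_1' "
    "title='bbox 10 10 50 30; x_wconf 71'>Tbe</span> "
    "<span class='ocrx_word' id='word_1_2' "
    "title='bbox 60 10 120 30; x_wconf 93'>scanner</span>"
)
fixed = correct_hocr_words(sample, {"Tbe": "The"})
```

Only the text inside each word span changes, so the position data stays as the OCR engine wrote it. A replacement that is much longer or shorter than the original word will no longer match its bounding box well.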