Colored Text

Discussions, questions, comments, ideas, and your projects having to do with DIY Book Scanner software. This includes the Stereo Data Maker software for the cameras, post-processing software, utilities, OCR packages, and so on.

Re: Colored Text

Post by clemd973 » 23 Dec 2010, 01:18

Anonymous wrote: Closed source junkies and their overkill solutions ;)

I do this all the time with just ImageMagick. What you're looking for is just converting the image into a three-color palette, namely black, white, and red. I'll post a script when I get home, but here's something you can look at (just don't scroll up; it never ends): http://www.imagemagick.org/Usage/quantize/#handling.

LOL. I've always been one for overkill!!! :D Scripting seems a bit beyond my understanding...your link was Greek to me. Upload a sample of your work if you can. Thanks!
clemd973
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: Colored Text

Post by Anonymous1 » 23 Dec 2010, 16:38

Lol, that's okay. I was playing with it for a while, and found that strider1551's method worked well. I just modified the values to work with your original image, and here's the command:

Code:
convert AdjustedRedLayeredWB.jpg +dither -fuzz "30%" -fill "white" -opaque "#dfedef" -modulate "100,150,100" -fuzz "45%" -fill "red" -opaque "red" -fuzz "50%" -fill "black" -opaque "black" -remap ColorMap.png Test.png


Before:
AdjustedRedLayeredWB.jpg


The color map:
ColorMap.png


After:
Test.png


The image is first made almost purely black, red, and white using strider1551's method. Then I apply a color map to it, rounding every color to one of those three. The result is a bit hard to look at because the red is very intense, but that can be tweaked.
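The post doesn't show how ColorMap.png was made. A minimal sketch, assuming ImageMagick 6's `convert` is on the PATH: a three-pixel palette can be built from `xc:` solid-color canvases joined side by side with `+append`, and `-remap` (the current name for the older `-map` option) then snaps every pixel to the nearest palette color. The function names `make_palette` and `snap_to_palette` are hypothetical helpers, not part of the original post.

```shell
#!/bin/bash
set -e -u

# Hypothetical helper: write a 3x1 palette image containing exactly
# black, red, and white (one pixel each), e.g. ColorMap.png.
make_palette() {
    convert xc:black xc:red xc:white +append "$1"
}

# Hypothetical helper: replace every pixel of $1 with the nearest color
# in palette $2 and write the result to $3. +dither turns dithering off,
# so flat areas stay flat instead of becoming speckle patterns.
snap_to_palette() {
    convert "$1" +dither -remap "$2" "$3"
}
```

Usage would be `make_palette ColorMap.png` followed by `snap_to_palette AdjustedRedLayeredWB.jpg ColorMap.png Test.png`. On its own, `-remap` only picks the nearest of the three colors, so the `-fuzz`/`-opaque` steps before it are what decide which pixels end up red.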

I can rant against closed source software/payware all day ;)
Anonymous1
 

Re: Colored Text

Post by clemd973 » 24 Dec 2010, 00:11

Anonymous wrote: I can rant against closed source software/payware all day ;)

Good work! You might be able to rant against closed source software/payware all day, but the results don't compare...yet. ;) Keep working on it. :D I do like the more vibrant red in your version, though. I'll have to tweak my settings a bit.
clemd973
 
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: Colored Text

Post by kasslloyd » 24 Dec 2010, 01:32

You also have to keep in mind that when you use Acrobat's ClearScan OCR, the text is no longer a graphic but actual text rendered in a font, so the file size is practically nil. The only downside to ClearScan is that a 600 dpi source is recommended. Below that you start to lose quality, and small text can be garbled beyond recognition if the source dpi isn't high enough.

I use Acrobat myself; as a college student I can get it for $100 instead of $500. I haven't found any cheaper (and definitely no free) alternatives that do everything Acrobat can do.
kasslloyd
 
Posts: 41
Joined: 19 Dec 2010, 21:25

Re: Colored Text

Post by strider1551 » 24 Dec 2010, 08:35

clemd973 wrote: you might be able to rant against closed source software/payware all day, but the results don't compare...yet. ;)

Fair enough. Here's an enhanced image that brings out the black text and keeps the red the same shade as the original capture (at the moment, a lossy DjVu version comes in at 20 kB). The image is the .jpg you posted here in the thread, but it was also run through Scan Tailor with the white margins and equalize illumination options turned on.

Code:
#! /bin/bash

set -e -u

# Create a better base image to work with by getting rid of background junk.
convert "${1}" \
        -fuzz 20% -fill white -opaque white \
        _base.tif

# Isolate the black and red colors to only the sections of the image where those colors should be.
# Note that we can "lose" the shape of the characters; all we need to do is get red and black
# into the general areas.
convert _base.tif \
        -fuzz 30% -fill black -opaque black \
        -fill white +opaque black \
        -median 3 \
        -blur 10 -fuzz 50% -fill white -opaque white \
        -colors 2 \
        _black.tif

convert _base.tif \
        -fuzz 30% -fill white +opaque red \
        -median 3 \
        -blur 10 -fuzz 50% -fill white -opaque white \
        -colors 2 \
        _color.tif

composite -compose multiply _color.tif _black.tif _composite.ppm

# Create the iw44 foreground.
convert _composite.ppm -threshold 95% -negate _foreground_mask.pbm
c44 -decibel 40 -mask _foreground_mask.pbm _composite.ppm _foreground.djvu
djvuextract _foreground.djvu BG44=_foreground.iw4

# Create the text layer that will be colored.
convert _base.tif -threshold 99% _text.tif
cjb2 -dpi 600 -lossless _text.tif _text.djvu

# Put it all together.
djvumake out.djvu INFO=,,600 Sjbz=_text.djvu FG44=_foreground.iw4
ddjvu -format=tiff out.djvu out.tiff


out.jpg


I'm still fiddling with the entire batch of images you sent me (the whole Christmas thing kinda eats into my free time). The challenge is to work out a process that is generic enough to work on all the images and that is ignorant of what color or colors the non-black text is. Fun stuff.

Edit: ...and I guess .tiff files are not previewed in the thread, so I changed it to a .jpg
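For a whole batch of pages, one way to run the per-page script above over many scans is a small wrapper. This is a sketch under assumptions: the script is saved to a file (any name), and each page runs in its own scratch directory because the script writes fixed file names such as `_base.tif` and `out.djvu` into the current directory. `process_pages` is a hypothetical name, not something from the thread.

```shell
#!/bin/bash
set -e -u

# Hypothetical wrapper: process_pages SCRIPT OUTDIR IMAGE...
# Runs SCRIPT once per image inside a private temp directory (the page
# script writes fixed names like _base.tif and out.djvu) and saves each
# result as OUTDIR/<image name without extension>.djvu.
process_pages() {
    local script outdir img name work
    # Absolute path, because SCRIPT is run from inside the temp directory.
    script="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
    outdir="$2"
    shift 2
    mkdir -p "$outdir"
    for img in "$@"; do
        name="$(basename "$img")"
        name="${name%.*}"
        work="$(mktemp -d)"
        cp "$img" "$work/"
        # Run the page script in a subshell so the cd doesn't leak out.
        ( cd "$work" && bash "$script" "$(basename "$img")" )
        mv "$work/out.djvu" "$outdir/$name.djvu"
        rm -rf "$work"
    done
}
```

For example, `process_pages ./page.sh pages scans/*.jpg` (where `page.sh` is a hypothetical file name for the script above), then `djvm -c book.djvu pages/*.djvu` to bundle the pages into one multipage DjVu file. The glob sorts alphabetically, so zero-padded page names keep the pages in order.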
strider1551
 
Posts: 126
Joined: 01 Mar 2010, 11:39
Location: Ohio, USA

Re: Colored Text

Post by strider1551 » 24 Dec 2010, 08:43

I would be doing all of this to get better compression in a DjVu file by putting the colors in one layer and the text in another. The benefit for you, of course, is that with a pure white background you can leave the image layer of your PDF visible and still see the musical notation. I expect the file size will also drop.
strider1551
 
Posts: 126
Joined: 01 Mar 2010, 11:39
Location: Ohio, USA

Re: Colored Text

Post by clemd973 » 25 Dec 2010, 03:12

strider1551 wrote: The benefit for you, of course, is that with a pure white background you can leave the image layer of your PDF visible and still see the musical notation. I expect the file size will also drop.

Well, actually, the plain white background is achieved only by hiding the image layer (which is the layer the background is on). But when the image layer is hidden, the musical notes go with it, since they are rendered on the image layer as well. The workaround so far is to process the pages with musical notation separately in Scan Tailor, blocking off the red text with the picture tool. This gives the white background through Scan Tailor while keeping the colored text. Those pages (and there aren't that many) then have to be re-inserted into the original document...not a problem, really, just a little more time-consuming.

Quite impressed with your rendition so far. I believe ClearScan is what gives my version that extra sharpness. It would be good if you could code something like that to give your image an extra boost. Also, pretty soon I'll be scanning the funeral rite, which also contains colored text, if you're interested in those images as well. Let me know.
clemd973
 
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: Colored Text

Post by clemd973 » 25 Dec 2010, 03:16

kasslloyd wrote: Only downside to using ClearScan is that a 600 dpi source is recommended; anything less and you start to lose quality, and smaller text can be garbled beyond recognition if the source dpi isn't high enough.

I believe Scan Tailor allows 600 dpi output. On my sample pages, I selected 300. I'll have to keep the 600 dpi factor in mind in case I run into any problems later.

kasslloyd wrote: I myself use Acrobat; being a college student, I can get it for $100 instead of $500. Haven't found any cheaper and definitely not free alternatives to do all that Acrobat can do.

Yeah, I got Acrobat X Pro at a teacher discount through our school. Without that, I'd probably be waiting for Strider to finish his work and go that route.
clemd973
 
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: Colored Text

Post by strider1551 » 25 Dec 2010, 23:29

clemd973 wrote: Well, actually, the plain white background is acquired only by hiding the image layer (which is the layer the background is on). Then, though, when the image layer is hidden, so go the musical notes, since they are rendered on the image layer as well.

Right, but part of what my process does is create a pure white background, so the only thing in the image layer would be the musical notation. Imagine feeding the processed image I posted above into the PDF software you are using: there's no need to hide the image layer, because there is no pesky background. You get good colors, musical notation, and a white background, without having to manually block off the red text in Scan Tailor on every page of a 400+ page book.
strider1551
 
Posts: 126
Joined: 01 Mar 2010, 11:39
Location: Ohio, USA

Re: Colored Text

Post by clemd973 » 26 Dec 2010, 00:24

strider1551 wrote: You get good colors, musical notation, and a white background, without having to manually block off the red text in Scan Tailor on every page of a 400+ page book.

Actually, and thankfully, there are only four pages of musical notation in this 400+ page book. It might not always be that way, though. Can you sharpen the text in your sample?
clemd973
 
Posts: 121
Joined: 22 Aug 2010, 21:20
