NConvert and PDF size reduction

JasonStonier
Posts: 19
Joined: 16 Nov 2021, 06:29
E-book readers owned: Kindle Paperwhite
Number of books owned: 2000
Country: UK

Post by JasonStonier »

I've searched and searched on this topic, but I think I don't really know the question I'm trying to ask...

I'm scanning 120-page magazines to PDF - the workflow is:

1) Scan the double-page spreads to ~60 A3 JPEGs at 10 MP
2) Crop each A3 into two A4 pages in nconvert (~120 pages)
3) Assemble the ~120 A4 pages into a PDF in nconvert

My nconvert settings for the PDF are:

Code:

nconvert -multi -c 4 -q 50 -dpi 72 -out pdf -o today.pdf *.jpg
With no compression the file size is around 1.5 Gigabytes :o
With ZIP or FAX compression (C1 or C4) the size is around 700 Megabytes

However, comparing to PDFs of similar magazines I've 'found' on the internet, they normally come out less than 1 Megabyte and have similar apparent quality.

So, any tips on what I am doing wrong? I'm archiving about 1000 magazines, so a gig each means some hefty storage space is needed...
cday
Posts: 447
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: NConvert and PDF size reduction

Post by cday »

Could you upload a representative page?

For content that can be reproduced in black and white, Fax (CCITT G4) compression is normally a good starting point, although there are other compression methods that aren't normally supported in basic tools such as the freeware NConvert. Examples are JBIG2 and Adobe Acrobat's ClearScan, which can be very effective for good quality images, although Adobe no longer actually uses the ClearScan name.

Edit:

Pages that consist solely of text and line drawings can be rendered in pure black and white, and compress very efficiently when converted to 1-bit colour depth. A magazine, however, even when printed in 'black and white' rather than colour, is likely to include halftone images or photographs, which can't normally be rendered in 1-bit black and white without being seriously degraded.

Pages that must be rendered in grayscale or colour inevitably have much larger file sizes, which can only really be minimised by segmenting each page into areas, such as text, that can be rendered in black and white, and the typically much smaller areas that must be rendered in grayscale or colour. That approach is sometimes referred to as MRC, standing for 'Mixed Raster Content', and for ultimate size reduction it normally requires commercial software such as ABBYY FineReader, OmniPage or Adobe Acrobat. Another possibility might be the DjVu format, which, although little used in the West, is potentially a freeware alternative to Adobe Acrobat in some but not all relevant respects.
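
To give a feel for why segmenting a page helps, here is a minimal sketch that compares raw (pre-compression) pixel storage for a full-colour page, a pure 1-bit page, and a page split into a 1-bit text area plus a 24-bit picture area. The page size of 5 megapixels and the 80% text share are illustrative assumptions, not figures from this thread, and real MRC encoders work with layers and masks and then compress each layer, so this only shows the direction of the effect.

```python
# Raw storage needed for one page at different colour depths.
# PIXELS_PER_PAGE and TEXT_PERCENT are illustrative assumptions.
PIXELS_PER_PAGE = 5_000_000   # assumed A4 page size in pixels
TEXT_PERCENT = 80             # assumed share of the page that is text


def raw_bytes(pixels, bits_per_pixel):
    """Bytes needed to store the pixels with no compression at all."""
    return pixels * bits_per_pixel // 8


def mixed_raw_bytes(pixels, text_percent, text_bits=1, image_bits=24):
    """Raw bytes when text areas are 1-bit and the rest is full colour."""
    text_pixels = pixels * text_percent // 100
    image_pixels = pixels - text_pixels
    return raw_bytes(text_pixels, text_bits) + raw_bytes(image_pixels, image_bits)


full_colour = raw_bytes(PIXELS_PER_PAGE, 24)
bilevel = raw_bytes(PIXELS_PER_PAGE, 1)
mixed = mixed_raw_bytes(PIXELS_PER_PAGE, TEXT_PERCENT)

print(f"24-bit colour page: {full_colour / 1e6:.2f} MB raw")
print(f"1-bit page:         {bilevel / 1e6:.2f} MB raw")
print(f"Mixed page:         {mixed / 1e6:.2f} MB raw")
```

Under these assumptions the mixed page needs well under a quarter of the raw storage of the full-colour page, before any compression is applied to either part.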
JasonStonier

Re: NConvert and PDF size reduction

Post by JasonStonier »

Images attached - in this case, the magazine has 88 pages, and they average 1MB each, so I'm really confused how the PDF can end up at 1.5 gig :?
Attachments
IMG_2021_11_29_21_54_44-2.jpg
IMG_2021_11_29_21_54_44-1.jpg
IMG_2021_11_29_21_53_56-2.jpg
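
One possible explanation for the gap is that the JPEGs are being decoded and their pixels stored in the PDF without JPEG compression. The sketch below is a back-of-the-envelope check of that idea. The 88 pages and 1 MB per JPEG come from the post above; the 5 megapixels per page (a 10 MP A3 scan split in two) and 24-bit RGB storage are assumptions.

```python
# Rough estimate: JPEGs kept as JPEG versus pixels stored raw in the PDF.
PAGES = 88                    # from the post above
JPEG_MB_PER_PAGE = 1.0        # from the post above
PIXELS_PER_PAGE = 5_000_000   # assumption: 10 MP A3 scan split into two A4 pages
BYTES_PER_PIXEL = 3           # assumption: 24-bit RGB


def raw_pdf_size_mb(pages, pixels_per_page, bytes_per_pixel):
    """Size in MB (10**6 bytes) if every pixel is stored uncompressed."""
    return pages * pixels_per_page * bytes_per_pixel / 1_000_000


jpeg_total_mb = PAGES * JPEG_MB_PER_PAGE
raw_total_mb = raw_pdf_size_mb(PAGES, PIXELS_PER_PAGE, BYTES_PER_PIXEL)

print(f"JPEGs kept as JPEG:     about {jpeg_total_mb:.0f} MB")
print(f"Decoded and stored raw: about {raw_total_mb:.0f} MB")
```

Under these assumptions the raw figure is in the same range as the reported 1.5 GB, while simply embedding the JPEGs would give a file of roughly the combined size of the source images.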
cday

Re: NConvert and PDF size reduction

Post by cday »

I've downloaded your three JPEGs and combined them into a multipage PDF. The pages are in the wrong order, but the file size is around 2.5 MB, which seems reasonable as a starting point.

The JPEG 'quality' indicated for the originals was 85; I saved at quality 80, as that was the value I had set. There is possibly some scope for reducing the file size with minimal visible loss of quality by testing lower settings, and also some possibility of optimising the images themselves. Note that JPEG 'quality' values are not directly transferable between different software packages.

I used GUI software for convenience, so I'll need to have a look at the NConvert code to produce a similar result.

New_Scientest_001.pdf
(2.41 MiB)
cday

Re: NConvert and PDF size reduction

Post by cday »

This is the basic NConvert code to combine JPEGs into a multipage PDF file while maintaining the original image Q value:

Code:

NConvert -out pdf -multi -use_org_quality -c 5 -o Test.pdf *.jpg
This is the basic code to set a Q value of 85, the value shown for your source:

Code:

NConvert -out pdf -multi -q 85 -c 5 -o Test.pdf *.jpg
Including the following subsampling term may slightly improve the output image quality, but I haven't tested that. If you try it, I would be interested to know the result:

Code:

-subsampling 1x1,1x1,1x1
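
For comparing several Q values, the command line above can be assembled in a small script. This is only a sketch: it assumes nconvert is on the PATH and accepts an explicit list of input files in place of the *.jpg wildcard, and the helper names build_nconvert_cmd and make_test_pdfs are hypothetical. It uses only the options already shown in this thread, and by default it prints the commands rather than running them.

```python
import glob
import subprocess


def build_nconvert_cmd(quality, output_pdf, jpeg_files):
    """Build the argument list for one multipage PDF at a given JPEG Q value."""
    return ["nconvert", "-out", "pdf", "-multi", "-q", str(quality),
            "-c", "5", "-o", output_pdf, *jpeg_files]


def make_test_pdfs(qualities, pattern="*.jpg", dry_run=True):
    """Prepare (and optionally run) one nconvert command per Q value."""
    files = sorted(glob.glob(pattern))  # sorted so the page order is stable
    commands = [build_nconvert_cmd(q, f"Test_Q{q}.pdf", files) for q in qualities]
    if not dry_run:
        for cmd in commands:
            # Assumption: nconvert is installed and found on the PATH.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in make_test_pdfs([85, 70, 60, 50]):
        print(" ".join(cmd))
```

Calling make_test_pdfs with dry_run=False would actually invoke nconvert once per Q value, which makes it easy to open the resulting files side by side.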

Testing quickly, there is little or no evident loss of image quality for the JPEGs you uploaded when the Q value is reduced to 60 or even 50, with a resulting significant reduction in the output file size:

Q=85 - 2.62 MB
Q=70 - 2.01 MB
Q=60 - 1.44 MB
Q=50 - 1.36 MB
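
As a quick way to read those figures, the following sketch expresses each size as a percentage saving relative to the Q=85 file. The sizes are the ones listed above; nothing else is assumed.

```python
# File sizes in MB for the three test pages, as listed above.
sizes_mb = {85: 2.62, 70: 2.01, 60: 1.44, 50: 1.36}
baseline_mb = sizes_mb[85]


def saving_percent(size_mb, reference_mb):
    """Percentage reduction of size_mb relative to reference_mb."""
    return round(100 * (1 - size_mb / reference_mb), 1)


savings = {q: saving_percent(size, baseline_mb) for q, size in sizes_mb.items()}

for q, size in sizes_mb.items():
    print(f"Q={q}: {size:.2f} MB, {savings[q]:.1f}% smaller than Q=85")
```

Most of the saving arrives by Q=60; going from Q=60 to Q=50 only trims a few more percentage points.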

If you have a PDF viewer that can display multiple files in tabs, such as Adobe Acrobat (the basic free version, whatever it is called now), it is very easy to compare the quality of alternative output versions: zoom in equally on each for a closer look.

Beyond that, it looks as if your images might benefit from some preprocessing, which is a separate topic and probably one for someone else. I notice a slight blue-ish tint, for example, and they might respond to slight sharpening.

Given reasonable quality images of typical text pages rather than primarily photographic pages, I could test the effect of using Adobe ClearScan, which in favourable circumstances can be very effective in reducing file size, but which unfortunately is becoming less accessible, as I believe it is now only included in the premium Acrobat Pro version.

Test_Q85.pdf
(2.62 MiB)
Test_Q70.pdf
(2.01 MiB)
Test_Q60.pdf
(1.44 MiB)
Test_Q50.pdf
(1.36 MiB)
JasonStonier

Re: NConvert and PDF size reduction

Post by JasonStonier »

Thank you so much for the time you put into that - I really appreciate it.

Just tried a load of settings based on yours, and the key one was the -c 5 (JPEG compression).

Using my original nconvert parameters gave me a size of 51MB for those three pages; using yours gave me 2.6MB.

So the final string for me is:

Code:

nconvert -multi -c 5 -out pdf -use_org_quality -o today.pdf Dest\*.jpg
And the final file size for a typical magazine is under 50MB.

Thanks!
cday

Re: NConvert and PDF size reduction

Post by cday »

JasonStonier wrote: 16 Dec 2021, 05:20 Just tried a load of settings based on yours, and the key one was the -c 5 (JPEG compression).

When colour and grayscale images are saved in a PDF file, JPEG compression, although lossy in principle, is normally the most practical option when file size is a consideration.

JasonStonier wrote: So the final string for me is:

Code:

nconvert -multi -c 5 -out pdf -use_org_quality -o today.pdf Dest\*.jpg
My understanding of the -use_org_quality option is that it tries to save the output JPEG at the original Q value of the source image. Specifying a Q value instead would let you balance file size against quality, although it might be as well to leave a margin for safety.

JasonStonier wrote: Using my original nconvert parameters gave me a size of 51MB for those three pages; using yours gave me 2.6MB.

And the final file size for a typical magazine is under 50MB.

Is that satisfactory? For an archived weekly magazine that will still be around 2.5 GB a year. Based on my quick tests, I think you could use a lower Q value without losing any significant perceptible quality and further reduce the size, although storage costs can be expected to continue to fall.
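
The storage arithmetic can be sketched as follows. The 50 MB per issue and the collection of about 1000 magazines come from this thread, as do the Q=85 and Q=50 test sizes; 52 issues a year for a weekly title, and the idea that the three-page size ratio scales to a whole magazine, are assumptions.

```python
# Storage estimates for the archive, using figures quoted in this thread.
MB_PER_ISSUE = 50             # "under 50MB", taken as an upper bound
ISSUES_PER_YEAR = 52          # assumption: a weekly magazine
ARCHIVE_ISSUES = 1000         # "about 1000 magazines"
Q85_MB, Q50_MB = 2.62, 1.36   # three-page test sizes from the earlier post


def total_gb(issues, mb_per_issue):
    """Total size in GB (1 GB = 1000 MB) for a number of issues."""
    return issues * mb_per_issue / 1000


yearly_gb = total_gb(ISSUES_PER_YEAR, MB_PER_ISSUE)
archive_gb = total_gb(ARCHIVE_ISSUES, MB_PER_ISSUE)

# Assumption: the Q=50 / Q=85 ratio from the test pages holds for full issues.
mb_per_issue_q50 = MB_PER_ISSUE * Q50_MB / Q85_MB
archive_gb_q50 = total_gb(ARCHIVE_ISSUES, mb_per_issue_q50)

print(f"One year of issues at 50 MB each: {yearly_gb:.1f} GB")
print(f"Whole archive at 50 MB each:      {archive_gb:.0f} GB")
print(f"Whole archive if Q=50 scales:     {archive_gb_q50:.0f} GB")
```

If the ratio does hold, dropping to Q=50 would roughly halve the space needed for the whole archive, which is worth checking against a visual comparison of a few representative pages.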

Before going too far, you could also test the -subsampling 1x1,1x1,1x1 option to see if there is a perceptible increase in quality. Using that option with a reduced Q value might optimise the balance between perceived image quality and file size.
