Smaller: methods to reduce the size of a PDF file

Hot on the heels of the post on how to downsize microscopy movie files, let’s look at ways to shrink the size of a PDF file. There are several ways to tackle this; the suggestions came from this thread on Mastodon.

Scenario: you have created a preprint/manuscript/proposal in PDF format. It looks great and is 18.4 MB. The journal/funder/whoever requires the uploaded file to be less than 10 MB. What do you do?

Scroll down for a comparison of all the solutions, and further still for an explainer of how and why they work.

Solution 1: use ColorSync Utility

This has been my go-to method for years. On a Mac, ColorSync Utility is in Applications/Utilities. Its built-in filter called “Reduce File Size” may be OK, but if not, you can make a custom filter. The following settings work well:

  • Image Sampling
    • Quality: High
    • Disable Set Scale
    • Check Set Resolution to 144 Pixels/Inch
    • Check Constrain Size to a Max of 2400 Pixels
  • Image Compression
    • JPEG
    • Set quality to Max

To use this, duplicate the original PDF, then drag and drop the copy onto ColorSync Utility and select the filter from the drop-down menu below the PDF. Click Apply, then press Command + S to save.

Alternatively, open the PDF in Preview, choose File > Export…, select PDF as the format, and your custom filter should appear in the Quartz Filter list.

Result: 18.4 MB converts to 7.7 MB

Solution 2: use ghostscript

Boris Barbour suggested using ghostscript on the command line. There are several presets which give different quality outputs.

 gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -dNOPAUSE -dQUIET -dBATCH -sOutputFile=printer.pdf input.pdf
 gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=screen.pdf input.pdf
 gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=ebook.pdf input.pdf
 gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/prepress -dNOPAUSE -dQUIET -dBATCH -sOutputFile=prepress.pdf input.pdf

Additionally, Gaspar Jekely suggested that -dColorImageResolution=300 can be added to give more control over the images.
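If you want to try all four presets in one go, the commands above can be wrapped in a short script. This is a minimal Python sketch, assuming Ghostscript’s `gs` is on your PATH. The function names `gs_command` and `shrink_all` and the `<name>_<preset>.pdf` output naming are my own (hypothetical). Passing `-dDownsampleColorImages=true` alongside Gaspar’s `-dColorImageResolution` is also my assumption, intended to make sure the requested resolution is applied whatever the preset does by default.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# The four presets from the commands above, in order of output size in my test.
PRESETS = ("screen", "ebook", "printer", "prepress")


def gs_command(preset, src, dst, color_dpi=None, gs="gs"):
    """Build the Ghostscript argument list for one preset."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    cmd = [
        gs,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
    ]
    if color_dpi is not None:
        # Gaspar's suggestion; forcing downsampling on is an assumption so
        # that the target resolution is used whatever the preset's default.
        cmd += ["-dDownsampleColorImages=true", f"-dColorImageResolution={color_dpi}"]
    # gs processes arguments in order, so set the output before the input file.
    cmd += [f"-sOutputFile={dst}", str(src)]
    return cmd


def shrink_all(src, color_dpi=None):
    """Write <name>_<preset>.pdf next to src for each preset; return sizes in MB."""
    gs = shutil.which("gs")
    if gs is None:
        raise RuntimeError("Ghostscript (gs) was not found on PATH")
    src = Path(src)
    sizes = {}
    for preset in PRESETS:
        dst = src.with_name(f"{src.stem}_{preset}.pdf")
        subprocess.run(gs_command(preset, src, dst, color_dpi, gs), check=True)
        sizes[preset] = dst.stat().st_size / 1_000_000  # decimal megabytes
    return sizes


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, mb in shrink_all(sys.argv[1]).items():
        print(f"{name:<9} {mb:6.1f} MB")
```

Saved under any name, for example shrink_pdf.py, it runs as `python3 shrink_pdf.py input.pdf` and writes input_screen.pdf, input_ebook.pdf and so on next to the original, then prints their sizes. Building the arguments as a list avoids any shell quoting problems with file names containing spaces.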

Result: 18.4 MB converts to anything from 1 MB (screen, worst quality) to 9.6 MB (prepress)

Solution 3: other command line options

Nicola Romano suggested several command-line tools for Linux. The second option (pdf2ps, then ps2pdf) was available on my system (Mac), so that is the one I tested.

 # use pdftk (note the "output" keyword before the output file)
 pdftk inputfile.pdf output outputfile.pdf compress
 # convert to postscript then back to pdf
 pdf2ps inputfile.pdf output.ps
 ps2pdf output.ps inputfile_small.pdf
 # use qpdf
 qpdf --linearize inputfile.pdf output.pdf

Result: 18.4 MB converts to 34.8 MB, nearly double the original! I’m sure the other methods would work better, but I didn’t test them.

Not a solution

Finally, I would not recommend “printing to PDF” as this may destroy links in your PDF file. If this is not an issue and you are in a hurry, it is certainly an option. I really would not recommend uploading your PDF to a random website that will just run one of the above routines and return the output to you. The best you can hope for is that that is all they do. If there is any sensitive information in your file, obviously don’t upload it to a random web page!

Results

Let’s look at the file sizes first. To test, I used an 18.4 MB manuscript PDF made in Overleaf with my bioRxiv template. The PDF is a composite of text and raster images at 300 dpi. As Nicola noted with his suggestions, the source can affect the results.

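| Solution | Method            | Size (MB) |
|----------|-------------------|-----------|
| 1        | ColorSync Utility | 7.7       |
| 2        | gs prepress       | 9.6       |
| 2        | gs printer        | 6.7       |
| 2        | gs ebook          | 2.3       |
| 2        | gs screen         | 1         |
| 3        | pdf2ps, ps2pdf    | 34.8      |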

What is the quality like?

At low magnification on a screen the results look similar. With the exception of “gs screen”, the images look OK.

Differences become apparent when zooming in.

[Figures: zoomed-in comparison of each method’s output, for line art and for microscopy images]

I think the gs outputs using the printer or prepress options were superior to the ColorSync output. The pdf2ps/ps2pdf route produced a good PDF, but it was too big. The gs “ebook” and “screen” outputs were worse than the ColorSync one.

These solutions can of course be tweaked further, and a lot depends on the goal. Here, to get under 10 MB with the best possible quality, gs with the “prepress” setting performed best. With a larger starting PDF, this preset probably wouldn’t get under the 10 MB limit; in that case, dropping to “printer” would be better, and so on down the list.
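That “drop to the next preset” approach can be automated. Below is a rough sketch, assuming `gs` is on your PATH and that my quality ranking (prepress, printer, ebook, screen) holds for your file too. The names `ghostscript` and `best_under_limit` are hypothetical. The compression step is passed in as a function, so the selection logic can be tried without Ghostscript installed.

```python
import shutil
import subprocess
from pathlib import Path

# Best to worst image quality, as judged in the comparison above.
QUALITY_ORDER = ("prepress", "printer", "ebook", "screen")


def ghostscript(preset, src, dst):
    """Compress src to dst with one gs preset; return the output size in bytes."""
    gs = shutil.which("gs")
    if gs is None:
        raise RuntimeError("Ghostscript (gs) was not found on PATH")
    subprocess.run(
        [gs, "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
         f"-dPDFSETTINGS=/{preset}", "-dNOPAUSE", "-dQUIET", "-dBATCH",
         f"-sOutputFile={dst}", str(src)],
        check=True,
    )
    return Path(dst).stat().st_size


def best_under_limit(src, limit_bytes, compress=ghostscript):
    """Try presets from best to worst quality; keep the first one under the limit.

    Returns (preset, size_in_bytes), or None if even the last preset is too big.
    """
    src = Path(src)
    for preset in QUALITY_ORDER:
        dst = src.with_name(f"{src.stem}_{preset}.pdf")
        size = compress(preset, src, dst)
        if size < limit_bytes:
            return preset, size
        dst.unlink(missing_ok=True)  # too big: discard it and try the next preset
    return None
```

For example, `best_under_limit("manuscript.pdf", 10_000_000)` would stop at prepress for a file that behaves like my test file (9.6 MB), and only fall back to printer, ebook or screen for bigger inputs. The limit is in bytes; check whether your journal or funder means 10 × 1,000,000 or 10 × 1,048,576 bytes.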

Explainer

How can we shrink a PDF without losing quality? Well, the secret is that the images in your original file are probably at a higher resolution than the purpose requires. PDF is a print format, but chances are the file only needs to be displayed on a screen. So the images in the file can be downsampled, giving a smaller file. If the downsampling is too aggressive you will end up with a small but low-quality file; get it right, though, and you can shrink the file a lot without noticing any difference. In my example the images were at 300 dpi, which is more than a screen needs: 72 dpi is the traditional figure for screen display, and 96 dpi is the convention for the web. The ColorSync Utility method resamples to 144 dpi, which is twice the 72 dpi screen figure.
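The arithmetic behind this is worth spelling out: the number of pixels in an image scales with the square of the resolution, so going from 300 to 144 dpi keeps only (144/300)² ≈ 23% of the pixels. Here is a small Python sketch using a hypothetical 7 × 9 inch figure. The `max_px` cap mimics the “Constrain Size” setting from Solution 1; that it limits the longest side is my assumption. Real file savings will differ, because the images are also compressed.

```python
def pixels_at(width_in, height_in, ppi, max_px=None):
    """Pixel dimensions for an image shown at width_in x height_in inches at ppi.

    If max_px is given, scale down so the longest side is at most max_px
    (an assumption about what ColorSync's "Constrain Size" does).
    """
    w, h = round(width_in * ppi), round(height_in * ppi)
    longest = max(w, h)
    if max_px is not None and longest > max_px:
        scale = max_px / longest
        w, h = round(w * scale), round(h * scale)
    return w, h


def pixel_fraction(from_ppi, to_ppi):
    """Fraction of pixels left after resampling from from_ppi to to_ppi."""
    return (to_ppi / from_ppi) ** 2


# A hypothetical 7 x 9 inch full-page figure.
print(pixels_at(7, 9, 300))               # (2100, 2700)
print(pixels_at(7, 9, 144))               # (1008, 1296)
print(f"{pixel_fraction(300, 144):.0%}")  # 23%
```

At 144 ppi, the 2400 pixel cap only kicks in for images wider or taller than about 16.7 inches, so for a normal manuscript figure the resolution setting does most of the work.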

Note that if the PDF contains lots of vector objects (e.g. a plot with thousands of points), rasterising the plot can give a big saving.

The post title comes from “Smaller” by Tim Bowness & Samuel Smiles, from their World of Bright Futures album. There was also a great 90s indie rock band from Liverpool called Smaller; they may appear in the future.