Manipulating PDF Files

Many tools are available on Linux for manipulating PDF files. The tips below cover some common PDF manipulation tasks.

Remove or Manipulate Metadata of a PDF File

Metadata can be removed or edited using PDFtk (see its manual for details).

Installation:

sudo apt install pdftk-java

Remove Metadata

pdftk input.pdf dump_data | \
  sed -e 's/\(InfoValue:\)\s.*/\1\ /g' | \
  pdftk input.pdf update_info - output output.pdf
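To see what the sed filter in the pipeline above does without touching a real PDF, it can be run against a small sample of dump_data-style text. This is only a sketch: the sample values are made up, and the `\s` class assumes GNU sed.

```shell
# Made-up sample of the text that `pdftk input.pdf dump_data` emits.
sample='InfoBegin
InfoKey: Author
InfoValue: Jane Doe
InfoBegin
InfoKey: Title
InfoValue: Quarterly Report
NumberOfPages: 3'

# Same sed expression as in the pipeline: keep the "InfoValue:" label, drop its value.
# Note: \s is a GNU sed extension.
cleaned=$(printf '%s\n' "$sample" | sed -e 's/\(InfoValue:\)\s.*/\1\ /g')

# The InfoKey and NumberOfPages lines pass through unchanged.
printf '%s\n' "$cleaned"
```

Only the lines starting with InfoValue: are rewritten, so the keys stay in place with empty values when the result is fed to update_info.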

Manipulate Metadata

First, extract the metadata of the PDF file:

pdftk input.pdf dump_data output metadata.txt

Edit metadata.txt in a text editor, then write the changed metadata back to the PDF file:

pdftk input.pdf update_info metadata.txt output output.pdf
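For repeated edits, the text-editor step can be scripted. The sketch below assumes the dump_data layout in which each "InfoKey: NAME" line is directly followed by its "InfoValue: ..." line; the helper name set_info_value is hypothetical and not part of PDFtk.

```shell
# set_info_value KEY VALUE (hypothetical helper)
# Reads dump_data text on stdin and replaces the InfoValue line that
# directly follows "InfoKey: KEY". Everything else is passed through.
# Caveat: awk -v interprets backslash escapes in VALUE.
set_info_value() {
  awk -v key="$1" -v val="$2" '
    $0 == "InfoKey: " key { print; hit = 1; next }
    hit && /^InfoValue:/  { print "InfoValue: " val; hit = 0; next }
    { hit = 0; print }
  '
}

# Only runs when pdftk and input.pdf are actually available.
if command -v pdftk >/dev/null 2>&1 && [ -f input.pdf ]; then
  pdftk input.pdf dump_data | set_info_value Title "New Title" > metadata.txt
  pdftk input.pdf update_info metadata.txt output output.pdf
fi
```

A key that is not present in the dump is left alone; the helper does not add new InfoBegin blocks.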

Export Images from a PDF File

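The tool pdfimages (part of the poppler-utils package) extracts all images in a PDF file in their original format. The last argument is a filename prefix, not a directory, so the call below writes files such as image-000.jpg to the current directory:

pdfimages -all input.pdf ./image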

To inspect the resolution of the images contained in a PDF file:

pdfimages -list input.pdf
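The listing can be filtered to find low-resolution images. The sketch below assumes the usual poppler layout, with two header lines and the horizontal resolution (x-ppi) in the 13th whitespace-separated column; check the header of your own output first, since this layout is an assumption. The helper name low_res_images is hypothetical.

```shell
# low_res_images MIN_PPI (hypothetical helper)
# Reads `pdfimages -list` output on stdin and prints "page num x-ppi"
# for each image whose x-ppi is below MIN_PPI.
# Assumes: 2 header lines, x-ppi in column 13.
low_res_images() {
  awk -v min="$1" 'NR > 2 && ($13 + 0) < (min + 0) { print $1, $2, $13 }'
}

# Only runs when pdfimages and input.pdf are actually available.
if command -v pdfimages >/dev/null 2>&1 && [ -f input.pdf ]; then
  pdfimages -list input.pdf | low_res_images 150
fi
```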

Extract Certain Pages from a PDF File

The tool qpdf can be used to extract pages or ranges of pages from a PDF file.

Installation:

sudo apt install qpdf

Sample call to extract a range of pages:

qpdf --empty --pages input.pdf 36-38 -- output.pdf
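The same call can be repeated to split a document into fixed-size parts. This is a sketch under a few assumptions: the chunk size of 10, the part-N.pdf output names, and the helper chunk_ranges are all made up for illustration; the page count is read with qpdf --show-npages.

```shell
# chunk_ranges TOTAL SIZE (hypothetical helper)
# Prints one "first-last" page range per line, covering pages 1..TOTAL.
chunk_ranges() {
  total=$1
  size=$2
  first=1
  while [ "$first" -le "$total" ]; do
    last=$((first + size - 1))
    if [ "$last" -gt "$total" ]; then
      last=$total
    fi
    printf '%s-%s\n' "$first" "$last"
    first=$((last + 1))
  done
}

# Only runs when qpdf and input.pdf are actually available.
if command -v qpdf >/dev/null 2>&1 && [ -f input.pdf ]; then
  pages=$(qpdf --show-npages input.pdf)
  n=1
  for range in $(chunk_ranges "$pages" 10); do
    # Same form as the sample call, with a computed range and output name.
    qpdf --empty --pages input.pdf "$range" -- "part-$n.pdf"
    n=$((n + 1))
  done
fi
```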

Crop Pages of a PDF File

The tool krop supports cropping PDF files either via a GUI or from the command line. It can be used to crop page margins, to split a PDF page that contains several printed pages into individual pages, or to extract parts of a page (e.g., single articles).

Installation:

sudo apt install krop

Compress a PDF File

Ghostscript can be used to compress PDF files and change the quality of the images contained therein.

Sample call:

gs -sDEVICE=pdfwrite \
  -dCompatibilityLevel=1.4 \
  -dPDFSETTINGS=/ebook \
  -dNOPAUSE -dQUIET -dBATCH \
  -sOutputFile=output.pdf \
  input.pdf

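The argument -dPDFSETTINGS supports these values (see the Ghostscript documentation for the exact parameters behind each preset):

/screen: low quality and smallest files, images downsampled to roughly 72 dpi
/ebook: medium quality, images downsampled to roughly 150 dpi
/printer: high quality for printing, images at roughly 300 dpi
/prepress: high quality with color-preserving output, images at roughly 300 dpi
/default: general-purpose output, possibly at the cost of larger files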

Ghostscript also allows for setting the PDF compatibility level, font embedding and subsetting, as well as controlling image resolution and downsampling type.
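As a rough sketch of how a preset and an explicit image resolution can be combined, the wrapper below extends the sample call with the pdfwrite downsampling parameters for color and gray images. The function name compress_pdf and its argument order are hypothetical, and the assumption that explicit parameters take precedence over the preset should be verified against the Ghostscript documentation for your version.

```shell
# compress_pdf INPUT OUTPUT [PRESET] [DPI] (hypothetical wrapper)
# PRESET defaults to ebook; DPI, if given, sets the target image resolution.
compress_pdf() {
  src=$1
  dst=$2
  preset=${3:-ebook}
  dpi=${4:-}
  case "$preset" in
    screen|ebook|printer|prepress|default) ;;
    *) echo "unknown preset: $preset" >&2; return 2 ;;
  esac
  # Start from the options used in the sample call.
  set -- -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 "-dPDFSETTINGS=/$preset" \
         -dNOPAUSE -dQUIET -dBATCH
  if [ -n "$dpi" ]; then
    case "$dpi" in
      *[!0-9]*|0) echo "DPI must be a positive integer" >&2; return 2 ;;
    esac
    # Bicubic downsampling of color and gray images to the requested DPI.
    set -- "$@" \
      -dDownsampleColorImages=true -dColorImageDownsampleType=/Bicubic \
      "-dColorImageResolution=$dpi" \
      -dDownsampleGrayImages=true -dGrayImageDownsampleType=/Bicubic \
      "-dGrayImageResolution=$dpi"
  fi
  gs "$@" "-sOutputFile=$dst" "$src"
}

# Only runs when gs and input.pdf are actually available.
if command -v gs >/dev/null 2>&1 && [ -f input.pdf ]; then
  compress_pdf input.pdf output.pdf ebook 120
  # Compare the sizes in bytes.
  wc -c input.pdf output.pdf
fi
```

Comparing file sizes after each run is a simple way to pick the lowest setting that still looks acceptable.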