Manipulating PDF Files
Many tools are available on Linux for manipulating PDF files. The tips below cover some common PDF manipulation tasks.
Contents
- Remove or Manipulate Metadata of a PDF File
- Export Images from a PDF File
- Extract Certain Pages from a PDF File
- Crop Pages of a PDF File
- Compress a PDF File
Remove or Manipulate Metadata of a PDF File
Metadata can be removed or manipulated using PDFtk (manual).
Installation:
sudo apt install pdftk-java
Remove Metadata
pdftk input.pdf dump_data | \
sed -e 's/\(InfoValue:\)\s.*/\1\ /g' | \
pdftk input.pdf update_info - output output.pdf
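To see what the sed step in this pipeline does, it can be run on its own against a small sample. The sketch below assumes GNU sed (for \s) and uses made-up metadata in the InfoKey/InfoValue layout that dump_data prints; no PDF file is needed.

```shell
# Made-up sample of the text that `pdftk input.pdf dump_data` prints.
sample='InfoBegin
InfoKey: Author
InfoValue: Jane Doe
InfoBegin
InfoKey: Title
InfoValue: Draft Report
NumberOfPages: 12'

# Same substitution as in the pipeline above (\s requires GNU sed):
# keep the "InfoValue:" label and drop everything after it.
blanked=$(printf '%s\n' "$sample" | sed -e 's/\(InfoValue:\)\s.*/\1\ /g')

# InfoKey and NumberOfPages lines pass through unchanged.
printf '%s\n' "$blanked"
```

Only the values are blanked; the keys stay in place, so update_info overwrites each existing entry with an empty value.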
Manipulate Metadata
First, extract metadata of the PDF file:
pdftk input.pdf dump_data output metadata.txt
Edit metadata.txt in a text editor, then update the PDF file’s metadata:
pdftk input.pdf update_info metadata.txt output output.pdf
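The editing step can also be scripted instead of done by hand. The following is a minimal sketch, assuming the dump file uses the usual layout in which each "InfoKey: ..." line is directly followed by its "InfoValue: ..." line. The helper name set_info_value is hypothetical and not part of PDFtk.

```shell
# set_info_value FILE KEY VALUE  (hypothetical helper, not part of PDFtk)
# Rewrites the InfoValue line that directly follows "InfoKey: KEY".
# Note: awk -v interprets backslash escapes in VALUE.
set_info_value() {
    awk -v key="$2" -v val="$3" '
        hit { if ($0 ~ /^InfoValue:/) $0 = "InfoValue: " val; hit = 0 }
        $0 == "InfoKey: " key { hit = 1 }
        { print }
    ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Possible usage between the two pdftk calls shown above:
#   pdftk input.pdf dump_data output metadata.txt
#   set_info_value metadata.txt Title "Quarterly Report"
#   pdftk input.pdf update_info metadata.txt output output.pdf
```

All other lines of metadata.txt are written back unchanged, so entries that are not named stay as they were.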
Export Images from a PDF File
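The tool pdfimages (part of the poppler-utils package) extracts all images in a PDF file in their original format. The last argument is a prefix for the output file names; ./ writes the images to the current directory: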
pdfimages -all input.pdf ./
To inspect the resolution of the images contained in a PDF file:
pdfimages -list input.pdf
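When only some pages matter, pdfimages can be limited to a page range with its -f (first page) and -l (last page) options. The sketch below wraps this in a small function. The name extract_images, the default directory images and the file prefix img are assumptions chosen for illustration.

```shell
# extract_images PDF FIRST LAST [OUTDIR]  (hypothetical helper name)
# Extracts the images on pages FIRST..LAST into OUTDIR (default: images).
extract_images() {
    mkdir -p "${4:-images}"
    # -all keeps the original image formats; -f/-l select the page range.
    # The final argument is a file-name prefix, so files become OUTDIR/img-000.* etc.
    pdfimages -all -f "$2" -l "$3" "$1" "${4:-images}/img"
}

# Possible usage:
#   extract_images input.pdf 2 5 figures
```

Using a named prefix such as img avoids output files whose names begin with a dash.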
Extract Certain Pages from a PDF File
The tool qpdf can be used to extract pages or ranges of pages from a PDF file.
Installation:
sudo apt install qpdf
Sample call to extract a range of pages:
qpdf --empty --pages input.pdf 36-38 -- output.pdf
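The range argument accepts more than a single span. As a rough sketch, the call above can be wrapped in a function (the name extract_pages is hypothetical) and used with the range forms listed in the comments.

```shell
# extract_pages INPUT RANGE OUTPUT  (hypothetical wrapper around the call above)
extract_pages() {
    qpdf --empty --pages "$1" "$2" -- "$3"
}

# Some range forms qpdf understands:
#   36-38       pages 36 through 38
#   1,5,9       individual pages
#   1-3,7,10-z  several ranges combined; "z" stands for the last page
#   r3-r1       the last three pages ("rN" counts from the end)
#   z-1         all pages in reverse order
#
# Possible usage:
#   extract_pages input.pdf 1-3,7,10-z output.pdf
```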
Crop Pages of a PDF File
The tool krop supports cropping PDF files either via a GUI or from the command line. It can be used to crop page margins, split a PDF page that contains several pages into individual pages, or extract parts (e.g., articles) of a page.
Installation:
sudo apt install krop
Compress a PDF File
Ghostscript can be used to compress PDF files and change the quality of the images contained therein.
Sample call:
gs -sDEVICE=pdfwrite \
-dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/ebook \
-dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=output.pdf \
input.pdf
The argument -dPDFSETTINGS supports these values (documentation):
- /screen: 72 dpi.
- /ebook: 150 dpi.
- /prepress: 300 dpi.
- /printer: 300 dpi.
- /default: 72 dpi.
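Which preset gives the best trade-off depends on the document, so it can help to try several and compare the file sizes. The following sketch repeats the sample call once per preset. The function name compare_presets and the output naming scheme are assumptions for illustration.

```shell
# compare_presets INPUT  (hypothetical helper)
# Writes INPUT-<preset>.pdf for each preset and prints "preset<TAB>size in bytes".
compare_presets() {
    for preset in screen ebook printer prepress; do
        gs -sDEVICE=pdfwrite \
           -dCompatibilityLevel=1.4 \
           -dPDFSETTINGS="/$preset" \
           -dNOPAUSE -dQUIET -dBATCH \
           -sOutputFile="${1%.pdf}-$preset.pdf" \
           "$1"
        # wc -c reports the size of the result in bytes.
        printf '%s\t%s\n' "$preset" "$(wc -c < "${1%.pdf}-$preset.pdf")"
    done
}

# Possible usage:
#   compare_presets input.pdf
```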
Ghostscript also allows for setting the PDF compatibility level, font embedding and subsetting, as well as controlling image resolution and downsampling type.
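As an illustration of the image resolution and downsampling controls, the sketch below starts from the /ebook preset and then sets the color and gray image resolution explicitly. It assumes that options given after -dPDFSETTINGS take precedence over the preset's values. The function name compress_pdf and the default of 120 dpi are arbitrary choices, not recommendations from the Ghostscript documentation.

```shell
# compress_pdf INPUT OUTPUT [DPI]  (hypothetical helper; DPI defaults to 120)
compress_pdf() {
    gs -sDEVICE=pdfwrite \
       -dCompatibilityLevel=1.4 \
       -dPDFSETTINGS=/ebook \
       -dDownsampleColorImages=true \
       -dColorImageResolution="${3:-120}" \
       -dColorImageDownsampleType=/Bicubic \
       -dDownsampleGrayImages=true \
       -dGrayImageResolution="${3:-120}" \
       -dGrayImageDownsampleType=/Bicubic \
       -dNOPAUSE -dQUIET -dBATCH \
       -sOutputFile="$2" \
       "$1"
}

# Possible usage:
#   compress_pdf input.pdf output.pdf 96
```

Monochrome images have their own parameters and are left at the preset's values here.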