Manipulating PDF Files
Many tools for manipulating PDF files are available on Linux. The tips below cover some common tasks.
Contents
- Remove or Manipulate Metadata of a PDF File
- Export Images from a PDF File
- Extract Certain Pages from a PDF File
- Crop Pages of a PDF File
- Compress a PDF File
Remove or Manipulate Metadata of a PDF File
Metadata can be removed or manipulated using PDFtk (manual).
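PDFtk exchanges metadata as plain text records, which is why ordinary text tools can edit it. The sketch below runs on made-up sample records in the InfoKey/InfoValue layout that dump_data prints, so it needs no PDF file. The names sample_dump, blank_info_values and set_info_value are hypothetical helpers for illustration, not PDFtk commands.

```shell
#!/bin/sh
# Sample records in the layout printed by `pdftk input.pdf dump_data`.
# The values are invented for illustration.
sample_dump() {
    printf '%s\n' \
        'InfoBegin' \
        'InfoKey: Author' \
        'InfoValue: Jane Doe' \
        'InfoBegin' \
        'InfoKey: Title' \
        'InfoValue: Draft report' \
        'NumberOfPages: 3'
}

# Blank every InfoValue line and leave all other lines untouched.
blank_info_values() {
    sed -e 's/^InfoValue:.*/InfoValue: /'
}

# Replace the value that follows one specific InfoKey line.
# Usage: set_info_value KEY NEW_VALUE  (reads a dump on stdin)
set_info_value() {
    awk -v key="$1" -v val="$2" '
        $0 == "InfoKey: " key { print; hit = 1; next }
        hit && /^InfoValue:/   { print "InfoValue: " val; hit = 0; next }
        { hit = 0; print }
    '
}

# Print the sample dump with only the Title changed.
sample_dump | set_info_value Title 'Final report'
```

With a real file, such a filter would sit between pdftk input.pdf dump_data and pdftk input.pdf update_info - output output.pdf.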
Installation:
sudo apt install pdftk-java
Remove Metadata
pdftk input.pdf dump_data | \
sed -e 's/\(InfoValue:\)\s.*/\1\ /g' | \
pdftk input.pdf update_info - output output.pdf
Manipulate Metadata
First, extract metadata of the PDF file:
pdftk input.pdf dump_data output metadata.txt
Edit metadata.txt in a text editor, then update the PDF file’s metadata:
pdftk input.pdf update_info metadata.txt output output.pdf
Export Images from a PDF File
The tool pdfimages (part of poppler-utils) extracts all images from a PDF file in their original format. The last argument is a prefix for the output file names, so ./ writes them to the current directory:
pdfimages -all input.pdf ./
To list the images contained in a PDF file along with their resolution:
pdfimages -list input.pdf
Extract Certain Pages from a PDF File
The tool qpdf can be used to extract pages or ranges of pages from a PDF file.
Installation:
sudo apt install qpdf
Sample call to extract a range of pages:
qpdf --empty --pages input.pdf 36-38 -- output.pdf
Crop Pages of a PDF File
The tool krop supports cropping PDF files either via a GUI or from the command line. It can be used to crop page margins, to split multiple pages contained in one PDF page into individual pages, or to extract parts (e.g., articles) of a page.
Installation:
sudo apt install krop
Compress a PDF File
Ghostscript can be used to compress PDF files and change the quality of the images contained therein.
Sample call:
gs -sDEVICE=pdfwrite \
-dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/ebook \
-dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=output.pdf \
input.pdf
The argument -dPDFSETTINGS supports these values (documentation):
- /screen: 72 dpi.
- /ebook: 150 dpi.
- /prepress: 300 dpi.
- /printer: 300 dpi.
- /default: 72 dpi.
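Which preset gives an acceptable trade-off depends on the document, so it can help to produce one output file per preset and compare them. The following is a minimal sketch under that assumption. try_presets is a hypothetical helper name, and the GS variable is an illustrative convention of this sketch (not a Ghostscript feature) that allows a dry run with GS=echo.

```shell
#!/bin/sh
# GS names the Ghostscript executable. Setting GS=echo gives a dry run
# that only prints the arguments instead of calling Ghostscript.
GS=${GS:-gs}

# try_presets INPUT
# Writes one compressed copy per preset, e.g. input.ebook.pdf.
try_presets() {
    stem=${1%.pdf}   # strip a trailing .pdf for the output names
    for preset in screen ebook printer prepress; do
        "$GS" -sDEVICE=pdfwrite \
            -dPDFSETTINGS=/"$preset" \
            -dNOPAUSE -dQUIET -dBATCH \
            -sOutputFile="$stem.$preset.pdf" \
            "$1" || return 1
    done
}

# Example (not run automatically):
#   try_presets input.pdf && ls -l input.*.pdf
```

Comparing the file sizes and looking at a few image-heavy pages in each output is usually enough to pick a preset.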
Ghostscript also allows for setting the PDF compatibility level, font embedding and subsetting, as well as controlling image resolution and downsampling type.
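As a sketch of such finer control, the call below sets font embedding, font subsetting, and the downsampling of colour and greyscale images explicitly instead of relying on a preset. The 120 dpi default and the choice of /Bicubic are arbitrary example values, compress_with_dpi is a hypothetical helper name, and the GS variable is an illustrative convention of this sketch that allows a dry run with GS=echo.

```shell
#!/bin/sh
# GS names the Ghostscript executable. Setting GS=echo gives a dry run
# that only prints the arguments instead of calling Ghostscript.
GS=${GS:-gs}

# compress_with_dpi INPUT OUTPUT [DPI]
# Downsamples colour and greyscale images to DPI (example default: 120).
compress_with_dpi() {
    dpi=${3:-120}
    "$GS" -sDEVICE=pdfwrite \
        -dCompatibilityLevel=1.4 \
        -dEmbedAllFonts=true -dSubsetFonts=true \
        -dDownsampleColorImages=true \
        -dColorImageDownsampleType=/Bicubic \
        -dColorImageResolution="$dpi" \
        -dDownsampleGrayImages=true \
        -dGrayImageDownsampleType=/Bicubic \
        -dGrayImageResolution="$dpi" \
        -dNOPAUSE -dQUIET -dBATCH \
        -sOutputFile="$2" \
        "$1"
}

# Example (not run automatically):
#   compress_with_dpi input.pdf output.pdf 100
```

Running GS=echo compress_with_dpi input.pdf output.pdf first prints the full argument list, which makes it easy to check the values before Ghostscript touches any file.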