@sahal
Last active April 11, 2025 01:52

Clean up PDF metadata two ways

Using exiftool and qpdf

(not recommended)

$ cp output.pdf output-1.pdf
# When the script calls clean_pdf_metadata_v1() instead of clean_pdf_metadata()
$ pdf-clean.sh output-1.pdf

This still works, but only technically (because I'm awesome). qpdf has to run immediately after exiftool: exiftool's PDF edits are reversible on their own, and the qpdf rewrite is what makes them stick. If qpdf fails, or exiftool runs second, the PDF still retains its metadata.
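The ordering constraint can be made explicit in a small wrapper. This is a minimal sketch, not part of the original script: the function name `strip_then_rewrite` is hypothetical, and it only uses the exiftool and qpdf invocations that already appear in clean_pdf_metadata_v1().

```shell
# Hypothetical helper (not in the original gist): run exiftool first, then
# qpdf, and report failure if either step fails.
strip_then_rewrite() {
    local pdf="${1:-}"
    if [ -z "${pdf}" ]; then
        echo "usage: strip_then_rewrite <file.pdf>" >&2
        return 2
    fi

    # Step 1: exiftool blanks the metadata (reversible on its own).
    if ! exiftool -overwrite_original -all:all= "${pdf}"; then
        echo "exiftool failed on ${pdf}" >&2
        return 1
    fi

    # Step 2: qpdf rewrites the file; this must come second.
    if ! qpdf --linearize --replace-input "${pdf}"; then
        echo "qpdf failed; metadata in ${pdf} may still be recoverable" >&2
        return 1
    fi
}
```

Calling `strip_then_rewrite output-1.pdf` returns non-zero as soon as one step fails, so a caller can refuse to treat the file as clean.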

Using mat2

$ cp output.pdf output-2.pdf
$ pdf-clean.sh output-2.pdf
ExifTool Version Number         : 13.25
File Name                       : output-2.pdf
Directory                       : .
File Size                       : 48 kB
File Modification Date/Time     : 2025:04:10 20:42:15-05:00
File Access Date/Time           : 2025:04:10 20:42:15-05:00
File Inode Change Date/Time     : 2025:04:10 20:42:15-05:00
File Permissions                : -rw-r--r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.4
Linearized                      : Yes
Create Date                     : 2025:04:11 00:58:05+00:00
Creator                         : Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Joplin/3.2.13 Chrome/128.0.6613.178 Electron/32.2.0 Safari/537.36
Modify Date                     : 2025:04:11 00:58:05+00:00
Producer                        : Skia/PDF m128
Title                           : Latino Film Festival 3/13 Schedule
Page Count                      : 5


Would you like to continue? [y/n]: y
NICE: PDF file metadata is NOT restorable via exiftool!
ExifTool Version Number         : 13.25
File Name                       : output-2.pdf
Directory                       : .
File Size                       : 1785 kB
File Modification Date/Time     : 2025:04:10 20:42:24-05:00
File Access Date/Time           : 2025:04:10 20:42:24-05:00
File Inode Change Date/Time     : 2025:04:10 20:42:24-05:00
File Permissions                : -rw-r--r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.5
Linearized                      : No
Page Count                      : 5

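Yay, no more metadata in the file, and I never have to remember command line flags again. One trade-off shows up in the dump: the file grew from 48 kB to 1785 kB, most likely because mat2 re-renders the PDF pages instead of editing the metadata in place.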
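To check a cleaned file without eyeballing the dump, the exiftool listing can be filtered for the tags that disappeared in the run above. A minimal sketch: the function name `leaked_tags` is hypothetical, and the tag list is an assumption taken from the sample output (Creator, Producer, Title, Create Date, Modify Date), so extend it for your own files.

```shell
# Hypothetical helper: reads an exiftool listing on stdin and prints every
# line whose tag name is one of the document-level fields seen in the first
# dump. Exit status is 0 if something leaked and 1 if the listing is clean.
leaked_tags() {
    grep -E '^(Creator|Producer|Title|Create Date|Modify Date)[[:space:]]*:'
}
```

Usage would look like `exiftool -all output-2.pdf | leaked_tags && echo "still has metadata"`. File system fields such as "File Modification Date/Time" are deliberately not matched, since they describe the file on disk rather than the PDF contents.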

Install

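Save the script below as pdf-clean.sh, make it executable, and put it somewhere on your PATH. It needs exiftool and mat2 installed (plus qpdf if you switch to clean_pdf_metadata_v1()).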
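One way to do that is sketched here. The function name `install_script` and the default target `~/.local/bin` are assumptions, not something the gist prescribes; any directory already on your PATH works.

```shell
# Hypothetical helper: copy a script into a bin directory and mark it
# executable. Arguments: <script> [target directory].
install_script() {
    local src="${1:-pdf-clean.sh}"
    local bin_dir="${2:-${HOME}/.local/bin}"

    mkdir -p "${bin_dir}" || return 1
    cp "${src}" "${bin_dir}/" || return 1
    chmod 0755 "${bin_dir}/$(basename "${src}")"
}
```

For example, `install_script pdf-clean.sh` copies the script to `~/.local/bin/pdf-clean.sh`; check with `command -v pdf-clean.sh` that the directory is actually on your PATH.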

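#!/usr/bin/env bash
# https://web.archive.org/web/20201128130952/https://blog.joshlemon.com.au/protecting-your-pdf-files-and-metadata/
set -e
set -u
# Require an existing file instead of silently operating on a file named "unset"
pdf_file_name="${1:-}"
if [[ ! -f "${pdf_file_name}" ]]; then
    echo "Usage: ${0##*/} <file.pdf>" >&2
    exit 1
fi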
# https://stackoverflow.com/a/29436423
yes_or_no() {
    while true; do
        # Treat EOF (Ctrl-D or closed stdin) as "no" instead of looping forever
        read -r -p "$* [y/n]: " yn || return 1
        case $yn in
            [Yy]*) return 0 ;;
            [Nn]*) echo "Aborted" ; return 1 ;;
        esac
    done
}
dump_pdf_metadata() {
    local pdf_file_name="${1:-unset}"
    exiftool -all "${pdf_file_name}"
}
clean_pdf_metadata_v1() {
    # I'm going to avoid using this because it's too easy to f-up
    # when using exiftool
    # see: https://dustri.org/b/cleaning-pdf-metadata-in-depth.html
    local pdf_file_name="${1:-unset}"
    # Order matters: exiftool first, then qpdf to rewrite the file
    exiftool -overwrite_original -all:all= "${pdf_file_name}"
    qpdf --verbose --linearize --replace-input "${pdf_file_name}"
}
test_exiftool_restore_metadata() {
    local pdf_file_name="${1:-unset}"
    # -pdf-update:all= asks exiftool to roll back its own earlier edits;
    # if that succeeds, the "removed" metadata was still in the file.
    # All to avoid an if statement and SC2015
    # Famous last words: I like it like this though
    { exiftool -pdf-update:all= "${pdf_file_name}" > /dev/null 2>&1 && \
        echo "OOPS: PDF file metadata is restorable via exiftool!"; } || \
        echo "NICE: PDF file metadata is NOT restorable via exiftool!"
}
clean_pdf_metadata() {
    local pdf_file_name="${1:-unset}"
    mat2 --verbose --inplace "${pdf_file_name}"
}
dump_pdf_metadata "${pdf_file_name}"
echo
echo
yes_or_no "Would you like to continue?" || exit
clean_pdf_metadata "${pdf_file_name}"
test_exiftool_restore_metadata "${pdf_file_name}"
dump_pdf_metadata "${pdf_file_name}"
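
Since the script shells out to exiftool and mat2, a missing tool only surfaces as a "command not found" error partway through. A minimal sketch of an up-front check follows; the function name `require_cmds` is hypothetical and not part of the original script.

```shell
# Hypothetical helper: report every command in the argument list that is
# not installed, and return non-zero if at least one is missing.
require_cmds() {
    local missing=0
    local cmd
    for cmd in "$@"; do
        if ! command -v "${cmd}" > /dev/null 2>&1; then
            echo "missing dependency: ${cmd}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

A line such as `require_cmds exiftool mat2 || exit 1` near the top of pdf-clean.sh would stop the script before it prompts for anything.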