@manu-mannattil
Last active July 5, 2019 17:21
Make clean PDFs look like they were scanned
#!/bin/sh
#
# clean2scan.sh -- make clean PDFs look like they were scanned
#
# Usage: clean2scan.sh <file>
#
# This is a simple script that uses ImageMagick and Ghostscript to
# distort and transform pages in a PDF to mimic scanned PDFs.
#
# Requires: convert(1) and gs(1)
#
[ "$*" ] || {
    echo >&2 "usage: ${0##*/} <file>"
    exit 1
}
tmpdir="$(mktemp -d)"
trap 'rm -rf "$tmpdir" >/dev/null 2>&1' EXIT
trap 'exit 2' HUP INT QUIT TERM
# Target resolution (DPI).
resolution=192
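# Generate a random real number in the range [-arg, arg] (default: 1.0).
# The seed is read from /dev/urandom (assumed to exist) because srand()
# without an argument uses the time of day, so calls made within the
# same second would return the same value.
random() {
    [ "$1" ] || set -- 1.0
    awk -v max="$1" -v seed="$(od -An -N4 -tu4 /dev/urandom)" \
        'BEGIN { srand(seed); print -max + 2.0 * max * rand() }'
}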
# Split PDF into individual pages.
convert -density "$resolution" "$1" "${tmpdir}/%010d.pdf"
# Add a little bit of multiplicative noise, random rotation, barrel
# distortion, and convert each page to grayscale.
for pdf in "$tmpdir"/*.pdf
do
    convert -density "$resolution" \
        "$pdf" \
        -rotate "$(random 1.5)" \
        -attenuate 0.3 \
        +noise Multiplicative \
        -distort Barrel "$(random 0.005) 0.0 0.0" \
        -colorspace Gray "${pdf%.*}.scan.pdf"
done
# Merge the distorted pages into a single PDF and optimize it using
# Ghostscript.
convert -density "$resolution" "$tmpdir"/*.scan.pdf "$tmpdir/merged.pdf"
gs -sDEVICE=pdfwrite \
    -dCompatibilityLevel=1.4 \
    -dPDFSETTINGS=/printer \
    -dNOPAUSE \
    -dQUIET \
    -dBATCH \
    -sOutputFile="${1%.*}.scan.pdf" \
    "$tmpdir/merged.pdf"
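
The mapping used by random() in the script (scaling awk's rand(), which returns a value in [0, 1), to the interval [-a, a]) can be checked on its own. What follows is only a minimal sketch: seeded_random and in_range are hypothetical helpers that are not part of the script, and the explicit seed argument is an assumption made so that results are reproducible.

```sh
#!/bin/sh
# Hypothetical helper (not in the original script): same mapping as
# random(), but with an explicit seed so the result is reproducible.
#   $1 = half-width a of the interval [-a, a]
#   $2 = seed passed to awk's srand()
seeded_random() {
    awk -v a="$1" -v seed="$2" \
        'BEGIN { srand(seed); print -a + 2.0 * a * rand() }'
}

# Hypothetical helper: succeed if -A <= VALUE <= A.
#   $1 = VALUE, $2 = A
in_range() {
    awk -v x="$1" -v a="$2" 'BEGIN { exit !(x >= -a && x <= a) }'
}

# Draw a few rotation angles with the same bound the script uses (1.5).
for seed in 1 2 3
do
    angle="$(seeded_random 1.5 "$seed")"
    in_range "$angle" 1.5 && echo "seed $seed: $angle lies within [-1.5, 1.5]"
done
```

The printed angles depend on the awk implementation, so no particular values should be expected; only the bound is guaranteed.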