@rien333
Created December 23, 2025 13:06
`$WACZ` is your input file. The core of the procedure is that a WACZ embeds a separate WARC file containing the screenshots. In this WARC file the screenshots are simply stored as normal WARC records, which you can extract with the program `warcio`.
#!/bin/fish
set tmpdir (mktemp -d)
# only extract 'archive/*' (the directory holding the WARC files), so as little as possible is unpacked
unzip -qq $WACZ 'archive/*' -d $tmpdir
# find the screenshot WARC within the WACZ
set screenshot_warc (fd -tfile screenshots $tmpdir/archive/ | head -1)
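# calculate the byte offset of the screenshot record in the uncompressed WARC:
# it starts right after the record at offset 0, whose size is its header length
# plus its Content-Length plus 4 bytes for the two CRLFs that end a WARC record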
set content_length (warcio extract --header "$screenshot_warc" 0 | rg 'Content-Length: (\d+)' -r '$1')
set header_length (warcio extract --header "$screenshot_warc" 0 | wc -c)
set screenshot_offset (math $content_length + $header_length + 4)
# WARCs are often gzip'ed; decompress so that the offset above, which counts uncompressed bytes, applies
gunzip $screenshot_warc
warcio extract --payload (echo "$screenshot_warc" | sd -F '.warc.gz' '.warc') $screenshot_offset > $WACZ.png
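
The offset arithmetic in the script can be checked in isolation. Below is a minimal Python sketch, not part of the original gist, that builds a tiny synthetic two-record WARC in memory and computes where the second record starts. The helper names `next_record_offset` and `make_record` are hypothetical, the records carry only a bare minimum of header fields, and the sketch assumes an uncompressed WARC in which every record ends with two CRLFs.

```python
def next_record_offset(warc_bytes: bytes, offset: int = 0) -> int:
    """Return the byte offset of the record that follows the one at `offset`."""
    # The WARC header block ends at the first blank line (CRLF CRLF).
    header_end = warc_bytes.index(b"\r\n\r\n", offset) + 4
    header = warc_bytes[offset:header_end].decode("utf-8")
    content_length = None
    for line in header.split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            content_length = int(value.strip())
            break
    if content_length is None:
        raise ValueError("record has no Content-Length header")
    # header block + content block + the two CRLFs that terminate the record
    return header_end + content_length + 4


def make_record(warc_type: str, payload: bytes) -> bytes:
    """Build a minimal synthetic WARC record (illustration only)."""
    header = (
        "WARC/1.1\r\n"
        f"WARC-Type: {warc_type}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return header + payload + b"\r\n\r\n"


if __name__ == "__main__":
    first = make_record("warcinfo", b"software: example\r\n")
    second = make_record("resource", b"fake screenshot bytes")
    warc = first + second
    offset = next_record_offset(warc)
    # The computed offset should land exactly on the second record.
    print(offset == len(first), warc[offset:offset + 8])
```

This mirrors `math $content_length + $header_length + 4` from the script: the header length here includes the blank line that closes the header block, and the extra 4 bytes are the record terminator.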