@angelsen
Created March 21, 2024 14:37
Screenshot OCR Tool for GNOME (Wayland)
#!/bin/bash
# Screenshot OCR Tool for GNOME (Wayland)
# Dependencies: gnome-screenshot, tesseract-ocr, wl-clipboard
# Usage: Bind this script to a keyboard shortcut to capture part of the screen,
# perform OCR on the selection, and copy the text to the clipboard.
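# Create a private temporary directory (avoids fixed, predictable paths in /tmp)
# and remove it automatically when the script exits.
TMP_DIR="$(mktemp -d)" || exit 1
trap 'rm -rf "$TMP_DIR"' EXIT
TMP_IMG="$TMP_DIR/screenshot.png"
TMP_TXT_BASE="$TMP_DIR/screenshot" # Tesseract will add .txt to this base name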
# Take a screenshot. If cancelled, copy a message to the clipboard and exit.
gnome-screenshot -a -f "$TMP_IMG" || {
    echo "Screenshot capture cancelled." | wl-copy
    exit 1
}
# If the screenshot file does not exist, copy a message to the clipboard and exit.
if [ ! -f "$TMP_IMG" ]; then
    echo "No screenshot file found." | wl-copy
    exit 1
fi
# Use Tesseract OCR to convert the screenshot to text.
# If Tesseract fails, copy an error message to the clipboard and exit.
tesseract "$TMP_IMG" "$TMP_TXT_BASE" -l eng || {
    echo "OCR processing failed." | wl-copy
    rm "$TMP_IMG" # Clean up the screenshot file
    exit 1
}
# Construct the text file's actual path (Tesseract adds .txt)
TMP_TXT="$TMP_TXT_BASE.txt"
# Check if the OCR text file exists. If it does, copy its content to the clipboard.
if [ -f "$TMP_TXT" ]; then
    wl-copy < "$TMP_TXT"
    rm "$TMP_TXT" # Clean up the OCR text file
else
    echo "OCR produced no output text." | wl-copy
fi
# Clean up the screenshot file.
rm "$TMP_IMG"
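
The script copies the Tesseract output file to the clipboard unchanged. Some Tesseract versions end their text output with a form feed character and blank lines, which then show up when pasting. A minimal sketch of a filter that could sit in front of `wl-copy` follows; the function name `clean_ocr_text` is hypothetical and not part of the original script.

```bash
# Hypothetical helper (not in the original script): tidy OCR text read from stdin.
clean_ocr_text() {
    local text
    # Delete form feed characters; the command substitution also drops
    # any trailing newlines left at the end of the text.
    text="$(tr -d '\f')"
    printf '%s' "$text"
}
```

Under these assumptions, the copy step would become `clean_ocr_text < "$TMP_TXT" | wl-copy`.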
@angelsen
Author

This Bash script captures a region of the screen on GNOME (running under Wayland), runs OCR (Optical Character Recognition) on the capture with Tesseract, and copies the extracted text to the clipboard for pasting. It is useful for quickly copying text from images, PDFs, or other sources where the text cannot be selected. It is designed to be bound to a keyboard shortcut.
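
The keyboard shortcut can be set up in GNOME Settings, or from a terminal with `gsettings`. The following is a rough sketch only: the function name `register_ocr_shortcut`, the `GSETTINGS` override variable, the `custom0` slot, and the default key combination are all assumptions made for illustration. Note that it overwrites the list of custom keybindings, so merge the list by hand if you already have custom shortcuts.

```bash
# Hypothetical helper: register a GNOME custom shortcut that runs the OCR script.
# Set GSETTINGS=echo to print the commands instead of applying them.
register_ocr_shortcut() {
    local script_path="$1"                   # absolute path to the OCR script
    local binding="${2:-<Super><Shift>t}"    # assumed key combination
    local gs="${GSETTINGS:-gsettings}"
    local schema="org.gnome.settings-daemon.plugins.media-keys"
    local key_path="/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/"

    # WARNING: this replaces the whole custom-keybindings list with one entry.
    "$gs" set "$schema" custom-keybindings "['$key_path']"
    # Fill in the name, command, and key binding for that entry.
    "$gs" set "$schema.custom-keybinding:$key_path" name "Screenshot OCR"
    "$gs" set "$schema.custom-keybinding:$key_path" command "$script_path"
    "$gs" set "$schema.custom-keybinding:$key_path" binding "$binding"
}
```

A dry run such as `GSETTINGS=echo register_ocr_shortcut "$HOME/bin/screenshot-ocr.sh"` prints the four commands for review before anything is changed; the script path here is only an example.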
