@renoirb
Last active February 5, 2026 20:46
Voice note to text transcription using faster-whisper
.env.example

# Faster-Whisper is currently well supported only on Python 3.11
PYTHON=python3.11
.gitignore

venv/
__pycache__/
*.pyc

# Machine specific
.env

# Audio files and transcription output
*.m4a
*.wav
*.txt
*.md
!README.md
!CLAUDE.md

# Archives
*.zip

README.md

Voice Note to Text

Local audio transcription using faster-whisper (3-4x faster than OpenAI's whisper, same accuracy). Converts M4A/WAV/MP3 files to text transcripts on CPU.

Setup (Local Python)

Requires Python 3.11 or 3.12 (newer versions are not yet supported by the ML dependencies) and ffmpeg.

# macOS prerequisites
brew install python@3.11 ffmpeg

# Clone this gist, then:
./setup.sh

# If your default python3 is too new (3.14+), point to 3.11:
PYTHON=python3.11 ./setup.sh

Usage

source venv/bin/activate

# Simple (base model, English) — produces recording.txt
python transcribe_faster.py recording.m4a

# With flags — produces recording.txt (or override with --output)
python transcribe.py recording.m4a --model medium --language en --output notes.txt

deactivate

Output is always <filename>.txt next to the input file. Use --output in transcribe.py to override.
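That naming rule can be sketched with pathlib (`default_output` is a hypothetical helper for illustration, not part of either script):

```python
from pathlib import Path

def default_output(input_path: str) -> Path:
    # Same rule both scripts use: swap the audio extension for .txt,
    # keeping the output next to the input file
    return Path(input_path).with_suffix('.txt')

print(default_output("notes/recording.m4a"))  # notes/recording.txt
```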

Models

| Model    | Speed   | Quality | Use case                |
|----------|---------|---------|-------------------------|
| tiny     | Fastest | Lower   | Quick preview           |
| base     | Fast    | Good    | Default, recommended    |
| small    | Medium  | Better  | Important recordings    |
| medium   | Slow    | High    | When accuracy matters   |
| large-v3 | Slowest | Best    | Critical transcriptions |

Files

| File                 | Purpose                                                          |
|----------------------|------------------------------------------------------------------|
| setup.sh             | One-command venv creation and install                            |
| requirements.txt     | Pinned dependencies                                              |
| transcribe.py        | Full-featured script with --model, --language, --output flags    |
| transcribe_faster.py | Quick script: python transcribe_faster.py file.m4a [model] [lang] |

Troubleshooting

"ffmpeg not found" — brew install ffmpeg (macOS) or sudo apt-get install ffmpeg (Linux)

"No module named 'faster_whisper'" — Activate the venv first: source venv/bin/activate

Poor quality — Use a larger model: --model medium or --model large-v3

Python too new — faster-whisper needs 3.11 or 3.12. Use PYTHON=python3.11 ./setup.sh
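A quick pre-flight check for that constraint could look like this (a sketch; `check_minor` is a hypothetical helper, not part of setup.sh):

```shell
# Decide whether an interpreter's minor version works for faster-whisper
# (3.11 or 3.12); otherwise echo the override hint from the README
check_minor() {
  if [ "$1" -eq 11 ] || [ "$1" -eq 12 ]; then
    echo "ok"
  else
    echo "use PYTHON=python3.11 ./setup.sh"
  fi
}

# Usage against the real interpreter:
#   check_minor "$(python3 -c 'import sys; print(sys.version_info.minor)')"
check_minor 14
```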

CLAUDE.md

Voice Note to Text

Transcribe voice recordings (m4a/wav/mp3) to text locally using faster-whisper on CPU. No cloud APIs needed.

Use Case

Record voice notes on phone, transfer to computer, run transcription to get text for editing in Obsidian or elsewhere.

Stack

  • Python 3.11 (pinned via .env, required for ML package compatibility)
  • faster-whisper — CTranslate2-based whisper, 3-4x faster than OpenAI's original
  • ffmpeg — system dependency for audio decoding

Setup

cp .env.example .env   # set PYTHON version if needed
./setup.sh             # creates venv, installs deps

Usage

source venv/bin/activate
python transcribe_faster.py recording.m4a              # quick, base model
python transcribe.py recording.m4a --model medium       # better quality

Output is always <input>.txt next to the input file (e.g. recording.m4a produces recording.txt). Override with --output path.txt in transcribe.py.
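Both scripts consume the lazy segment iterator returned by model.transcribe and write one stripped line of text per segment; that formatting step looks roughly like this (`join_segments` is a hypothetical name, and the strings stand in for `segment.text` values):

```python
def join_segments(segment_texts):
    # One stripped segment per line, trailing newline included,
    # matching the write loop in transcribe.py / transcribe_faster.py
    return "".join(text.strip() + "\n" for text in segment_texts)

print(join_segments([" Hello there. ", "Second thought."]))
```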

Key Files

| File                 | Role                                                     |
|----------------------|----------------------------------------------------------|
| setup.sh             | Creates venv, installs requirements                      |
| .env / .env.example  | Machine-specific Python version (gitignored / template)  |
| requirements.txt     | faster-whisper==1.2.1                                    |
| transcribe.py        | Argparse script with --model, --language, --output       |
| transcribe_faster.py | Minimal script with positional args                      |

What's Ignored

venv/, .env, *.m4a, *.wav, *.txt, *.zip, __pycache__/ — see .gitignore.

requirements.txt

faster-whisper==1.2.1

setup.sh

#!/bin/bash
set -e

PYTHON="${PYTHON:-python3}"
VENV_DIR="venv"

# Load machine-specific config (copy .env.example to .env)
if [ -f .env ]; then
  source .env
else
  echo "No .env file found; copy .env.example to .env first."
  exit 1
fi

# Check prerequisites
if ! command -v "$PYTHON" &>/dev/null; then
  echo "Error: $PYTHON not found. Install Python 3.11+ first."
  echo "  macOS: brew install python@3.11"
  echo "  Linux: sudo apt-get install python3 python3-venv"
  exit 1
fi

if ! command -v ffmpeg &>/dev/null; then
  echo "Error: ffmpeg not found. Required for audio processing."
  echo "  macOS: brew install ffmpeg"
  echo "  Linux: sudo apt-get install ffmpeg"
  exit 1
fi

# Show Python version
echo "Using: $($PYTHON --version)"

# Create venv
if [ -d "$VENV_DIR" ]; then
  echo "venv already exists. Delete it first to recreate:"
  echo "  rm -rf $VENV_DIR && ./setup.sh"
  exit 1
fi

echo "Creating virtual environment..."
"$PYTHON" -m venv "$VENV_DIR"

echo "Installing dependencies..."
"$VENV_DIR/bin/pip" install --upgrade pip --quiet
"$VENV_DIR/bin/pip" install -r requirements.txt

# Make scripts executable
chmod +x transcribe.py transcribe_faster.py

echo ""
echo "Setup complete. Usage:"
echo ""
echo "  source venv/bin/activate"
echo "  python transcribe_faster.py your-audio.m4a"
echo ""
echo "Or specify a model (tiny/base/small/medium/large-v3):"
echo ""
echo "  python transcribe.py your-audio.m4a --model medium"
transcribe.py

#!/usr/bin/env python3
import argparse
from pathlib import Path

from faster_whisper import WhisperModel


def main():
    parser = argparse.ArgumentParser(description='Transcribe audio using Faster-Whisper')
    parser.add_argument(
        'input',
        type=str,
        help='Input audio file',
    )
    parser.add_argument(
        '--model',
        type=str,
        default='base',
        choices=['tiny', 'base', 'small', 'medium', 'large-v2', 'large-v3'],
        help='Model size (default: base)',
    )
    parser.add_argument(
        '--language',
        type=str,
        default='en',
        help='Language code (default: en)',
    )
    parser.add_argument(
        '--output',
        type=str,
        help='Output file (default: input.txt)',
    )
    args = parser.parse_args()

    # Initialize model
    print(f"Loading {args.model} model...")
    model = WhisperModel(
        args.model,
        device="cpu",
        compute_type="int8",
    )

    # Transcribe
    print(f"Transcribing {args.input}...")
    segments, info = model.transcribe(
        args.input,
        language=args.language,
    )

    # Determine output file
    if args.output:
        output_path = Path(args.output)
    else:
        output_path = Path(args.input).with_suffix('.txt')

    # Write transcription
    with open(output_path, 'w', encoding='utf-8') as f:
        for segment in segments:
            f.write(segment.text.strip() + '\n')

    print(f"\nTranscription saved to: {output_path}")
    print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")


if __name__ == '__main__':
    main()
transcribe_faster.py

#!/usr/bin/env python3
import sys
from pathlib import Path

from faster_whisper import WhisperModel

if len(sys.argv) < 2:
    print("Usage: python transcribe_faster.py <audio_file> [model] [language]")
    print("\nExamples:")
    print("  python transcribe_faster.py recording.m4a")
    print("  python transcribe_faster.py recording.m4a medium en")
    sys.exit(1)

input_file = sys.argv[1]
model_size = sys.argv[2] if len(sys.argv) > 2 else "base"
language = sys.argv[3] if len(sys.argv) > 3 else "en"

# Initialize model
print(f"Loading {model_size} model...")
model = WhisperModel(
    model_size,
    device="cpu",
    compute_type="int8",
)

# Transcribe
print(f"Transcribing {input_file}...")
segments, info = model.transcribe(
    input_file,
    language=language,
)

# Output file
output_file = Path(input_file).with_suffix('.txt')

# Write transcription
with open(output_file, 'w', encoding='utf-8') as f:
    for segment in segments:
        f.write(segment.text.strip() + '\n')

print(f"\nTranscription saved to: {output_file}")
print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")