@Cdaprod
Created February 4, 2026 16:05
How to ensure proper media assets orientation.

Premise

Many of the videos I’ve shot on my camera and imported into my media-sync-api show up rotated 90° clockwise in landscape, so I have to rotate each one a further 270° clockwise to orient it correctly (see the example in the screenshot).

The video container and photo metadata should already carry the orientation for these assets, so I need an automatic fix for everything in my projects’ media. I don’t want a leftover copy once they are fixed, and I don’t want the fixed file indexed again: it should overwrite the existing hashed index entry, with defensive programming so that we don’t have failures.

In fact defensive programming like this should cover other functionality we have.


So we’re focusing purely on orientation normalization for our rotated Nikon MOVs (90° CW), done in-place, no leftovers, and no “new asset” re-indexing (same asset record, same path), with defensive programming as a first-class pattern.


The core constraint (hash vs “same asset”)

If your hash is truly sha256(file bytes), then any real fix (rotating pixels) changes the bytes, therefore changes the sha256.

So the only sane interpretation of “don’t re-index” is:

Do not create a new asset record
Do not duplicate the media in the library
Keep the same project + relative path (e.g. ingest/originals/Z7V_1908.MOV)
Update the existing DB row in-place (including its hash field if you store sha256)
Overwrite the file atomically so you never end up half-fixed or missing

That achieves what you want UX-wise: same asset in the explorer; it just becomes “correct”.
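
To make the constraint concrete, here is a minimal sketch. The `record` dict, its field names, and the id value are hypothetical stand-ins for your DB row; the point is only that the bytes (and therefore the sha256) change while the asset id and relative path stay put.

```python
import hashlib


def sha256_bytes(data: bytes) -> str:
    """Content hash of a byte string."""
    return hashlib.sha256(data).hexdigest()


def update_record_in_place(record: dict, new_bytes: bytes) -> dict:
    """Same logical asset (same id, same path) with a refreshed hash and size."""
    updated = dict(record)
    updated["sha256"] = sha256_bytes(new_bytes)
    updated["size"] = len(new_bytes)
    return updated


original = b"pixels stored sideways"
fixed = b"pixels stored upright"

record = {
    "id": 42,  # hypothetical asset id
    "relative": "ingest/originals/Z7V_1908.MOV",
    "sha256": sha256_bytes(original),
    "size": len(original),
}
after = update_record_in_place(record, fixed)
```

Only `sha256` and `size` differ between `record` and `after`; anything keyed on the id or the path keeps working.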


What you should do: normalize “rotation metadata vs pixels”

You’re seeing the classic mismatch: player A honors rotate metadata, player B doesn’t (or honors it differently). The bulletproof fix is:

  1. Detect rotation metadata (90/180/270) via ffprobe
  2. Produce a normalized file where:
    • pixels are physically rotated into the correct orientation
    • rotate metadata is cleared (rotate=0)
  3. Atomically replace the original
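
One detail worth pinning down before step 1: ffprobe can report rotation in two places with opposite signs. The legacy `rotate` tag is the clockwise angle a player should apply, while the `rotation` field of the Display Matrix side data is counterclockwise, so a `rotate=90` file typically shows `rotation: -90`. A minimal sketch that folds both into one clockwise value and picks the matching pixel filter; the two sample dicts are hand-written stand-ins for ffprobe JSON, not real output from your files:

```python
from typing import Any


def clockwise_rotation(stream: dict[str, Any]) -> int:
    """Clockwise display rotation (0, 90, 180, 270) for one ffprobe stream dict."""
    # Display Matrix side data: the angle is counterclockwise, so negate it.
    for side_data in stream.get("side_data_list") or []:
        if "rotation" in side_data:
            try:
                return round(-float(side_data["rotation"])) % 360
            except (TypeError, ValueError):
                break
    # Legacy tag: already a clockwise angle.
    tag = (stream.get("tags") or {}).get("rotate")
    try:
        return round(float(tag)) % 360 if tag is not None else 0
    except (TypeError, ValueError):
        return 0


def pixel_filter(rotation_cw: int) -> str:
    """ffmpeg -vf value that bakes the display rotation into the pixels."""
    filters = {
        90: "transpose=1",               # 90 clockwise
        180: "transpose=2,transpose=2",  # two quarter turns
        270: "transpose=2",              # 90 counterclockwise
    }
    return filters[rotation_cw]


legacy = {"tags": {"rotate": "90"}}
modern = {"side_data_list": [{"side_data_type": "Display Matrix", "rotation": -90}]}
```

Both samples describe the same file, and both resolve to 90, so the pixels get `transpose=1`.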

Implementation pattern (defensive + atomic)

Safety contract for each asset

  • Never touch files outside ingest/originals (or whatever you allow)
  • Validate the DB record matches disk path
  • Validate file exists and is readable
  • Only operate if rotation in {90,180,270}
  • Write to temp file in same directory (same filesystem → atomic rename)
  • Verify temp output is sane (exists, size threshold, ffprobe readable)
  • Rename original → .bak
  • Rename temp → original
  • Verify final file
  • Delete .bak
  • Update DB record (hash/size/dims/rotation fields)
  • Invalidate any derivative artifacts (thumbs) if needed

No leftover copy unless there’s a failure — and even then you keep a .bak intentionally for recovery.


Concrete code

/app/media/orientation.py

from __future__ import annotations

import json
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProbeVideo:
    rotation: int
    width: int
    height: int
    codec: str


class MediaFixError(RuntimeError):
    pass


def _run(cmd: list[str], *, timeout_s: int) -> subprocess.CompletedProcess[str]:
    try:
        return subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            check=False,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as e:
        raise MediaFixError(f"Command timed out after {timeout_s}s: {cmd}") from e


def ffprobe_video(path: Path) -> ProbeVideo:
    cmd = [
        "ffprobe",
        "-v",
        "error",
        "-print_format",
        "json",
        "-show_streams",
        "-select_streams",
        "v:0",
        str(path),
    ]
    p = _run(cmd, timeout_s=15)
    if p.returncode != 0:
        raise MediaFixError(f"ffprobe failed: {p.stderr[-1500:]}")

    data = json.loads(p.stdout or "{}")
    streams = data.get("streams") or []
    if not streams:
        raise MediaFixError("ffprobe: no video streams found")

    s0 = streams[0]
    width = int(s0.get("width") or 0)
    height = int(s0.get("height") or 0)
    codec = str(s0.get("codec_name") or "")

    # Normalize everything to "degrees clockwise the player must rotate for display".
    rotation = 0
    tags = (s0.get("tags") or {})
    if "rotate" in tags:
        # Legacy tag: already a clockwise angle.
        try:
            rotation = round(float(tags["rotate"])) % 360
        except (TypeError, ValueError):
            rotation = 0

    # Some files store rotation in side_data_list displaymatrix.
    # That angle is counterclockwise (rotate=90 shows up as rotation=-90), so negate it.
    for sd in (s0.get("side_data_list") or []):
        if "rotation" in sd:
            try:
                rotation = round(-float(sd["rotation"])) % 360
            except (TypeError, ValueError):
                pass

    return ProbeVideo(rotation=rotation, width=width, height=height, codec=codec)


def _vf_for_rotation(rot: int) -> str:
    """
    rot is the clockwise rotation the metadata asks players to apply.
    We bake that same rotation into the pixels so the file is upright
    without relying on metadata.
    """
    if rot == 90:
        return "transpose=1"               # rotate pixels 90 CW
    if rot == 270:
        return "transpose=2"               # rotate pixels 90 CCW
    if rot == 180:
        return "transpose=2,transpose=2"   # 180
    raise MediaFixError(f"Unsupported rotation: {rot}")


def normalize_video_orientation_in_place(
    *,
    input_path: Path,
    timeout_s: int = 3600,
    crf: int = 18,
    preset: str = "veryfast",
) -> bool:
    """
    Returns True if a rewrite happened, False if no change was needed.
    Produces an MP4 normalized file and replaces the original atomically.

    Note: If you insist on preserving .MOV extension, you can output .mov,
    but MP4 + faststart is typically better for browser playback.
    """
    if not input_path.exists():
        raise MediaFixError(f"Input missing: {input_path}")

    probe = ffprobe_video(input_path)
    rot = probe.rotation

    if rot not in (90, 180, 270):
        return False

    vf = _vf_for_rotation(rot)

    # Put temp and backup in the same directory for atomic renames
    tmp = input_path.with_name(f".tmp.{input_path.name}.normalized.mp4")
    bak = input_path.with_name(f".bak.{input_path.name}")

    # Defensive: refuse to clobber existing temp/bak
    if tmp.exists() or bak.exists():
        raise MediaFixError(f"Refusing to run: temp/bak exists for {input_path.name}")

    cmd = [
        "ffmpeg",
        "-hide_banner",
        "-loglevel",
        "error",
        "-y",
        "-noautorotate",          # crucial: prevents double-rotation
        "-i",
        str(input_path),
        "-vf",
        vf,
        "-map",
        "0",
        "-c:a",
        "copy",
        "-c:s",
        "copy",
        "-c:v",
        "libx264",
        "-pix_fmt",
        "yuv420p",
        "-crf",
        str(crf),
        "-preset",
        preset,
        "-movflags",
        "+faststart",
        "-metadata:s:v:0",
        "rotate=0",
        str(tmp),
    ]

    p = _run(cmd, timeout_s=timeout_s)
    if p.returncode != 0:
        tmp.unlink(missing_ok=True)
        raise MediaFixError(f"ffmpeg normalize failed: {p.stderr[-2000:]}")

    # Validate output is real and probe-able
    if not tmp.exists() or tmp.stat().st_size < 1024 * 1024:
        tmp.unlink(missing_ok=True)
        raise MediaFixError("Normalized output missing or too small; refusing replace")

    # Probe the output and ensure rotation cleared
    out_probe = ffprobe_video(tmp)
    if out_probe.rotation != 0:
        tmp.unlink(missing_ok=True)
        raise MediaFixError(f"Output still has rotation={out_probe.rotation}; refusing replace")

    # Swap sequence: each rename is atomic, but the path is briefly absent between the two
    input_path.rename(bak)
    try:
        tmp.rename(input_path)
        # final sanity check: file exists and is probeable
        _ = ffprobe_video(input_path)
    except Exception as e:
        # rollback
        if input_path.exists():
            input_path.unlink(missing_ok=True)
        bak.rename(input_path)
        raise MediaFixError(f"Replace failed, rolled back: {e}") from e
    finally:
        tmp.unlink(missing_ok=True)

    # Success: remove backup
    bak.unlink(missing_ok=True)
    return True

Wire it into your system without “re-indexing”

You have a DB row already (asset id / project / relative path / hash). The right move is a repair job that:

  • selects all videos
  • normalizes in place
  • updates the existing row (same id)
  • regenerates derived metadata (dims, duration, hash)

/app/jobs/repair_orientation.py

from __future__ import annotations

from pathlib import Path

from app.media.orientation import normalize_video_orientation_in_place, ffprobe_video, MediaFixError
from app.storage.paths import project_originals_dir  # whatever you already use
from app.db import db  # your db layer (placeholder)


def compute_sha256(path: Path) -> str:
    import hashlib
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def repair_project_videos_orientation(project: str, *, dry_run: bool = True) -> dict:
    """
    Updates existing asset records in-place (same asset id).
    Does NOT create new assets.
    """
    results = {"project": project, "changed": 0, "skipped": 0, "failed": 0, "errors": []}

    # Example query shape — replace with your real schema
    assets = db.list_media(project=project, kind="video")  # must include id, relative, abs path, hash

    for a in assets:
        try:
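            rel = a["relative"]
            originals = project_originals_dir(project).resolve()
            # Strip only a leading prefix; str.replace would also rewrite matches deeper in the path.
            abs_path = (originals / rel.removeprefix("ingest/originals/")).resolve()
            # Safety contract: never touch files outside the originals directory.
            if not abs_path.is_relative_to(originals):
                raise MediaFixError(f"Path escapes originals dir: {rel}")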
            if not abs_path.exists():
                results["failed"] += 1
                results["errors"].append({"id": a["id"], "error": "missing_on_disk", "path": str(abs_path)})
                continue

            probe = ffprobe_video(abs_path)
            if probe.rotation not in (90, 180, 270):
                results["skipped"] += 1
                continue

            if dry_run:
                results["changed"] += 1
                continue

            changed = normalize_video_orientation_in_place(input_path=abs_path)
            if not changed:
                results["skipped"] += 1
                continue

            # Update row in place (same asset id)
            new_hash = compute_sha256(abs_path)
            new_probe = ffprobe_video(abs_path)

            db.update_media(
                media_id=a["id"],
                fields={
                    "sha256": new_hash,
                    "width": new_probe.width,
                    "height": new_probe.height,
                    "rotation": 0,
                    # optionally duration/bitrate if you store them
                },
            )

            # If thumbnails are content-addressed by sha, you must invalidate the old sha thumb.
            # Easiest: delete thumbs for this media id or for old sha(s), then regenerate lazily.
            db.invalidate_thumbnail(media_id=a["id"])

            results["changed"] += 1

        except MediaFixError as e:
            results["failed"] += 1
            results["errors"].append({"id": a.get("id"), "error": str(e)})
        except Exception as e:
            results["failed"] += 1
            results["errors"].append({"id": a.get("id"), "error": f"unexpected: {e}"})

    return results

Important note about your thumbnail URLs

If your thumbnails are named by sha256 (/thumbnails/{project}/{sha}.jpg), then after orientation fix the sha changes → thumb URL changes.

That’s fine as long as your “media list” returns the updated thumb_url based on the updated sha.
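
A minimal sketch of that rule, assuming the `/thumbnails/{project}/{sha}.jpg` layout above. The helper names are hypothetical, and `thumb_root` stands in for wherever thumbnails live on disk:

```python
from pathlib import Path


def thumb_url(project: str, sha256: str) -> str:
    """Build the thumbnail URL from the asset's current hash."""
    return f"/thumbnails/{project}/{sha256}.jpg"


def stale_thumb_path(thumb_root: Path, project: str, old_sha256: str) -> Path:
    """Location of the thumbnail generated for the pre-fix bytes."""
    return thumb_root / project / f"{old_sha256}.jpg"


def drop_stale_thumb(thumb_root: Path, project: str, old_sha256: str) -> bool:
    """Delete the old thumbnail if present; True when a file was removed."""
    path = stale_thumb_path(thumb_root, project, old_sha256)
    if path.is_file():
        path.unlink()
        return True
    return False
```

Capture the old sha before updating the row, then call `drop_stale_thumb` with it; calling it a second time is a no-op, which keeps the repair job safe to re-run.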


How to run it (CLI-first, safest)

Don’t make this a web request first — run as a job/CLI so it can take time and be retryable.

/app/cli/repair.py

from __future__ import annotations

import argparse
import json

from app.jobs.repair_orientation import repair_project_videos_orientation


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--project", required=True)
    ap.add_argument("--apply", action="store_true", help="actually modify files")
    args = ap.parse_args()

    report = repair_project_videos_orientation(args.project, dry_run=(not args.apply))
    print(json.dumps(report, indent=2))


if __name__ == "__main__":
    main()

Run:

python -m app.cli.repair --project P1-Public-Accountability
python -m app.cli.repair --project P1-Public-Accountability --apply

Your “defensive programming should cover other functionality” (pattern)

Here’s the reusable pattern you should standardize across media-sync-api:

1) Guardrails at the boundary

  • validate project name, sha format, relative path constraints
  • refuse symlinks / path traversal
  • ensure file lives under allowed roots
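
The boundary checks above could be sketched like this. Everything here is illustrative: `GuardError`, the helper names, and the project-name regex are assumptions, so tighten or loosen them to match your real naming rules.

```python
import re
from pathlib import Path

SHA256_RE = re.compile(r"[0-9a-f]{64}")
PROJECT_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,127}")  # assumed naming rule


class GuardError(ValueError):
    """Raised when an input fails a boundary check."""


def check_project(name: str) -> str:
    if not PROJECT_RE.fullmatch(name):
        raise GuardError(f"bad project name: {name!r}")
    return name


def check_sha256(value: str) -> str:
    if not SHA256_RE.fullmatch(value):
        raise GuardError("sha256 must be 64 lowercase hex characters")
    return value


def resolve_under_root(root: Path, relative: str) -> Path:
    """Resolve `relative` under `root`, refusing absolute paths, traversal, and symlinks."""
    rel = Path(relative)
    if rel.is_absolute() or ".." in rel.parts:
        raise GuardError(f"unsafe relative path: {relative!r}")
    root_resolved = root.resolve()
    # Refuse symlinks anywhere between the root and the target.
    probe = root_resolved
    for part in rel.parts:
        probe = probe / part
        if probe.is_symlink():
            raise GuardError(f"symlink in path: {probe}")
    candidate = root_resolved / rel
    # Belt and braces: the fully resolved path must still live under the root.
    if not candidate.resolve().is_relative_to(root_resolved):
        raise GuardError(f"path escapes root: {relative!r}")
    return candidate
```

Calling these once at the entry point (CLI argument, API request, DB row) means the code further in can assume its paths are already safe.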

2) Idempotent operations

  • job can be re-run safely
  • detects “already fixed” and skips
  • uses deterministic derived paths
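
One way to sketch the idempotency check: derive the temp and backup names from the target alone, then decide what a run (or re-run) should do before touching anything. The `plan` function and its action names are hypothetical; the naming scheme mirrors the one in orientation.py.

```python
from pathlib import Path


def derived_paths(target: Path) -> tuple[Path, Path]:
    """Deterministic temp and backup names next to the target."""
    tmp = target.with_name(f".tmp.{target.name}.normalized.mp4")
    bak = target.with_name(f".bak.{target.name}")
    return tmp, bak


def plan(target: Path, rotation: int) -> str:
    """Decide what a (re-)run should do for one file."""
    tmp, bak = derived_paths(target)
    if bak.exists() and not target.exists():
        return "restore"   # a previous run died mid-swap: put the backup back
    if not target.exists():
        return "missing"
    if bak.exists() or tmp.exists():
        return "inspect"   # leftovers from an interrupted run: do not guess
    if rotation not in (90, 180, 270):
        return "skip"      # already upright, nothing to do
    return "fix"
```

Because the derived names depend only on the target path, a second run sees exactly what the first one left behind and can skip, resume, or stop instead of piling up new temp files.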

3) Atomic write strategy

  • write temp next to target
  • fsync if you’re being hardcore
  • rename swap
  • verify
  • cleanup backup
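
For small files such as sidecars, manifests, or index snapshots, the same strategy can be sketched in a few lines. This variant is an assumption on my part rather than what orientation.py does: it uses `os.replace` for a one-step swap, so there is no backup to clean up, and `atomic_write_bytes` is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(target: Path, data: bytes) -> None:
    """Replace `target` so readers see either the old or the new file, never a partial one."""
    directory = target.parent
    # Temp file in the same directory keeps the final rename on one filesystem.
    fd, tmp_name = tempfile.mkstemp(prefix=f".tmp.{target.name}.", dir=directory)
    tmp = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # push file contents to disk before the swap
        os.replace(tmp, target)    # atomic on POSIX when both paths share a filesystem
    except BaseException:
        tmp.unlink(missing_ok=True)
        raise
    # Best effort: persist the directory entry too (not available on Windows).
    if hasattr(os, "O_DIRECTORY"):
        try:
            dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        except OSError:
            return
        try:
            os.fsync(dir_fd)
        except OSError:
            pass
        finally:
            os.close(dir_fd)
```

If you need a verify step before committing, run it on the temp file between the `fsync` and the `os.replace`.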

4) Structured logging

Always log:

  • media id
  • project
  • relative path
  • operation
  • stderr tail on ffmpeg failure
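
A minimal sketch of those fields using only the standard `logging` module, emitting one JSON object per line. The logger name, the formatter class, and the `log_ffmpeg_failure` helper are all hypothetical; swap in whatever logging setup media-sync-api already has.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    FIELDS = ("media_id", "project", "relative", "operation", "stderr_tail")

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        # Fields passed via `extra=` become attributes on the record.
        for key in self.FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload, sort_keys=True)


def make_logger(name: str = "media_sync.repair", stream=None) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def log_ffmpeg_failure(logger, *, media_id, project, relative, stderr: str) -> None:
    logger.error(
        "ffmpeg failed",
        extra={
            "media_id": media_id,
            "project": project,
            "relative": relative,
            "operation": "normalize_orientation",
            "stderr_tail": stderr[-2000:],  # keep only the end, like the job does
        },
    )
```

Because every line carries the media id and operation, a failed run can be grepped or loaded into a table without parsing free-form messages.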

5) “Failure isolation”

One file fails → job continues. You get a report of failures, not a half-broken system.


One decision you should make now (and I’ll align code to it)

When you normalize, do you want to keep the container as .MOV, or is it acceptable to normalize to .mp4 while keeping the same “relative path”?

  • If you must keep .MOV, we can output .mov (still H.264) — but Safari/web generally likes MP4 either way.
  • If you keep the same filename but change container, some tools get weird. I usually recommend:
    • keep extension aligned with container you write.

If you don’t want to decide, default safe move is: write .mp4 only when original is already mp4, otherwise write .mov as output container. I can give you that variant.
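
That default could be sketched as follows. Both helper names are hypothetical; the idea is that ffmpeg picks its muxer from the output extension, so the temp file’s suffix decides the container.

```python
from pathlib import Path


def output_suffix(original: Path) -> str:
    """Pick the container for the normalized file from the original's extension."""
    return ".mp4" if original.suffix.lower() == ".mp4" else ".mov"


def temp_output_path(original: Path) -> Path:
    """Temp name whose extension tells ffmpeg which muxer to use."""
    return original.with_name(
        f".tmp.{original.name}.normalized{output_suffix(original)}"
    )
```

With this in place, `Z7V_1908.MOV` is rewritten as a QuickTime file and keeps an extension that matches its container after the swap.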


Optional Follow-ups

If you share the DB schema keys for the media record (the exact field names for id/hash/relative/project), I’ll adapt the db.update_media() and invalidate_thumbnail() calls to match your codebase exactly.
