A lot of the videos I've shot on my camera and imported into my media-sync-api come in rotated 90° clockwise as landscape, so I have to rotate them 270° clockwise to orient them correctly (see the example in the screenshot).
The video container and photo metadata should be there for the asset(s). I need to automatically fix this for anything in my project's media. I don't want a leftover copy when I'm done fixing them, and I don't want the fixed file to be indexed again as a new asset; it should overwrite the entry in the existing hashed index, with defensive programming so that we don't have failures.
In fact, defensive programming like this should cover our other functionality too.
So we’re focusing purely on orientation normalization for our rotated Nikon MOVs (90° CW), done in-place, no leftovers, and no “new asset” re-indexing (same asset record, same path), with defensive programming as a first-class pattern.
If your hash is truly sha256(file bytes), then any real fix (rotating pixels) changes the bytes, therefore changes the sha256.
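A quick sketch makes that concrete: the hash is over the raw bytes, so even a one-byte difference yields a completely different digest.

```python
import hashlib

# Hashing is deterministic over the exact bytes; any pixel-level fix
# rewrites the bytes, so the sha256 necessarily changes too.
original = b"fake video bytes"
fixed = b"fake video bytes, rotated"

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(fixed).hexdigest()
```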
So the only sane interpretation of “don’t re-index” is:
✅ Do not create a new asset record
✅ Do not duplicate the media in the library
✅ Keep the same project + relative path (e.g. ingest/originals/Z7V_1908.MOV)
✅ Update the existing DB row in-place (including its hash field if you store sha256)
✅ Overwrite the file atomically so you never end up half-fixed or missing
That achieves what you want UX-wise: same asset in the explorer; it just becomes “correct”.
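In practice the in-place update boils down to keying the write on the stable asset id, never on the old hash (since the hash is exactly what changes). A minimal sqlite3 sketch; the `media` table and the `sha256`/`size_bytes` field names are assumptions, not your real schema:

```python
import sqlite3

# Hypothetical schema: media(id, sha256, size_bytes).
# Key the UPDATE on the stable id; the hash is about to change,
# so it cannot be the lookup key.
def update_asset_in_place(conn: sqlite3.Connection, asset_id: int,
                          new_sha256: str, new_size: int) -> None:
    conn.execute(
        "UPDATE media SET sha256 = ?, size_bytes = ? WHERE id = ?",
        (new_sha256, new_size, asset_id),
    )
    conn.commit()
```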
You’re seeing the classic mismatch: player A honors rotate metadata, player B doesn’t (or honors it differently). The bulletproof fix is:
- Detect rotation metadata (90/180/270) via `ffprobe`
- Produce a normalized file where:
  - pixels are physically rotated into the correct orientation
  - rotate metadata is cleared (`rotate=0`)
- Atomically replace the original
- Never touch files outside `ingest/originals` (or whatever you allow)
- Validate the DB record matches the disk path
- Validate the file exists and is readable
- Only operate if `rotation in {90, 180, 270}`
- Write to a temp file in the same directory (same filesystem → atomic rename)
- Verify the temp output is sane (exists, size threshold, ffprobe-readable)
- Rename original → `.bak`
- Rename temp → original
- Verify the final file
- Delete `.bak`
- Update the DB record (hash/size/dims/rotation fields)
- Invalidate any derivative artifacts (thumbs) if needed
No leftover copy unless there’s a failure — and even then you keep a .bak intentionally for recovery.
```python
from __future__ import annotations

import json
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProbeVideo:
    rotation: int
    width: int
    height: int
    codec: str


class MediaFixError(RuntimeError):
    pass


def _run(cmd: list[str], *, timeout_s: int) -> subprocess.CompletedProcess[str]:
    try:
        return subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            check=False,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as e:
        raise MediaFixError(f"Command timed out after {timeout_s}s: {cmd}") from e


def ffprobe_video(path: Path) -> ProbeVideo:
    cmd = [
        "ffprobe",
        "-v", "error",
        "-print_format", "json",
        "-show_streams",
        "-select_streams", "v:0",
        str(path),
    ]
    p = _run(cmd, timeout_s=15)
    if p.returncode != 0:
        raise MediaFixError(f"ffprobe failed: {p.stderr[-1500:]}")
    data = json.loads(p.stdout or "{}")
    streams = data.get("streams") or []
    if not streams:
        raise MediaFixError("ffprobe: no video streams found")
    s0 = streams[0]
    width = int(s0.get("width") or 0)
    height = int(s0.get("height") or 0)
    codec = str(s0.get("codec_name") or "")
    rotation = 0
    tags = s0.get("tags") or {}
    if "rotate" in tags:
        try:
            rotation = int(tags["rotate"]) % 360
        except (TypeError, ValueError):
            rotation = 0
    # Some files store rotation in side_data_list displaymatrix
    for sd in s0.get("side_data_list") or []:
        if "rotation" in sd:
            try:
                rotation = int(sd["rotation"]) % 360
            except (TypeError, ValueError):
                pass
    return ProbeVideo(rotation=rotation, width=width, height=height, codec=codec)


def _vf_for_rotation(rot: int) -> str:
    """
    rot is the metadata rotation (clockwise).
    We apply the inverse to the pixels to get upright video.
    """
    if rot == 90:
        return "transpose=2"  # rotate pixels 90 CCW
    if rot == 270:
        return "transpose=1"  # rotate pixels 90 CW
    if rot == 180:
        return "transpose=2,transpose=2"  # 180
    raise MediaFixError(f"Unsupported rotation: {rot}")


def normalize_video_orientation_in_place(
    *,
    input_path: Path,
    timeout_s: int = 3600,
    crf: int = 18,
    preset: str = "veryfast",
) -> bool:
    """
    Returns True if a rewrite happened, False if no change was needed.
    Produces a normalized MP4 file and replaces the original atomically.
    Note: if you insist on preserving the .MOV extension, you can output .mov,
    but MP4 + faststart is typically better for browser playback.
    """
    if not input_path.exists():
        raise MediaFixError(f"Input missing: {input_path}")

    probe = ffprobe_video(input_path)
    rot = probe.rotation
    if rot not in (90, 180, 270):
        return False

    vf = _vf_for_rotation(rot)

    # Put temp and backup in the same directory for atomic renames
    tmp = input_path.with_name(f".tmp.{input_path.name}.normalized.mp4")
    bak = input_path.with_name(f".bak.{input_path.name}")

    # Defensive: refuse to clobber existing temp/bak
    if tmp.exists() or bak.exists():
        raise MediaFixError(f"Refusing to run: temp/bak exists for {input_path.name}")

    cmd = [
        "ffmpeg",
        "-hide_banner",
        "-loglevel", "error",
        "-y",
        "-noautorotate",  # crucial: prevents double-rotation
        "-i", str(input_path),
        "-vf", vf,
        "-map", "0",
        "-c:a", "copy",
        "-c:s", "copy",
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        "-crf", str(crf),
        "-preset", preset,
        "-movflags", "+faststart",
        "-metadata:s:v:0", "rotate=0",
        str(tmp),
    ]
    p = _run(cmd, timeout_s=timeout_s)
    if p.returncode != 0:
        tmp.unlink(missing_ok=True)
        raise MediaFixError(f"ffmpeg normalize failed: {p.stderr[-2000:]}")

    # Validate output is real and probe-able
    if not tmp.exists() or tmp.stat().st_size < 1024 * 1024:
        tmp.unlink(missing_ok=True)
        raise MediaFixError("Normalized output missing or too small; refusing replace")

    # Probe the output and ensure rotation is cleared
    out_probe = ffprobe_video(tmp)
    if out_probe.rotation != 0:
        tmp.unlink(missing_ok=True)
        raise MediaFixError(f"Output still has rotation={out_probe.rotation}; refusing replace")

    # Atomic replace sequence
    input_path.rename(bak)
    try:
        tmp.rename(input_path)
        # Final sanity check: file exists and is probe-able
        _ = ffprobe_video(input_path)
    except Exception as e:
        # Rollback
        if input_path.exists():
            input_path.unlink(missing_ok=True)
        bak.rename(input_path)
        raise MediaFixError(f"Replace failed, rolled back: {e}") from e
    finally:
        tmp.unlink(missing_ok=True)

    # Success: remove backup
    bak.unlink(missing_ok=True)
    return True
```

You have a DB row already (asset id / project / relative path / hash). The right move is a repair job that:
- selects all videos
- normalizes in place
- updates the existing row (same `id`)
- regenerates derived metadata (dims, duration, hash)
```python
from __future__ import annotations

import hashlib
from pathlib import Path

from app.media.orientation import (
    MediaFixError,
    ffprobe_video,
    normalize_video_orientation_in_place,
)
from app.storage.paths import project_originals_dir  # whatever you already use
from app.db import db  # your db layer (placeholder)


def compute_sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def repair_project_videos_orientation(project: str, *, dry_run: bool = True) -> dict:
    """
    Updates existing asset records in-place (same asset id).
    Does NOT create new assets.
    """
    results = {"project": project, "changed": 0, "skipped": 0, "failed": 0, "errors": []}

    # Example query shape; replace with your real schema
    assets = db.list_media(project=project, kind="video")  # must include id, relative, abs path, hash

    for a in assets:
        try:
            rel = a["relative"]
            abs_path = project_originals_dir(project) / rel.replace("ingest/originals/", "")
            if not abs_path.exists():
                results["failed"] += 1
                results["errors"].append({"id": a["id"], "error": "missing_on_disk", "path": str(abs_path)})
                continue

            probe = ffprobe_video(abs_path)
            if probe.rotation not in (90, 180, 270):
                results["skipped"] += 1
                continue

            if dry_run:
                results["changed"] += 1
                continue

            changed = normalize_video_orientation_in_place(input_path=abs_path)
            if not changed:
                results["skipped"] += 1
                continue

            # Update the row in place (same asset id)
            new_hash = compute_sha256(abs_path)
            new_probe = ffprobe_video(abs_path)
            db.update_media(
                media_id=a["id"],
                fields={
                    "sha256": new_hash,
                    "width": new_probe.width,
                    "height": new_probe.height,
                    "rotation": 0,
                    # optionally duration/bitrate if you store them
                },
            )

            # If thumbnails are content-addressed by sha, you must invalidate the old sha thumb.
            # Easiest: delete thumbs for this media id or for old sha(s), then regenerate lazily.
            db.invalidate_thumbnail(media_id=a["id"])

            results["changed"] += 1
        except MediaFixError as e:
            results["failed"] += 1
            results["errors"].append({"id": a.get("id"), "error": str(e)})
        except Exception as e:
            results["failed"] += 1
            results["errors"].append({"id": a.get("id"), "error": f"unexpected: {e}"})

    return results
```

If your thumbnails are named by sha256 (`/thumbnails/{project}/{sha}.jpg`), then after the orientation fix the sha changes, so the thumb URL changes.
That’s fine as long as your “media list” returns the updated thumb_url based on the updated sha.
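Since the thumb path is derived, nothing needs to be stored; a tiny sketch of the derivation, assuming the `/thumbnails/{project}/{sha}.jpg` layout above:

```python
from pathlib import Path

# Content-addressed thumbnail path: derived from the current sha, so a
# new sha automatically yields a new URL. The old sha's file becomes an
# orphan and should be deleted during invalidation.
def thumb_path(root: Path, project: str, sha256: str) -> Path:
    return root / project / f"{sha256}.jpg"
```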
Don’t make this a web request first — run as a job/CLI so it can take time and be retryable.
```python
from __future__ import annotations

import argparse
import json

from app.jobs.repair_orientation import repair_project_videos_orientation


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--project", required=True)
    ap.add_argument("--apply", action="store_true", help="actually modify files")
    args = ap.parse_args()
    report = repair_project_videos_orientation(args.project, dry_run=(not args.apply))
    print(json.dumps(report, indent=2))


if __name__ == "__main__":
    main()
```

Run:

```
python -m app.cli.repair --project P1-Public-Accountability
python -m app.cli.repair --project P1-Public-Accountability --apply
```

Here's the reusable pattern you should standardize across media-sync-api:
Validate inputs:
- project name, sha format, relative path constraints
- refuse symlinks / path traversal
- ensure the file lives under allowed roots

Make jobs idempotent:
- the job can be re-run safely
- it detects "already fixed" and skips
- it uses deterministic derived paths

Write atomically:
- write the temp file next to the target
- fsync if you're being hardcore
- rename swap
- verify
- clean up the backup
Always log:
- media id
- project
- relative path
- operation
- stderr tail on ffmpeg failure
One file fails → job continues. You get a report of failures, not a half-broken system.
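The "refuse symlinks / path traversal" rule above can be sketched as a single guard: resolve the candidate path (which also follows symlinks) and require it to stay under the allowed root. The function name is illustrative, not from your codebase:

```python
from pathlib import Path

# Defensive path check: resolve() follows symlinks and collapses "..",
# then is_relative_to() rejects anything that escaped the allowed root.
def ensure_under_root(root: Path, relative: str) -> Path:
    if Path(relative).is_absolute():
        raise ValueError(f"absolute path not allowed: {relative}")
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root.resolve()):
        raise ValueError(f"path escapes allowed root: {relative}")
    return candidate
```

`Path.is_relative_to` needs Python 3.9+; on older versions compare via `candidate.relative_to(...)` in a try/except.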
When you normalize, do you want to keep the container as .MOV, or is it acceptable to normalize to .mp4 while keeping the same “relative path”?
- If you must keep `.MOV`, we can output `.mov` (still H.264), though Safari/web generally prefers MP4 either way.
- If you keep the same filename but change the container, some tools get weird. I usually recommend keeping the extension aligned with the container you write.

If you don't want to decide, the default safe move is: write .mp4 only when the original is already mp4, otherwise write .mov as the output container. I can give you that variant.
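That default rule is a one-liner; a minimal sketch, assuming the only two containers in play are MP4 and MOV:

```python
from pathlib import Path

# Default safe move: stay .mp4 only when the source already is MP4;
# otherwise write .mov so extension and container stay aligned.
def output_suffix(input_path: Path) -> str:
    return ".mp4" if input_path.suffix.lower() == ".mp4" else ".mov"
```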
If you share the DB schema keys for the media record (the exact field names for id/hash/relative/project), I'll snap the db.update_media() and invalidate_thumbnail() calls to match your codebase exactly.