@cynthia2006
Last active October 6, 2025 11:59
Tenets of AV1 Encoding using SVT-AV1

Warning

What was written here earlier may be somewhat misleading; readers are encouraged to read this primer instead.

Tenets of AV1 Encoding

AV1 is a next-generation video codec developed by the Alliance for Open Media to facilitate VOD, storage and live-streaming. It is usually paired with the Opus audio codec and stored in a WebM or Matroska container, or even MP4 (ISOBMFF), and may be streamed using HLS. Besides libaom and rav1e, SVT-AV1 is currently the best production-quality encoder available, and it is the subject of this guide. This introductory guide is based on the SVT-AV1 documentation.

Presets

Presets (selected with the --preset option) are collections of predefined options that control the speed vs. quality tradeoff, ranging from 0 to 10. For a given rate control mode (e.g. CRF or VBR), lower presets enable expensive algorithms or tune them for better compression, and higher presets do the opposite. Which preset to use depends on your needs: if compression is crucial (e.g. for VOD) and you have good hardware, choose a lower preset (0-3); if compression and speed are to be balanced, choose 4-6; if speed is critical (e.g. for streaming), choose 7-10.

As mentioned by others, there are also negative presets (e.g. preset -1), which may serve those seeking maximal efficiency, even if in some cases the gain might turn out to be placebo (the SVT-AV1 docs themselves describe presets below zero as being for debugging). It's worth mentioning that encoding speed starts to drop dramatically below preset 4.

This table could be consulted if you need to know exactly which options are enabled at which preset.

Rate Control

SVT-AV1 provides three rate control modes: CBR, VBR (enabled with --rc 1) and CRF (enabled with --rc 0, which is the default). CRF is the most relevant, and VBR is occasionally useful. CBR is not recommended; don't use it unless you have very special needs and are aware of the direction you're steering towards.

VBR

VBR is not often used when encoding AV1, but if you need to target a certain (average) bitrate, you can choose this rate control mode. To hit a certain file size, multi-pass encoding is often recommended; it has to be enabled using the --pass 2 option, and the first pass generates a log file containing information for the encoder to use in its second pass.

The --tbr option sets the average bitrate (e.g. --tbr=2m for an average bitrate of 2 Mbit/s). Note that the bitrate may deviate over the course of encoding, as the encoder attempts to roughly maintain the expected visual fidelity [1] at that rate.
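When the goal is a file size rather than a bitrate, the target has to be worked out first. The following is a minimal sketch of that arithmetic; the helper names are hypothetical, and the `k` suffix on `--tbr` is an assumption modelled on the `--tbr=2m` form above, so check it against your encoder version.

```python
def target_bitrate_kbps(size_mib: float, duration_s: float,
                        audio_kbps: float = 0.0) -> int:
    """Average video bitrate (kbit/s) that fits size_mib over duration_s."""
    # File size in kilobits: MiB -> bytes -> bits -> kbit (1 kbit = 1000 bit).
    total_kbits = size_mib * 1024 * 1024 * 8 / 1000
    # Whatever the audio track consumes is not available to the video.
    video_kbps = total_kbits / duration_s - audio_kbps
    if video_kbps <= 0:
        raise ValueError("audio alone exceeds the size budget")
    return int(video_kbps)


def vbr_args(kbps: int) -> list[str]:
    """VBR options named in the text above; the 'k' suffix is an assumption."""
    return ["--rc", "1", f"--tbr={kbps}k"]


if __name__ == "__main__":
    # A 90-minute film that should fit in 700 MiB alongside 96 kbit/s audio.
    kbps = target_bitrate_kbps(700, 90 * 60, audio_kbps=96)
    print(kbps, vbr_args(kbps))
```

Container overhead is ignored here, so leaving a small margin below the computed value is sensible.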

CRF

Whereas VBR tunes the quantizer to achieve a certain bitrate, CRF [2] tunes it to achieve a certain visual quality.

Quantiser

The quantiser is at the heart of lossy video compression. Grossly oversimplifying, its primary job is to strip information (e.g. high-frequency components of the DCT) that wouldn't be perceptually noticeable to humans, thus saving bits that would otherwise be wasted. In SVT-AV1, quantiser values range from 1 to 63; higher is more aggressive, lower is the opposite.
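To make the idea concrete, here is a toy sketch of quantisation. It is not SVT-AV1's actual quantiser: the coefficient values and step sizes are made up for illustration, and a real encoder derives its step sizes from the quantiser index in a more elaborate way.

```python
def quantise(coeffs, step):
    """Divide each coefficient by the step size and round to an integer level."""
    return [round(c / step) for c in coeffs]


def dequantise(levels, step):
    """Reconstruct approximate coefficients from the integer levels."""
    return [level * step for level in levels]


def zero_count(coeffs, step):
    """How many coefficients are discarded (become zero) at this step size."""
    return quantise(coeffs, step).count(0)


if __name__ == "__main__":
    # Hypothetical DCT coefficients, ordered from low to high frequency.
    coeffs = [300.0, -95.0, 41.0, -12.0, 5.0, -3.0, 1.5, -0.5]
    for step in (4, 16, 48):
        levels = quantise(coeffs, step)
        print(step, levels, dequantise(levels, step))
```

A larger step zeroes out more of the small high-frequency coefficients, which is where the bit savings (and the loss of detail) come from.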

Setting a CRF (aka. initial quantizer value [3]) is done with the --crf option. As a rule of thumb, a higher CRF will likely use a lower bitrate, and a lower CRF a higher one [4]. Just as with other modern video encoders (e.g. x264), CRF is a logarithmic scale (i.e. lower values use dramatically higher bitrates and take more time to encode).

A CRF of 30-35 is highly recommended as a starting point for HD video; for 4K it can be lowered (but tests must be conducted to evaluate its effectiveness). CRF may be raised if doing so doesn't harm the desired visual fidelity (e.g. for 2D animated content).
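As a rough illustration, the options discussed so far can be assembled into a command line as below. This is only a sketch: the helper is hypothetical, the executable name `SvtAv1EncApp` and its `-i`/`-b` input/output flags do not appear in this guide and are assumptions about the standalone encoder, and the 1-63 bound is the quantiser range quoted in the footnotes.

```python
import shlex


def crf_command(src: str, dst: str, crf: int = 32, preset: int = 6) -> list[str]:
    """Build an argument list for a CRF encode (assumed CLI layout)."""
    if not 1 <= crf <= 63:
        raise ValueError("CRF is expected to lie in the quantiser range 1-63")
    if not 0 <= preset <= 10:
        raise ValueError("this guide covers presets 0-10")
    return [
        "SvtAv1EncApp",         # assumption: name of the standalone encoder binary
        "-i", src,              # assumption: input flag
        "-b", dst,              # assumption: output bitstream flag
        "--preset", str(preset),
        "--crf", str(crf),
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be reviewed first.
    print(shlex.join(crf_command("input.y4m", "output.ivf", crf=32, preset=5)))
```

Starting in the middle of the 30-35 range and adjusting after a visual check follows the advice above.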

10-bit Video

People often associate 10-bit with HDR and worry about superfluous bandwidth usage; this is a misconception. Sure, 10-bit is often used to make effective use of HDR, but 10-bit by itself neither implies the use of HDR nor dramatically increases file size. Many other modern video codecs (e.g. VVC, EVC) use 10-bit internally even if the input is 8-bit, so as to avoid quantisation-related effects (e.g. banding) and rounding errors. Thus, it's almost always recommended to encode in 10-bit even if the source video is 8-bit.
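The banding argument can be illustrated with a small numeric sketch. It only models how many distinct code values a subtle gradient lands on at each bit depth; it says nothing about how an encoder processes them, and the gradient itself is a made-up example.

```python
def distinct_levels(samples, bits):
    """Number of distinct integer code values the samples map to at this depth."""
    top = (1 << bits) - 1  # 255 for 8-bit, 1023 for 10-bit
    return len({round(s * top) for s in samples})


# A subtle gradient: brightness rises from 40% to 45% across a 1920-pixel row.
ramp = [0.40 + 0.05 * i / 1919 for i in range(1920)]

if __name__ == "__main__":
    for bits in (8, 10):
        levels = distinct_levels(ramp, bits)
        print(f"{bits}-bit: {levels} levels, ~{1920 // levels} px per band")
```

With fewer levels, each band is wider and its edges are easier to see, which is the banding that 10-bit helps to avoid.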

GOP Interval

The GOP interval (set with the --keyint option) specifies the interval of frames after which an intra-frame (i.e. keyframe) is inserted. Frequent keyframes lend themselves to precise and fast seeking, but at the cost of reduced compression efficiency. A 5-10 second GOP interval (0.1-0.2 keyframes/second, through --keyint=5s or --keyint=10s) is often recommended for normal use cases, unless your needs are special. For VOD, frequent keyframes (1 keyframe/second, through --keyint=1s) are recommended for better resilience to network errors and, of course, seekability.

The SVT-AV1 documentation advises against a GOP length over 300, but you may try it and see how your mileage varies.
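Since the interval can be given in seconds while the guideline is a plain number, it helps to convert between the two. The sketch below assumes the limit of 300 is counted in frames; the helper names are hypothetical.

```python
from fractions import Fraction

GOP_GUIDELINE = 300  # assumption: the documented limit is counted in frames


def gop_frames(seconds, fps) -> int:
    """Frames in a GOP of the given duration; fps may be e.g. 60 or '24000/1001'."""
    return round(seconds * Fraction(fps))


def exceeds_guideline(seconds, fps) -> bool:
    """True when the GOP is longer than the guideline value."""
    return gop_frames(seconds, fps) > GOP_GUIDELINE


if __name__ == "__main__":
    for seconds, fps in ((10, "24000/1001"), (5, 60), (10, 60)):
        print(seconds, fps, gop_frames(seconds, fps), exceeds_guideline(seconds, fps))
```

Under that assumption, a 10-second GOP stays within the guideline at film frame rates but exceeds it at 60 fps, so high-frame-rate sources may call for a shorter interval.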

Tuning

By default, SVT-AV1 tunes for PSNR (Peak Signal-to-Noise Ratio), a purely objective metric of video quality. It may not affect the result much, but using the --tune 0 option (tune for visual quality) is highly recommended, as it better models human perception of quality.

Film Grain Synthesis

Films, especially those shot on celluloid, carry a lot of film grain, which partly contributes to their old look-and-feel. This ambient optical effect is prominent in restored films, in HD or UHD (4K). Video codecs often struggle to encode this non-deterministic noise. AV1 natively (i.e. regardless of the encoder used) supports isolating and synthesizing this grain. It could be considered the video equivalent of Comfort Noise Generation in VoIP codecs like Opus.

At the encoding site, the grain is isolated and the video is denoised (if --film-grain-denoise is enabled). The characteristics of the grain are analysed, and deterministic information is encoded into the video bitstream. At the decoding site, this grain is synthesized as a post-processing step.

In SVT-AV1, film grain synthesis is disabled by default. It can significantly aid the quality of compression if film grain is present in the video, but at the cost of greatly increased compression overhead. See the docs (and Wikipedia) for more details on this topic. As general advice, setting the film-grain strength too high or enabling film-grain denoising (through the --film-grain-denoise option) could delete fine texture detail. If you're re-encoding from a source whose grain has already been destroyed, then enabling this option might not help at all.

Scene Change

At present, SVT-AV1 does not insert key frames at scene changes, regardless of the scd parameter. It is, therefore, advisable to use a third-party splitting program to encode videos by chunks if key frame insertion at scene changes is desired.

The documentation further states that AV1 is smart enough to act upon scene changes. That said, if you want maximal efficiency you should use AV1AN, a third-party splitting program that splits the video at scene changes for you.

Parallelism

SVT-AV1 is designed to be multithreaded for scalability. The --lp option (ranging from 0-6, default 4) is used to tell the encoder the desired level of parallelism; higher levels generally speed up encoding at the cost of higher memory usage. There are also options to tune CPU affinity (--ss) and execution pinning (--pin). Consult the docs for a description of the exact mechanics.

For greater degrees of parallelism, AV1AN may be used as well. It takes a divide-and-conquer approach: it splits the video into multiple scenes, deploys multiple instances of SVT-AV1 to encode them, and finally gathers the results in one place.
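The divide-and-conquer idea can be sketched in a few lines. This is an illustration of the general approach only, not how AV1AN is implemented: the scene positions are assumed to be known already, and `encode_chunk` is a stand-in that returns a label where a real tool would launch an encoder process.

```python
from concurrent.futures import ThreadPoolExecutor


def split_at_scenes(total_frames, scene_starts):
    """Turn scene-start frame numbers into (start, end) frame ranges."""
    starts = sorted({0, *(s for s in scene_starts if 0 < s < total_frames)})
    ends = starts[1:] + [total_frames]
    return list(zip(starts, ends))


def encode_chunk(chunk):
    """Stand-in for running one encoder instance on one frame range."""
    start, end = chunk
    return f"chunk[{start}:{end}]"


def encode_parallel(total_frames, scene_starts, workers=4):
    """Encode all chunks concurrently and return the results in source order."""
    chunks = split_at_scenes(total_frames, scene_starts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so they can be joined directly.
        return list(pool.map(encode_chunk, chunks))


if __name__ == "__main__":
    print(encode_parallel(1000, [240, 610]))
```

Because every chunk starts at a scene change, each one can begin with its own keyframe and be encoded independently of the others.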

See also

For more information, read the SVT-AV1 FAQ to better understand the encoder. The following is a list of topics to peer into to unleash the full potential of the codec.

  • Variance boost may help increase the quality of video by preserving fine details that would otherwise be lost. However, this option is unlikely to deliver any significant quality improvement beyond what simply lowering the CRF or increasing the VBR bitrate would achieve.
  • The --luminance-qp-bias option (ranging 0-100) may be used to lower the CRF according to the average luminosity of the current scene, ensuring greater visual fidelity in dark scenes.
  • Temporal filtering is worth looking into, especially if your input is noisy.

For the audio part, take a look at the recommended settings for Opus.

Footnotes

  1. Deviations in quality may be controlled using --max-qp (maximum quantizer) and --min-qp (minimum quantizer); both parameters range from 1 to 63 (the range of the quantiser itself). By default --min-qp is 1 and --max-qp is 63, which means that, to achieve a certain bitrate, the encoder may vary the quantiser as it likes. However, it's not recommended to tune these values without further knowledge and experimentation.

  2. For SVT-AV1, CRF is an alias for the --rc 0 --aq-mode 2 --qp <crf> option set. --aq-mode 2 means that the quantizer continues to adapt, starting from the CRF value.

  3. With --aq-mode 0 the quantiser value doesn't adapt, and it's then known as the CQP mode. Some claim this increases quality in scenes with high degrees of motion, but that is largely debatable.

  4. A soft upper bound can be set on the bitrate using the --mbr option, but it has been pointed out to be largely unreliable.

@nekotrix

nekotrix commented Jun 22, 2025

Hey, just a heads-up on some information present here:

  • AV1 is a video format, not a codec. I know this is a common misconception, as people keep using the terms interchangeably, but AV1, H.264, H.265 are formats, SVT-AV1, x264, x265 are codecs, that is to say encoders (and sometimes decoders) implementations of their respective standards.
  • AV1 isn't usually stored in a MP4 container as much as most relevant modern containers support AV1. Plus, that's a disservice of what AV1 strives for. AV1 is an open, royalty-free format, whereas MP4 is patent encumbered and neither open nor free. The WebM project, (which is based on the Matroska container), is a royalty-free file format which was released during the VP8 era and also ended up supporting VP9 and AV1. It also supports Vorbis, Opus and the WebVTT subtitle format.
  • You do not mention the existence of preset -1 which has its usecase and clearly isn't just some stuff you can ignore. Independent of its associated performance, it is the most efficient SVT-AV1 preset after all. As the table you linked shows, preset 11 through 13 don't exist anymore (since v2.3.0 in fact). Lastly, it is not true that the slow presets don't parallelize well/enough. Or at least it shows you haven't got enough hands-on experience with the encoder. Parallelization is hardly the limiting factor for speed when you use slow presets, your CPU is getting hammered anyway! The high core count thing is a completely different matter. Indeed, the SVT-AV1 documentation mentions that the encoder struggles to fully saturate a CPU above 16 cores (though it is unclear if they mean physical cores or logical threads). That is something you should probably be mentioning, however, solutions exists like running multiple encodes in parallel or making use of chunking tools like Av1an. In any case, it is rather misleading to say that the slower presets are performing under expectations without this context.
  • Again, it might be worth to mix in a bit of your experience with the information you're sharing. The MBR parameter is very inconsistent and often won't be respected by the encoder. It is not a hard cap and you should make it clear to your readers.
  • For the Grain Synthesis section, you should clarify the denoising is only performed if the associated parameter is turned on. You should also make it clear that as fast as the implementation in SVT-AV1 is, the denoiser itself is very inaccurate and leads to excessive detail loss and blurring. Other external denoising methods should be preferred when it is necessary. Also, it is not recommended to straight up disable film grain synthesis because it acts as an efficient ditherer which can fix gradients on many video players that do not include/enable such a ditherer by default.
  • I'm unsure the target audience will have a clue of what you said in the variance boost section. It might be preferable to simply link the related variance boost documentation of the SVT-AV1 repository, which is very self-explanatory by itself and difficult to top. I also completely disagree with your interpretation of the strength parameter: you completely disregard the implications on efficiency that entails. I wouldn't recommend straying from the defaults of strength 2, octile 8, but if you have to, please be conservative about it. It is extremely likely that adjusting CRF down rather than increasing the strength will lead to the same filesize but higher quality visuals.
  • Superres is simply broken, don't use it. It didn't live up to the performance and efficiency claims the feature was supposed to bring, so much so that AV2 is set to remove support for superres when it releases (the AVM development ground already removed it iirc). Lookahead is set to the maximum of 120 by default and will automatically adjust depending on source factors. Touching that parameter risks unpredictable encoder behavior, as my testing proved in the past, which is the last thing you'd want: introducing issues that weren't there in the first place.

The rest is usually correct and rather well explained. You made good efforts in trying to stay simple, but I think this can be largely improved. You don't talk about many other important SVT-AV1 features that are relevant for newcomers, like luma bias, temporal filtering strength (and temporal filtering at large), the --tune parameter,... Your first attempt is a bit awkward, but it is clear you did it with good intentions and genuine care for the encoder and format.

Also, though it might not be entirely relevant to the subject, I invite you and your readers to take a look at this gist page regarding general encoding knowledge that's crucial to know about: https://gist.github.com/arch1t3cht/b5b9552633567fa7658deee5aec60453.

@cynthia2006
Author

@nekotrix Thanks for your input, and also for taking time to show what can be improved. I realise my wording or logic overall was a bit awkward, as I initially wrote it for myself, later decided to publish this document so that others can benefit and suggest improvements, while correcting me if I'm wrong. This document is still WIP, and I'm still tweaking settings and observing quality gains and filesize reductions. The following is slew of replies to your points, in order.

  1. To be even more pedantic: AV1 is a spec for video coding that coincidentally provides a description for a format; not all specs provide a format (e.g. the JPEG spec). The so-called .jpg file is actually a JFIF file described in the JFIF spec.
  2. While it's true that WebM is supposed to be the de facto container for AV1, YouTube (a major VOD supplier) frequently uses AV1 in MP4; I had this in mind when stating that. Maybe the fact that AVIF (AV1 in HEIF) is based on ISOBMFF as well affected my subconscious.
  3. Ah, I had no knowledge of preset -1's existence. However, looking at the documentation, it states that presets < 0 are for debugging. What am I supposed to make of that? My intuition says it's for people who develop SVT-AV1 itself, not for users. And yes, I was skeptical about slow presets not parallelizing well; it's not an off-or-on switch, after all. And those who are after maximal efficiency won't care about parallelisation. I said that in the vein that SVT ("Scalable Video Technology") should "scale" well with parallelism. I wanted to mention AV1AN there, but for some reason didn't end up doing so. In my tests, Preset 4 is significantly slower than Preset 6 (on an 8-core Intel CPU).
  4. Yes, I kind of implied that --mbr is a soft upper bound. I at first, hadn't desired to include it at all, but did for the sake of completeness.
  5. I'm in fact experimenting to augment what was said earlier in this document, because just summarising a few bits of information wasn't my intended goal; the goal is to use real (movie) footage to demonstrate potential gains and losses. However, I don't have the means to buy Blu-rays for an accurate comparison, so I resort to getting the highest quality rip on the net and taking it as reference; it's not accurate, but it's still an adequate comparison.
  6. I did mention that you need to use --film-grain-denoise to enable denoising, but the wording was hard to understand. As I said, it's not a stable document, so expect changes. I did mention it deletes fine texture and details, but sort of implied it for denoise. In my experiments, film grain synthesis causes a significant performance drop and no perceptible gain. Maybe I'm doing it wrong, or the sources I'm using have destroyed the film grain already.
  7. Yes, the wording of Variance boost section was too technical to understand; not any Tom-Dick-Harry knows what superblocks are, or what block partitioning is. I thought of moving it to pointers as well, or perhaps explain it briefly, because it's too minute a detail to care about. I thought it would be quite significant, while testing it doesn't seem that way.
  8. I also regarded super resolution as too ambitious, which is why there's only a brief mention, not an entire section dedicated to it. And it's true that I wasn't a fan of tuning lookahead; it's just that the AV1AN docs said it's worth tuning.
  9. Someone pointed me to --luminance-qp-bias on Reddit, but I didn't find any authoritative information about it apart from some vague information.
  10. I wanted to cover temporal filtering (a feature inherited from Daala), but I wasn't sure what it actually does.

@cynthia2006
Author

There is a lot to refactor here, because it's meant to be a FFmpeg centric guide, not just what SVT-AV1 options should you use. It should include a discussion of 8-bit vs 10-bit. But I can't assess the quality of HDR myself, as I don't have the equipment. As a consequence, I want all of this information to be merged into the Trac page of FFmpeg; so it's not located in the obscure corner, that is this Github Gist.

@nekotrix

nekotrix commented Jun 23, 2025

Hello. I continue having some concerns I'd like to share constructively. I understand you want to do things your way, and it's great to have ambitions, but it's also important to realize your current level and plan things ahead accordingly. Given that FFmpeg documentation serves as a reference point for countless users worldwide, any changes there carry significant responsibility and impact. From our interactions, let's say I'd be worried if someone with allegedly barely any hands-on experience with the encoder and no clear connection to the community/industry either, shaped up a documentation that could influence industry practices at that scale. I'm not trying to discourage you, but rather suggesting that building a stronger foundation first might help you achieve the level of impact you're aiming for more effectively.

Anyway, reality check aside, I'm happy to help where I can. Here are my thoughts to the points I have a clear answer to provide:

  1. I'm aware that YouTube serves AV1 in a MP4 container, and I have yet to grasp why, since their VP9 streams are WebM. It's not like they seem to have metadata that would prevent them from doing so. It is what it is, but that doesn't mean it's the norm nor that it is desirable. YouTube's AV1 streams are the worst representation of the format on the WEB, with terrible quality relative to the other streams of the same platform and not even that good efficiency as a result...
  2. Many users of the active AV1 communities would instantly disagree with you 🙂 . Preset -1 does serve as a reference for development sure, but it also does provide efficiency gains and is perfectly usable. You shouldn't underestimate the user's willingness to sacrifice speed for efficiency: the developers of the SVT-AV1-PSY fork created a preset -2 and preset -3 which are even more insanely slow presets that provide negligible but still observable gains in efficiency, and people have used them for certain purposes. I invite you to look up my recent SVT-AV1 deep dive on the codec wiki blog for benchmark graphs and numbers. There is no reason to ignore preset -1 in a such a guide.
  3. I was implying it is confusing. You do say...

At the encoding site, the grain is isolated and the video is denoised (using a Weiner filter).

...as if that is always the case, and only mentions the denoising parameter at the end of the next paragraph. If your experiments show that FGS is not worth the performance drop, I'm afraid you're either using too fast of a preset for it (the devs recommend using preset 6 or slower), you didn't use a strength appropriate for the given source, the tested source(s) don't have much grain or problematic gradients (doubt) or you don't know what you should be looking at. I want to clarify it's not my intention to be rude, but with years of experience with SVT-AV1 and AV1 at large, I can't get along with someone who claims that what's arguably the most defining feature of AV1 is useless, to exaggerate things a bit. Especially when there exist a performance-free alternative in the name of photon noise tables, which are static grain layers that SVT-AV1 support and don't impact performance at all due to their static nature. There are many similarities yet many differences between film grain synthesis and photon noise tables, I invite you to research the subject further.

  1. Then, again, consider pointing people directly to the varboost documentation as the present pictures are already the best at illustrating the effects of the feature.
  2. Beware of years' old guides or recommendations. AV1 has matured by now, but the development of its tools and implementations was going too fast for any knowledge to stay relevant for more than a few months. The Av1an documentation especially is largely outdated or, worse, can be remnants of misconceptions that were only debunked much later. I invite you to join some active AV1 communities to at least stay updated on the latest practices.
  3. Look forward to Part 2 of my article for explanations, benchmarks and visual comparisons of the feature at various levels. It's a dumb QP offset proportionate to the average brightness of a given frame, the most basic way to implement a luma bias feature.
  4. Well, again, considering the ambitions you seem to have, that's problematic, though it can be said you don't need to know how it works exactly if you understand the effect it has on encodes. Again, I'll explain the feature (and the strength parameter) a bit more in my upcoming article.
  5. I'm unsure why you're implying 10-bit is an HDR-only thing. 10-bit is key to ensure smooth gradients even on 8-bit SDR sources. This is a well-documented topic and most encoding communities have internalized the fact that 8-bit should never be used as long as hardware decoding support is not an issue (that is to say never an issue with AV1 as 10-bit support is mandatory to respect the main profile). 10-bit is set to become the default bit-depth in AV2 and for most upcoming formats going forward due to the clear efficiency and perceptual improvements it brings.

I think there's tremendous value in collaborating with the existing community rather than trying to reinvent the wheel in isolation. The encoding communities can have their downsides, but it can't be ignored they do have accumulated a wealth of knowledge and experience that could really accelerate your learning and help you achieve your goals more effectively. We'd be happy to see an enthusiast like you join the AV1 communities and maybe contribute to the many tools and guides we've helped spearhead, like Av1an, the numerous CRF boosting scripts, automation scripts, filtering attempts, metrics advancements, psychovisual efforts and the codec wiki at large. Your interest for this field is clear, and I think channeling that energy into collaborative efforts could be both more impactful and more rewarding than going it alone.

All right, let's keep on moving forward!

@cynthia2006
Author

@nekotrix I wasn't meaning the official FFmpeg documentation, which is authoritative; it's the little wiki they have there. I don't have a formal experience in video-coding technologies so as to have a strong opinion on anything backed by academic rigor. There was even a time, that if I could in future, I may become a codec developer myself.

  1. Initially, I wasn't aware of the outdated state of documentation; plus I haven't found anything about preset -1 within SVT-AV1 docs.
  2. I agree, I don't have significant experience in encoding video with SVT-AV1, partly due to a lack of good hardware to do so. As I've stated earlier, this was originally meant to be a personal note, but I thought sharing it with others might help. And yes, I was aware of photon noise (mentioned in the AV1AN docs). I wasn't calling film grain useless (which it isn't).
  3. I rarely ever consulted AV1AN. Most of the information I derived from SVT-AV1 (and read Wikipedia before it).
  4. Yes, I've seen your article with objective metrics for various options.
  5. There might be a misunderstanding, because I wasn't implying anything. 10-bit is distinct from HDR (with its own colourspaces, transfer-characteristics, etc.), all of that I know. And yes, I'm aware that 10-bit recommended to use, even if the source is 8-bit for better internal representation within AV1 (wanted to mention that as well).

I think there's tremendous value in collaborating with the existing community rather than trying to reinvent the wheel in isolation.
Yes, that's what I've been thinking lately. As I'm not a codec expert myself, so I'd like to give-and-take effective recommendations.

And no, you've not sounded rude. I accept criticism, because I want to know where I'm wrong and what I should've done instead. However, the style of this guide is largely meant to be introductory in nature, not something very ambitious. Of course, anyone who wants to go beyond these basic but significant improvements should seek out a more advanced guide with supporting statistics.

@nekotrix

TBH I was thinking of that wiki too when I said what said, but still, I'm sorry for having assumed stuff. A lack of explanation or details can be confusing or misleading, I hope to have succeeded in expressing this.

@cynthia2006
Author

I've tried to be as clear and concise as I can from my (albeit inadequate) experience, avoiding spurious claims. I'm not planning for this document to be a self-contained source of everything needed to use the full potential of AV1, as that would defeat my goal of keeping it simple; and of course, I have other interests and hobbies than just video encoding. However, it's also true that I've been fiddling with encoders for as long as ten years as an occasional fun hobby. It's only recently that I've begun to take things seriously and do things as effectively as possible.

I've also realised that for a person like me, who's largely a consumer rather than a producer, the use of AV1 is limited. I don't have a VOD service running, nor do I publish movie rips online, and wherever I publish videos (on YouTube mostly) they'd re-encode it their own way. So, the codec that's most effective for me is H.264, which is also the only codec I have HW acceleration support for. The only use case I can think of is using AV1 to reduce the file sizes of video files where quality doesn't matter to me.

In conclusion, if I had access to raw materials (e.g. Blu-rays), a purpose and good hardware, I would naturally have sought to understand the most effective options, thus gaining practical experience. And nothing can substitute for experience. I may add or remove bits of information as I gain experience or learn things.

P.S. I had done a large scale experiment weeks ago (with ~42600 samples) to measure reductions of JPEG/PNG to JXL. The total size of all files were 137 GiB, which was reduced to just 22 GiB.

@nekotrix

Filesize reductions alone mean nothing if not associated with a measurement of sort, be it with metrics or visual comparisons.

@cynthia2006
Author

@nekotrix Maybe because MPEG-DASH uses MP4 segments, it becomes natural for YouTube to mux AV1 in MP4; more so because the container supports that codec format. MPEG-DASH is a formal ISO standard, whereas WebM DASH (based on the existing MPEG-DASH) is just a document, and from what it appears, it doesn't support AV1.

To tell a more peculiar case, I've even seen DASH manifests (not on YouTube) where — in one adaptation set — there's AV1, and in another there's AAC.

@nekotrix

nekotrix commented Oct 6, 2025

Absolutely. I was able to study this slightly during the summer, and that's the likely reason. However, it doesn't mean MP4 should be the preferred container for an end-user (which has no use for Dash) like was written in previous iterations of this gist.

BTW, WebM has supported AV1 for years, so I would be surprised WebM DASH wouldn't.
