
@cynthia2006
Last active August 27, 2025 15:30

AV1 (AOMedia Video 1) is a next-generation video codec designed for VOD (video on demand), storage and live-streaming, as you might already know. It is usually stored in the WebM container, accompanied by Opus as the audio codec. Both are royalty-free codecs; i.e. you need not pay to use them, unlike H.264 or H.265. The latter is encumbered by a complex web of patents, which was the reason AV1 was created. There are several AV1 encoders to choose from, such as aomenc, rav1e and SVT-AV1. Although all provide more or less the same level of functionality, SVT-AV1 is notable for its speed and scalability.

This is largely a primer meant to familiarise you with some rudimentary aspects of AV1 encoding; it avoids some of the more advanced topics, such as film grain synthesis, variance boost, hierarchical levels and temporal filtering. There is a dedicated community around the codec that documents best practices for high-quality encodes.

Presets

Presets are sets of preconfigured encoder options that decide the speed-vs-quality tradeoff. The usual range is 0-10, which makes 11 presets (13 if the negative ones are counted); higher presets emphasise speed, and lower presets emphasise quality. The point is to strike a balance between the two; preset 6 is usually a good starting point, providing good compression ratios at moderate speeds.

Although there are negative presets (-1 and -2) aside from the usual 0-10 range, they may or may not have any effect on the resultant video. Besides that, encoding speed drops sharply with each step below preset 6. Presets above 6 do not play well with encoder parallelism (scalability), and you'll likely receive a warning from the encoder if you tune the level of parallelism.

Most of the time a preset in the range 4-6 gets the job done; go lower if you want higher compression at the cost of (painfully) slow encoding speeds, and go higher if you want higher encoding speeds at the cost of lower compression efficiency. That said, the exact amount of gain (or loss) depends largely on the video content itself.

Rate Control

Rate control is the mechanism by which an encoder controls the quantizer to meet a certain bandwidth or quality criterion. SVT-AV1 provides three main modes of rate control: CQP (Constant Quantization Parameter), CRF (Constant Rate Factor) and VBR (Variable Bitrate). There is also a fourth mode, CBR (Constant Bitrate), but it is mainly relevant to low-latency, real-time streaming and is rarely used for the file-based encodes covered here.

CRF

Amongst the three modes of rate control, CRF is the most used, as it targets a certain quality level (by tuning the quantizer) rather than a specific bitrate. Some scenes (e.g. scenes containing fine texture or action scenes) require a high bitrate at a certain quality level and resolution, while others do not; using CRF as rate control distributes bitrate better across scenes.

Quantizer: The quantizer is at the heart of lossy video compression; its primary job is to discard high-frequency spatial information that goes unnoticed by the human eye. It is used in many traditional block-transform based codecs, including AV1.

Similar to x264 (an H.264 encoder), CRF in SVT-AV1 is a logarithmic scale[1] ranging from 1 to 63. A lower CRF means higher quality and thus a higher bitrate. Conversely, a higher CRF means lower quality and thus a lower bitrate. It is important to note that lowering the CRF does not necessarily yield visibly better quality. As such, the recommended CRF value for HD video (1080p) is in the range 30-35; it may be raised or lowered as needed.
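Since the suggested ranges (presets 4-6, CRF 30-35) are only starting points, one practical approach is to encode a short sample at several combinations and compare file size and visual quality. The following is a minimal sketch under that assumption; the function name sample_cmds, the 20-second sample length and the output file names are hypothetical choices made for illustration. It only prints the commands, so nothing is encoded until the output is piped to sh.

```shell
#!/bin/sh
# Dry-run helper: prints one ffmpeg command per (preset, CRF) pair instead of
# running it, so the list can be reviewed first and then piped to `sh`.
# The function name, the 20-second sample length and the output file naming
# are illustrative assumptions, not something the encoder prescribes.
sample_cmds() {
    input=$1      # source video
    presets=$2    # space-separated list, e.g. "4 5 6"
    crfs=$3       # space-separated list, e.g. "30 35"
    for preset in $presets; do
        for crf in $crfs; do
            # -t 20 limits the encode to the first 20 seconds; -an drops the
            # audio so that file sizes reflect video only.
            printf 'ffmpeg -i "%s" -t 20 -an -c:v libsvtav1 -preset %s -crf %s "sample_p%s_crf%s.webm"\n' \
                "$input" "$preset" "$crf" "$preset" "$crf"
        done
    done
}
```

For example, sample_cmds input.mkv "4 5 6" "30 35" prints six commands; appending | sh runs them, after which the resulting sample_*.webm files can be compared by size and by eye.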

GOP Length

A GOP (Group of Pictures) is the set of frames (predicted or bidirectional) between two consecutive intraframes. GOP length is the number of frames in a GOP; i.e. the interval between two consecutive intraframes. A larger GOP length results in better compression efficiency, but at the cost of fast and accurate seeking: to display a frame in the middle of a GOP, a player has to decode from the preceding intraframe up to that frame, which causes undesired delays, and the only alternative is to seek to the nearest intraframe.

The usual choice for GOP length is 5-10 seconds; however, that varies by use case. For VOD it is usually chosen to be 1 second, for better (network) error resilience and seekability.
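GOP length is configured in frames, while the guidance above is given in seconds, so the conversion is simply frame rate times duration. A small sketch of that arithmetic follows; the helper name gop_frames is hypothetical, and rounding to the nearest whole frame is an assumption made here to handle fractional frame rates.

```shell
#!/bin/sh
# gop_frames FPS SECONDS -> number of frames spanning SECONDS at FPS,
# rounded to the nearest whole frame. awk is used because POSIX shell
# arithmetic is integer-only and frame rates such as 23.976 are common.
gop_frames() {
    awk -v fps="$1" -v secs="$2" 'BEGIN { printf "%d\n", fps * secs + 0.5 }'
}

gop_frames 30 10      # 300 frames for a 10-second GOP at 30 fps
gop_frames 23.976 5   # 120 frames (119.88 rounded)
gop_frames 60 1       # 60 frames, the 1-second case
```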

Other Settings

Tune

By default SVT-AV1 tunes for PSNR (an objective metric), which does not necessarily correlate with visual quality.

10-bit Video

10-bit video is a baseline AV1 feature that helps preserve detail better, even if the input is 8-bit. Although it is not enabled by default, enabling it is highly recommended.
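To find out whether a source is already 10-bit, its pixel format can be inspected with ffprobe. The sketch below is a heuristic, not an official check: the helper name is_10bit is hypothetical, and it assumes that FFmpeg's 10-bit pixel format names end in 10le or 10be (as yuv420p10le does).

```shell
#!/bin/sh
# is_10bit PIX_FMT -> exit status 0 if the pixel format name looks 10-bit.
# Heuristic assumption: FFmpeg names 10-bit formats with a "10le"/"10be"
# suffix (yuv420p10le, yuv422p10be, ...).
is_10bit() {
    case $1 in
        *10le|*10be) return 0 ;;
        *)           return 1 ;;
    esac
}

# Typical use (requires ffprobe; "input" is a placeholder file name):
#   fmt=$(ffprobe -v error -select_streams v:0 \
#           -show_entries stream=pix_fmt -of csv=p=0 input)
#   is_10bit "$fmt" && echo "already 10-bit" || echo "8-bit, will be converted"
```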

Multithreading

SVT-AV1's level of parallelism (not the number of threads) ranges from 0 to 6, and is 4 by default. Note that using too many threads may actually slow the encode down. The appropriate value depends on the hardware you use.

Example using FFmpeg

Most distributions of FFmpeg come with SVT-AV1 support compiled in. The following encodes a 1080p@30fps video (input) using the recommended settings. Note that it also encodes an existing audio stream into Opus.

$ ffmpeg -i input -vcodec libsvtav1 \
         -preset 4 \
         -crf 35 \
         -g 301 \
         -pix_fmt yuv420p10le \
         -svtav1-params tune=0 \
         output.webm

Breakdown

  • -g 301 means to use a GOP length of 300 frames; i.e. 300/30 (fps) = 10 seconds.

  • -pix_fmt yuv420p10le tells FFmpeg to convert video frames from 8-bit to 10-bit (does nothing if they're 10-bit already).

  • -svtav1-params is used to directly pass parameters to the underlying SVT-AV1 encoder; each parameter is separated by a : delimiter.

    • tune=0 tells SVT-AV1 to tune for VQ (visual quality) not PSNR.

    The rest of the options should be pretty self-explanatory.
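Before running the command above, it may be worth confirming that the local FFmpeg build actually includes the libsvtav1 encoder. The following is a minimal sketch; the helper name has_encoder is hypothetical, and it assumes the usual ffmpeg -encoders listing in which the encoder name is the second whitespace-separated column.

```shell
#!/bin/sh
# has_encoder NAME -> exit status 0 if NAME appears in the encoder-name
# column of an `ffmpeg -encoders` listing supplied on standard input.
has_encoder() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Typical use (requires ffmpeg on PATH):
#   ffmpeg -hide_banner -encoders | has_encoder libsvtav1 \
#       && echo "libsvtav1 available" || echo "libsvtav1 missing"
```

Reading the listing from standard input keeps the check independent of how FFmpeg is invoked, so the same helper works with a saved listing or a differently named binary.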

Footnotes

  1. A change at high CRF values is less significant than the same change at low CRF values.
