What is the auto-alt-ref Parameter in libvpx?

This article explains the function of the auto-alt-ref parameter in the libvpx video encoder, which is used for VP8 and VP9 video compression. You will learn how this setting enables “alternate reference frames” to improve video quality and compression efficiency, how it works under the hood, and the performance trade-offs associated with enabling it.

In libvpx, the auto-alt-ref parameter controls the creation of alternate reference (alt-ref) frames. When enabled, it allows the encoder to generate hidden frames that are never displayed to the viewer and serve purely as prediction references for other, visible frames. By predicting from these high-quality, non-displayed reference frames, libvpx can often reduce the bitrate needed to reach a given visual quality, or improve quality at the same bitrate.

To create an alt-ref frame, the encoder uses a process called temporal filtering. It analyzes a sequence of future video frames and blends them together to remove transient noise and grain. This step requires the lookahead buffer, which is enabled with the lag-in-frames parameter. The resulting “clean” frame is stored in the encoder’s reference buffer. Visible frames encoded after it can then reference this clean image to encode static backgrounds and slow-moving objects far more efficiently than they could by predicting only from previously displayed frames. (VP8 and VP9 do not use B-frames; the alt-ref frame is how they gain access to information from future frames.)
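The blending idea can be illustrated with a minimal sketch. It is not libvpx's actual filter, which is motion-compensated and weights each pixel adaptively; this version simply averages co-located pixels across frames, and the helper names (temporal_filter, noisy_frames, mean_abs_error) are hypothetical. Frames are modeled as flat lists of luma values.

```python
import random


def temporal_filter(frames):
    """Average co-located pixels across frames (no motion compensation).

    Simplified stand-in for an encoder's temporal filter: noise that is
    independent from frame to frame tends to cancel out in the average.
    """
    if not frames:
        raise ValueError("need at least one frame")
    length = len(frames[0])
    if any(len(f) != length for f in frames):
        raise ValueError("all frames must have the same size")
    count = len(frames)
    return [sum(f[i] for f in frames) / count for i in range(length)]


def noisy_frames(clean, count, noise, seed=0):
    """Return `count` copies of `clean`, each with independent uniform noise."""
    rng = random.Random(seed)
    return [[p + rng.uniform(-noise, noise) for p in clean] for _ in range(count)]


def mean_abs_error(a, b):
    """Mean absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    clean = [100.0] * 64                      # a static, flat background
    frames = noisy_frames(clean, count=8, noise=10.0)
    filtered = temporal_filter(frames)
    print("error of one noisy frame:", mean_abs_error(frames[0], clean))
    print("error of filtered frame: ", mean_abs_error(filtered, clean))
```

On static content the filtered frame sits much closer to the clean signal than any single noisy frame does, which is why it makes a good prediction reference. Plain averaging would blur moving objects, which is the reason a real encoder aligns blocks by motion before blending.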

The auto-alt-ref parameter typically accepts the following values:

* 0 (Disabled): The encoder will not generate alternate reference frames. This reduces encoding time and CPU usage but results in lower compression efficiency and larger file sizes.
* 1 (Enabled): The encoder automatically determines when to insert alt-ref frames. This is the standard setting for achieving optimal video quality and compression.
* 2 (VP9 only): Enables a more aggressive, multi-layered alt-ref frame structure, which can further improve compression efficiency at the cost of additional encoding complexity.
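As a sketch of how these values might be passed in practice, the helper below assembles an FFmpeg command line. It assumes an ffmpeg build with libvpx support, where the setting is exposed as -auto-alt-ref alongside -lag-in-frames. The function name vpx_encode_args, the file names, the bitrate, and the lookahead of 25 frames are illustrative assumptions, not recommendations from libvpx.

```python
def vpx_encode_args(src, dst, auto_alt_ref=1, lag_in_frames=25,
                    codec="libvpx-vp9", bitrate="1M"):
    """Build an ffmpeg argument list (hypothetical helper, nothing is executed).

    The checks mirror the rules described in the article:
    - alt-ref frames need a lookahead buffer (lag-in-frames > 0)
    - the multi-layer mode (value 2) applies to VP9 only
    """
    if auto_alt_ref < 0:
        raise ValueError("auto_alt_ref must be 0 or greater")
    if auto_alt_ref > 0 and lag_in_frames <= 0:
        raise ValueError("alt-ref frames require lag_in_frames > 0")
    if auto_alt_ref > 1 and codec != "libvpx-vp9":
        raise ValueError("multi-layer alt-ref is VP9 only")
    return [
        "ffmpeg", "-i", src,
        "-c:v", codec,
        "-b:v", bitrate,
        "-auto-alt-ref", str(auto_alt_ref),
        "-lag-in-frames", str(lag_in_frames),
        dst,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) to actually encode.
    print(" ".join(vpx_encode_args("input.mp4", "output.webm")))
```

Returning a list rather than a single string keeps the arguments safe to hand to subprocess.run without shell quoting.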

While enabling auto-alt-ref is highly recommended for standard 2-pass encoding, it does have some trade-offs. It increases CPU utilization and encoding time because of the temporal filtering and lookahead analysis it requires. Additionally, because it relies on analyzing future frames, the encoder must buffer those frames before it can emit output, which introduces latency. This makes it unsuitable for real-time streaming or low-latency video conferencing, where lag-in-frames must be set to zero and the encoder therefore has no future frames from which to build an alt-ref frame.
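The latency trade-off can be made concrete with a rough estimate. The sketch below assumes the buffering delay is simply the lookahead depth divided by the frame rate; it ignores processing time, so treat the result as a lower bound. The function names and the default lookahead of 25 frames are hypothetical.

```python
def lookahead_delay_seconds(lag_in_frames, fps):
    """Minimum buffering delay implied by the lookahead, in seconds.

    Assumption: the encoder must receive `lag_in_frames` frames before it
    can emit the first one, so the delay is at least lag_in_frames / fps.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if lag_in_frames < 0:
        raise ValueError("lag_in_frames must not be negative")
    return lag_in_frames / fps


def alt_ref_settings(realtime, lookahead=25):
    """Pick auto-alt-ref and lag-in-frames values for a use case.

    Real-time use needs zero lookahead, which rules out alt-ref frames;
    offline encoding can afford both.
    """
    if realtime:
        return {"auto-alt-ref": 0, "lag-in-frames": 0}
    return {"auto-alt-ref": 1, "lag-in-frames": lookahead}


if __name__ == "__main__":
    for realtime in (False, True):
        settings = alt_ref_settings(realtime)
        delay = lookahead_delay_seconds(settings["lag-in-frames"], fps=25.0)
        print(f"realtime={realtime}: {settings}, delay >= {delay:.2f} s")
```

With a 25-frame lookahead at 25 frames per second, the encoder holds back at least one second of video. That is harmless for file-based encoding but too much for a video call.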