libvpx vs libaom: Key API Design Differences
This article provides a direct comparison of the API design
differences between libvpx (the reference library for the
VP8 and VP9 video codecs) and libaom (the reference library
for the AV1 video codec). While both libraries originate from the same
development lineage and share similar fundamental structures,
libaom introduces modern API patterns, expanded
configuration parameters, and refined controls to handle the increased
complexity of the AV1 codec. Understanding these differences is
essential for developers migrating video processing pipelines from VP9
to AV1.
Ancestry and Structural Similarity
Because libaom was originally forked from the
libvpx codebase, the two libraries share the same structural
foundation. Both expose a unified codec interface built around a
context structure: vpx_codec_ctx_t in libvpx and aom_codec_ctx_t
in libaom.
In both libraries, the basic workflow follows these steps:
1. Fill in a configuration structure (vpx_codec_enc_cfg_t vs. aom_codec_enc_cfg_t), usually starting from the library defaults.
2. Initialize a codec context with that configuration (vpx_codec_enc_init vs. aom_codec_enc_init).
3. Send raw frames for encoding (vpx_codec_encode vs. aom_codec_encode).
4. Retrieve compressed packets via an iterator loop (vpx_codec_get_cx_data vs. aom_codec_get_cx_data).
Despite this shared foundation, the APIs diverge significantly regarding configuration depth, control enumeration, and modern feature integration.
1. Prefixing and Namespace Separation
The most immediate difference is the naming convention used for functions, data types, and macros. This separation allows both libraries to be linked into the same application without namespace collisions:
- libvpx: Uses the vpx_ prefix for functions and types (e.g., vpx_codec_encode, vpx_image_t) and VPX_ or VP8E_/VP9E_ for constants and control IDs.
- libaom: Uses the aom_ prefix for functions and types (e.g., aom_codec_encode, aom_image_t) and AOM_, AOME_, or AV1E_ for constants and control IDs.
2. Expanded Control Enums (Ctrl IDs)
Both libraries use a generic control function
(vpx_codec_control and aom_codec_control) to
set or get specific encoder settings. However, the complexity of AV1
requires a vastly expanded set of controls in libaom.
- libvpx: Focuses on VP8/VP9-specific features such as CQ level, sharpness, active maps, and VP9 spatial/temporal scalability layer configuration (VP9E_SET_SVC, VP9E_SET_SVC_PARAMETERS).
- libaom: Introduces highly granular controls for AV1-specific coding tools. These include settings for film grain synthesis (AV1E_SET_FILM_GRAIN_TEST_VECTOR), loop restoration filters, quantization matrices, partition sizes, and tile configuration (such as AV1E_SET_TILE_COLUMNS and AV1E_SET_TILE_ROWS).
3. Configuration Profiles and Key-Value API
In libvpx, settings are primarily configured by direct
manipulation of the vpx_codec_enc_cfg_t struct, followed by
vpx_codec_control calls for fine-tuning.
libaom retains this structural configuration but
adds a string-based key-value helper, aom_codec_set_option, which
takes an option name and a value as strings. This makes it easier to
pass command-line-style options to the library programmatically,
without mapping every setting to a hardcoded struct member or
control ID in the host application.
4. Multi-Threading and Tiling Controls
While both codecs support multi-threading, the APIs manage parallel processing differently due to the architectural differences between VP9 and AV1.
- libvpx: Threading is primarily managed via the g_threads configuration parameter, alongside controls for tile columns and rows (VP9E_SET_TILE_COLUMNS, VP9E_SET_TILE_ROWS) and row-based multi-threading (VP9E_SET_ROW_MT).
- libaom: Keeps g_threads and the equivalent tile and row-based multi-threading controls (AV1E_SET_TILE_COLUMNS, AV1E_SET_TILE_ROWS, AV1E_SET_ROW_MT), and adds controls for tile groups (AV1E_SET_NUM_TG, AV1E_SET_MTU_SIZE). Together these let developers match the tile layout and thread count to the core count of the target machine.
5. Metadata and High Dynamic Range (HDR) Handling
Handling of modern video metadata is much more integrated into the
libaom API surface compared to libvpx.
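- libvpx: Color signaling is relatively basic. The VP9 encoder exposes a single color space value (VP9E_SET_COLOR_SPACE) and a color range flag (VP9E_SET_COLOR_RANGE); HDR static metadata is generally carried by the container rather than through the encoder API.
- libaom: Color primaries, transfer characteristics, and matrix coefficients are signaled separately (AV1E_SET_COLOR_PRIMARIES, AV1E_SET_TRANSFER_CHARACTERISTICS, AV1E_SET_MATRIX_COEFFICIENTS). HDR metadata such as content light levels (MaxCLL/MaxFALL), mastering display color volume (MDCV), and ITU-T T.35 payloads (the carrier used for HDR10+) can be attached to individual frames with aom_img_add_metadata, so the metadata travels with the image inside the encoding loop.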
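To make the content light level case concrete, here is a small library-independent sketch that packs MaxCLL and MaxFALL into a byte payload. The function name pack_hdr_cll is hypothetical. The layout (two 16-bit fields, most significant byte first) is an assumption based on the metadata_hdr_cll syntax of the AV1 bitstream specification; whether libaom expects any extra trailing bytes in the payload is not covered here and should be checked against the libaom documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_CLL_PAYLOAD_SIZE 4

/* Hypothetical helper: writes max_cll and max_fall (both in cd/m^2) as two
 * big-endian 16-bit fields into `out` and returns the number of bytes
 * written. */
size_t pack_hdr_cll(uint16_t max_cll, uint16_t max_fall,
                    uint8_t out[HDR_CLL_PAYLOAD_SIZE]) {
  assert(out != NULL);

  out[0] = (uint8_t)(max_cll >> 8);    /* MaxCLL, high byte */
  out[1] = (uint8_t)(max_cll & 0xff);  /* MaxCLL, low byte */
  out[2] = (uint8_t)(max_fall >> 8);   /* MaxFALL, high byte */
  out[3] = (uint8_t)(max_fall & 0xff); /* MaxFALL, low byte */
  return HDR_CLL_PAYLOAD_SIZE;
}
```

With libaom, a buffer like this would typically be handed to aom_img_add_metadata together with the OBU_METADATA_TYPE_HDR_CLL type and an insert flag such as AOM_MIF_ANY_FRAME before the image is passed to aom_codec_encode.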