Row-Based Multi-Threading in Libvpx Explained
This article explains the significance of the row-based
multi-threading (Row-MT) feature in libvpx, the reference software
codec library for the VP8 and VP9 video formats. We will explore how
Row-MT works in the VP9 encoder, why it improves on tile-based
threading, and how it affects encoding speed, compression efficiency,
and real-time video communication.
What is Row-Based Multi-Threading?
In video encoding, processing speed depends heavily on how well an
encoder can distribute its workload across multiple CPU cores.
Traditionally, the libvpx VP9 encoder relied on “tile-based”
multi-threading. This method splits a video frame into vertical columns
(tiles) and encodes each tile on a separate thread. However, tile-based
threading has two limits: it reduces compression efficiency because
prediction and entropy-coding context do not carry across tile
boundaries, and the number of useful threads is capped by the number of
tiles.
Row-based multi-threading (Row-MT) addresses these limitations by allowing the encoder to process rows of coding blocks (superblock rows, in VP9 terms) within a frame or tile concurrently, subject to the dependencies between neighboring rows.
The Significance of Row-MT
The introduction of Row-MT in libvpx (particularly for
VP9) brought several critical advancements to video encoding:
1. Superior CPU Scaling and Speed
With Row-MT, the encoder is no longer constrained by the number of tile columns. A frame contains many more block rows than tile columns, so Row-MT exposes more units of work to schedule across CPU cores. For example, a 1920x1080 frame is 17 superblock rows tall (1080 / 64, rounded up), whereas VP9 permits at most 4 tile columns at that width because each tile must be at least 256 pixels wide. Distributing these rows across the available threads can substantially reduce encoding time on modern multi-core processors, although dependencies between neighboring rows keep the usable parallelism below the raw row count.
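The row and tile counts can be worked out with a few lines of arithmetic. The sketch below assumes VP9's 64x64 superblocks and a minimum tile width of 4 superblocks (256 pixels), with tile columns restricted to powers of two. The function names are made up for illustration and are not part of the libvpx API.

```python
import math

SB_SIZE = 64            # VP9 superblock edge, in pixels
MIN_TILE_WIDTH_SB = 4   # assumed minimum tile width: 4 superblocks (256 pixels)


def superblock_rows(height: int) -> int:
    """Number of 64-pixel superblock rows needed to cover the frame height."""
    return math.ceil(height / SB_SIZE)


def max_tile_columns(width: int) -> int:
    """Largest power-of-two tile-column count whose tiles stay wide enough."""
    sb_cols = math.ceil(width / SB_SIZE)
    log2_cols = 0
    # Double the column count while each tile would still span at least
    # MIN_TILE_WIDTH_SB superblocks.
    while (sb_cols >> (log2_cols + 1)) >= MIN_TILE_WIDTH_SB:
        log2_cols += 1
    return 1 << log2_cols


for name, (w, h) in {"720p": (1280, 720),
                     "1080p": (1920, 1080),
                     "2160p": (3840, 2160)}.items():
    print(name, "superblock rows:", superblock_rows(h),
          "max tile columns:", max_tile_columns(w))
```

These are upper bounds on schedulable units, not on speedup: how many rows can actually run at once depends on the inter-row dependencies.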
2. Preservation of Visual Quality and Compression Efficiency
With tile-based threading, prediction and entropy-coding context cannot cross tile boundaries, which costs compression efficiency as the number of tiles grows and can, in some cases, make tile boundaries visible. Row-MT does not add such boundaries. Before a thread encodes a block in its row, it waits until the blocks it depends on in the row above, including the block above and to the right, have been processed. This synchronization preserves the encoder’s ability to reference neighboring blocks, so compression efficiency and visual quality stay close to those of a single-threaded encode.
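The waiting rule described above can be illustrated with a toy wavefront scheduler. This is a minimal sketch, not libvpx's implementation: it runs one Python thread per block row, does no real encoding, and assumes each block depends only on the row above up to one block to its right (the hypothetical `lookahead` parameter).

```python
import threading


def encode_rows_wavefront(rows: int, cols: int, lookahead: int = 1):
    """Run one worker per block row and return blocks in completion order."""
    done = [0] * rows              # done[r] = blocks finished so far in row r
    cond = threading.Condition()   # guards `done` and `order`
    order = []                     # (row, col) tuples in completion order

    def worker(r: int) -> None:
        for c in range(cols):
            if r > 0:
                # The row above must be finished through column c + lookahead
                # (clamped at the right edge of the frame).
                needed = min(c + lookahead + 1, cols)
                with cond:
                    cond.wait_for(lambda: done[r - 1] >= needed)
            # A real encoder would do the expensive block encode here,
            # outside the lock.
            with cond:
                done[r] = c + 1
                order.append((r, c))
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(rows)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


order = encode_rows_wavefront(rows=4, cols=6)
position = {block: i for i, block in enumerate(order)}
# Every block finished after its above-right neighbor in the previous row.
ok = all(position[(r - 1, min(c + 1, 5))] < position[(r, c)]
         for r in range(1, 4) for c in range(6))
print("dependencies respected:", ok)
```

The exact interleaving varies from run to run, but each row always trails the row above it by at least the lookahead distance, which is what lets a block still reference its upper neighbors.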
3. Lower Latency for Real-Time Streaming
For live streaming and video conferencing (such as WebRTC applications), low latency is critical. Frame-based multi-threading (encoding multiple frames at the same time) adds latency because the encoder must buffer several frames before it can process them. Row-MT instead parallelizes the encoding of a single frame, so it enables multi-threaded encoding without adding frame delay, which makes it well suited to real-time communication.
4. Efficient Resource Utilization on Consumer Devices
Modern consumer hardware, from smartphones to laptops, relies on
multi-core architectures. Row-MT helps libvpx keep more of those cores
busy during VP9 encoding, even on high-core-count consumer CPUs. This
results in faster software encoding and smoother performance when
hardware-accelerated VP9 encoding is unavailable.
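As a practical illustration, the sketch below builds a command line for vpxenc, the encoder tool shipped with libvpx, with Row-MT switched on. The file names are placeholders, the helper function is hypothetical, and the flag spellings should be checked against `vpxenc --help` for your build, since options can differ between versions.

```python
import os


def vpxenc_row_mt_args(src: str, dst: str, threads=None, tile_columns_log2: int = 2):
    """Build an argument list for a VP9 encode with Row-MT enabled."""
    if threads is None:
        threads = os.cpu_count() or 1   # fall back to 1 if the count is unknown
    return [
        "vpxenc",
        "--codec=vp9",
        f"--threads={threads}",
        f"--tile-columns={tile_columns_log2}",  # log2 of the tile-column count
        "--row-mt=1",                           # enable row-based multi-threading
        "-o", dst,
        src,
    ]


# Placeholder file names; pass the list to subprocess.run(..., check=True) to encode.
print(" ".join(vpxenc_row_mt_args("input.y4m", "output.webm", threads=8)))
```

Tile columns and Row-MT are complementary: rows are distributed within each tile, so the two settings can be combined when a CPU has more cores than the frame has tile columns.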