VP9 Parallel Encoding Using Libvpx Tile Columns
This article explains how the libvpx codec library
uses tile columns to parallelize VP9 video encoding. By
dividing video frames into independent vertical sections,
libvpx lets several threads work on the same frame at once,
reducing encoding times without relying solely on frame-level
parallelization.
Understanding VP9 Tiles
In traditional video coding, encoding a frame is a highly sequential process because pixels in one part of the frame often rely on previously encoded pixels nearby for spatial prediction. VP9 solves this bottleneck by introducing “tiles.”
Tiles are independent, rectangular regions within a video frame. VP9 specifically allows frames to be segmented into vertical partitions called tile columns. Because spatial prediction and entropy-coding contexts do not cross the boundaries between these columns, each tile column can be encoded and decoded independently of the others.
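To make the geometry concrete, here is a minimal Python sketch of where tile column boundaries could fall for a given frame width. It assumes boundaries sit on 64-pixel superblock edges and are spread as evenly as integer arithmetic allows, which mirrors how libvpx is understood to place them. The function name tile_column_bounds is hypothetical and is not part of any libvpx API.

```python
SB_SIZE = 64  # VP9 superblock width in pixels


def tile_column_bounds(frame_width, log2_cols):
    """Return (start_px, end_px) for each of the 2**log2_cols tile columns.

    Illustrative only: assumes boundaries land on superblock edges and are
    distributed as evenly as integer arithmetic allows.
    """
    # Frame width measured in superblocks, rounded up.
    sb_cols = (frame_width + SB_SIZE - 1) // SB_SIZE
    bounds = []
    for idx in range(1 << log2_cols):
        start_sb = (idx * sb_cols) >> log2_cols
        end_sb = ((idx + 1) * sb_cols) >> log2_cols
        # Clamp to the frame width, since the last superblock may be partial.
        bounds.append((min(start_sb * SB_SIZE, frame_width),
                       min(end_sb * SB_SIZE, frame_width)))
    return bounds


for start, end in tile_column_bounds(1920, 2):
    print(f"tile column covers x in [{start}, {end})")
```

Under these assumptions, a 1920-pixel-wide frame split into four columns does not produce four equal 480-pixel strips, because 480 is not a multiple of 64; the columns alternate between 448 and 512 pixels wide.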
How Libvpx Implements Tile Columns
The reference encoder libvpx leverages these vertical
partitions to distribute the encoding workload across multiple CPU
cores.
- Grid Division: The encoder divides each frame into \(2^N\) tile columns, where \(N\) is configured using the --tile-columns parameter (e.g., --tile-columns=2 creates 4 tile columns).
- Thread Allocation: When multi-threading is enabled via the --threads parameter, libvpx assigns different threads to process these tile columns simultaneously. For example, if a frame is split into four tile columns and four threads are allocated, each thread processes one column in parallel.
- Entropy Coding Independence: Each tile column contains its own independent entropy coding engine. This allows the arithmetic coder to process the bitstream of each column concurrently without waiting for the state of neighboring columns.
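The relationship between \(N\), the frame width, and the thread count can be sketched in Python as follows. The sketch assumes a minimum tile width of 4 superblocks (256 pixels), so that a requested \(N\) larger than the frame can support is reduced. It also considers tile-column parallelism only, ignoring any other threading the encoder may use. The helper names and the input.y4m and output.webm file names are placeholders. The command is only built as a list, not executed.

```python
SB_SIZE = 64           # VP9 superblock width in pixels
MIN_TILE_WIDTH_SB = 4  # assumed minimum tile width: 4 superblocks (256 px)


def max_log2_tile_columns(frame_width):
    """Largest N such that 2**N tile columns still meet the minimum width."""
    sb_cols = (frame_width + SB_SIZE - 1) // SB_SIZE
    log2 = 0
    while (sb_cols >> (log2 + 1)) >= MIN_TILE_WIDTH_SB:
        log2 += 1
    return log2


def plan_encode(frame_width, requested_log2, threads):
    """Summarize a tiled encode; a sketch, not libvpx's actual scheduler."""
    log2 = min(requested_log2, max_log2_tile_columns(frame_width))
    tile_cols = 1 << log2
    return {
        "tile_columns": tile_cols,
        # With tile-only parallelism, extra threads have no column to work on.
        "busy_threads": min(threads, tile_cols),
        "command": ["vpxenc", "--codec=vp9",
                    f"--tile-columns={log2}", f"--threads={threads}",
                    "-o", "output.webm", "input.y4m"],
    }


plan = plan_encode(frame_width=1920, requested_log2=2, threads=4)
print(plan["tile_columns"], "tile columns,", plan["busy_threads"], "busy threads")
print(" ".join(plan["command"]))
```

Under these assumptions, a 640-pixel-wide frame supports at most two tile columns, so requesting --tile-columns=2 with four threads would still leave two threads without a column to process.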
The Role of Loop Filtering
While spatial prediction and entropy coding are confined within tile
boundaries during the core encoding phase, the loop filter (which smooths
block edges) still runs across tile boundaries afterward. So that this
stage does not become a serial bottleneck, libvpx supports multi-threaded
loop filtering, allowing the in-loop filtering stage to keep pace with the
tiled encoding phase.
Trade-offs of Tile-Based Parallelism
While tile columns can substantially improve encoding speed, they come with a minor trade-off in compression efficiency. Because spatial prediction and entropy-coding contexts stop at tile boundaries, the encoder cannot use neighboring blocks in an adjacent tile to predict blocks in the current tile. This restriction can result in a slight increase in bitrate (usually between 1% and 5%) to maintain the same visual quality compared to a non-tiled encode. On modern multi-core processors, the reduction in encoding time usually outweighs this minor efficiency loss.
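As a quick worked example of the bitrate figure above, the following Python sketch converts the 1% to 5% range into absolute numbers. The 4000 kbps starting point is a made-up input chosen for illustration, not a measurement, and the function name is hypothetical.

```python
def tiled_bitrate_range_kbps(untiled_kbps, low_pct=1, high_pct=5):
    """Bitrate range needed to match an untiled encode's quality.

    The default percentages are the range quoted in the text above.
    """
    return (untiled_kbps * (100 + low_pct) / 100,
            untiled_kbps * (100 + high_pct) / 100)


low, high = tiled_bitrate_range_kbps(4000)  # hypothetical 4000 kbps encode
print(f"tiled encode: roughly {low:.0f} to {high:.0f} kbps")
```

For that hypothetical encode, the overhead amounts to between 40 and 200 kbps.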