VP9 Intra Prediction Modes in libvpx

This article explores how VP9 handles intra-frame prediction within the libvpx library, highlighting the key differences and improvements over its predecessor, VP8. We will examine the expanded block sizes, the standard ten intra-prediction modes applied uniformly across partitions, and how the libvpx encoder optimizes these modes for better compression efficiency and performance.

Expanded Block Sizes and Uniformity

In VP8, intra prediction was tied to fixed block sizes: whole-macroblock prediction (16x16 luma with 8x8 chroma) offered only four modes, while the full set of directional modes was available only for 4x4 luma sub-blocks. VP9 replaces the fixed 16x16 macroblock with 64x64 superblocks that are recursively partitioned into smaller blocks.

VP9 supports block sizes ranging from 4x4 up to 64x64 pixels, including rectangular shapes such as 64x32 or 8x4. The same ten intra-prediction modes are available at every block size. The prediction itself is carried out per transform block, and because the largest VP9 transform is 32x32, a 64x64 intra block is predicted as four 32x32 units. This consistency simplifies the decoder design and lets large, flat areas of an image be covered by large blocks without giving up directional prediction.
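
The recursive splitting of a 64x64 area into smaller blocks can be sketched as follows. This is a simplified illustration in Python, not libvpx's C code: it only considers square four-way splits, and `needs_split` is a hypothetical caller-supplied predicate standing in for the encoder's actual partition decision.

```python
def partition(x, y, size, needs_split, min_size=4):
    """Recursively split a square block into four quadrants.

    `needs_split` is a hypothetical predicate (x, y, size) -> bool that
    stands in for the encoder's real partition decision.
    Returns a list of (x, y, size) leaf blocks.
    """
    if size > min_size and needs_split(x, y, size):
        half = size // 2
        blocks = []
        for dy in (0, half):
            for dx in (0, half):
                blocks += partition(x + dx, y + dy, half, needs_split, min_size)
        return blocks
    return [(x, y, size)]


# Example: keep splitting only the block that touches the top-left corner.
leaves = partition(0, 0, 64, lambda x, y, size: x == 0 and y == 0)
```

In this example the top-left corner is refined all the way down to 4x4 while the rest of the area stays in 32x32, 16x16, and 8x8 blocks, which is the kind of mixed layout the partitioning scheme is designed to allow.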

The Ten Intra-Frame Prediction Modes

VP9 uses ten intra-prediction modes, each of which builds the block from previously decoded neighboring pixels: the row above (extended to the above-right where available), the column to the left, and the top-left corner pixel.

DC Prediction: Fills the block with the average value of the neighboring top and left pixels.
Vertical (V) Prediction: Copies the pixels from the row immediately above vertically down through the block.
Horizontal (H) Prediction: Copies the pixels from the column immediately to the left horizontally across the block.
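True Motion (TM) Prediction: Predicts each pixel as its left neighbor plus its above neighbor minus the top-left corner pixel, clamped to the valid pixel range, so that the block continues the gradients seen along both edges.
Six Angular Modes: These modes extrapolate the neighboring pixels along fixed diagonal directions. libvpx names them after the approximate prediction angle: D45, D63, D117, D135, D153, and D207 (roughly diagonal-down-left, vertical-left, vertical-right, diagonal-down-right, horizontal-down, and horizontal-up in H.264-style terminology). They allow the codec to represent diagonal edges and slanted patterns accurately.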
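
The four non-angular modes are simple enough to sketch directly. The Python below is written for clarity and is not libvpx's C implementation: it assumes 8-bit pixels, a square block, and fully available neighbors, and the function names and list-of-rows representation are illustrative choices.

```python
def clip_pixel(value):
    """Clamp to the 8-bit pixel range (assumes 8-bit video)."""
    return max(0, min(255, value))


def dc_pred(above, left):
    """Fill the block with the rounded mean of the above row and left column."""
    n = len(above)
    dc = (sum(above) + sum(left) + n) // (2 * n)  # +n rounds to nearest
    return [[dc] * n for _ in range(n)]


def v_pred(above, left):
    """Copy the above row into every row of the block."""
    return [list(above) for _ in range(len(above))]


def h_pred(above, left):
    """Copy each left-column pixel across its row."""
    n = len(left)
    return [[left[r]] * n for r in range(n)]


def tm_pred(above, left, top_left):
    """True Motion: left + above - top_left, clamped to the pixel range."""
    n = len(above)
    return [[clip_pixel(left[r] + above[c] - top_left) for c in range(n)]
            for r in range(n)]
```

For a 4x4 block with above = [10, 20, 30, 40], left = [10, 12, 14, 16], and a top-left pixel of 10, dc_pred fills the block with 19, and the last row produced by tm_pred is [16, 26, 36, 46]: the horizontal ramp from the above row shifted by the vertical change along the left column.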

Context-Based Mode Encoding

To reduce the overhead of signaling which of the ten modes a block uses, VP9 codes the mode with context-dependent probabilities. On key frames and intra-only frames, the probability set is selected by the intra-prediction modes already chosen for the blocks above and to the left.

Because neighboring blocks often share similar texture directions, the most likely modes receive high probabilities and therefore cost few bits when passed through the arithmetic coder. For intra blocks inside inter frames, the probabilities are instead selected by block size and adapted from frame to frame. In both cases, the bitrate spent on mode signaling stays well below what a fixed-length code would require.
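
The effect of the context on signaling cost can be illustrated with a toy model. The sketch below is not how libvpx stores or codes its probabilities (VP9 codes the mode as a sequence of binary decisions with 8-bit probabilities); it uses a flat distribution per context, and every probability value in it is invented for illustration. Mode labels follow libvpx-style names.

```python
import math

MODES = ("DC", "V", "H", "D45", "D135", "D117", "D153", "D207", "D63", "TM")


def uniform():
    """Fallback distribution: every mode equally likely."""
    return {m: 1.0 / len(MODES) for m in MODES}


def skewed(favourite, p=0.6):
    """Hypothetical distribution: `favourite` gets p, the rest share 1 - p."""
    rest = (1.0 - p) / (len(MODES) - 1)
    return {m: (p if m == favourite else rest) for m in MODES}


# Toy context table keyed by (above_mode, left_mode); the numbers are invented.
CONTEXT_TABLE = {
    ("V", "V"): skewed("V"),
    ("H", "H"): skewed("H"),
}


def mode_cost_bits(mode, above_mode, left_mode):
    """Ideal entropy-coding cost, in bits, of signaling `mode` in this context."""
    probs = CONTEXT_TABLE.get((above_mode, left_mode), uniform())
    return -math.log2(probs[mode])
```

With these made-up numbers, signaling V when both neighbors are V costs about 0.74 bits, compared with about 3.32 bits under the uniform fallback. The price is that an unexpected mode in that context (for example H) becomes more expensive, at roughly 4.49 bits, which is why the scheme only pays off when neighboring modes really are correlated.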

Libvpx Implementation and Optimization

Evaluating ten modes at every candidate block size makes an exhaustive intra search expensive, and the cost of each candidate grows with the block area. libvpx addresses this in two main ways:

SIMD Acceleration: libvpx includes hand-optimized kernels that use SIMD instruction sets such as SSE2, AVX2, and ARM NEON, selected at run time according to the CPU's capabilities. These speed up the predictors themselves as well as the residual computation that follows them.
Speed Heuristics: In real-time or fast encoding configurations, libvpx does not evaluate all ten modes for every block. Its speed features can restrict the candidate set at larger transform sizes, and if the source variance of a block is low, the encoder may skip the angular modes entirely, saving CPU cycles while maintaining competitive visual quality.
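
A variance-based pruning rule of this kind can be sketched as follows. This is an illustrative Python model, not libvpx's actual heuristic: the threshold value, the function names, and the choice of which modes survive pruning are all assumptions made for the example.

```python
SIMPLE_MODES = ("DC", "V", "H", "TM")
ANGULAR_MODES = ("D45", "D63", "D117", "D135", "D153", "D207")


def block_variance(block):
    """Population variance of the source pixels in a block (list of rows)."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def candidate_modes(block, low_variance_threshold=16.0):
    """Return the modes worth evaluating for this block.

    The threshold is a hypothetical tuning value: flat blocks skip the
    angular modes, textured blocks get the full search.
    """
    if block_variance(block) < low_variance_threshold:
        return SIMPLE_MODES
    return SIMPLE_MODES + ANGULAR_MODES


flat = [[128] * 8 for _ in range(8)]
ramp = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
```

Here candidate_modes(flat) returns only the four simple modes, while candidate_modes(ramp) returns all ten, so the expensive directional search is spent only where the source has enough texture for it to matter.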