How libvpx Compresses Motion Vectors
Video compression relies heavily on temporal redundancy, and motion vectors describe how each block of pixels is displaced relative to a reference frame. Sending every vector as raw coordinates, however, would consume a significant share of the bitrate. In the libvpx library, which houses the VP8 and VP9 video codecs, motion vector cost is reduced through predictive coding, context-adaptive entropy coding, and careful control of sub-pixel precision. This article explains the key mechanisms libvpx uses to compress motion vectors.
Motion Vector Prediction (Delta Coding)
Because objects in a video frame usually move together, neighboring blocks of pixels tend to have highly correlated motion. libvpx exploits this spatial correlation through motion vector prediction.
Instead of encoding the absolute coordinates of a motion vector, the encoder predicts the current block's vector from the vectors of already-coded neighboring blocks (in VP8, the blocks above, to the left, and above-left) and codes only the difference, known as the “motion vector residual” or “delta.” Because neighboring motion is strongly correlated, the residual is usually zero or small, and small values cost far fewer bits than full coordinates.
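The prediction step can be sketched as a small weighted vote among the neighbors. This is a simplified illustration loosely modeled on VP8's approach, not libvpx code: the weights (2 for above and left, 1 for above-left), the rule that zero vectors vote for a zero predictor, and the names `predict_mv` and `mv_delta` are assumptions made for the example. Vectors are (row, col) tuples in arbitrary sub-pixel units.

```python
from collections import Counter

# Hypothetical vote weights, loosely following VP8: the above and left
# neighbors count twice, the above-left neighbor once.
WEIGHTS = {"above": 2, "left": 2, "above_left": 1}
ZERO = (0, 0)


def predict_mv(neighbors):
    """Return (best, nearest, near) predictors from neighbor MVs.

    `neighbors` maps a position name to a (row, col) tuple, or to None when
    that neighbor is intra-coded or unavailable; such a neighbor casts no vote.
    """
    zero_votes = 0
    votes = Counter()
    for pos in ("above", "left", "above_left"):
        mv = neighbors.get(pos)
        if mv is None:
            continue
        if mv == ZERO:
            zero_votes += WEIGHTS[pos]  # zero vectors vote for a zero predictor
        else:
            votes[mv] += WEIGHTS[pos]
    # Counter keeps insertion order and sorted() is stable, so on a tie
    # the vector seen first ranks higher.
    ranked = sorted(votes, key=lambda mv: -votes[mv])
    nearest = ranked[0] if ranked else ZERO
    near = ranked[1] if len(ranked) > 1 else ZERO
    # Fall back to a zero predictor when zero vectors outvote the nearest one.
    best = nearest if ranked and votes[nearest] >= zero_votes else ZERO
    return best, nearest, near


def mv_delta(mv, pred):
    """Residual actually transmitted: current vector minus its prediction."""
    return (mv[0] - pred[0], mv[1] - pred[1])


best, nearest, near = predict_mv(
    {"above": (4, -2), "left": (4, -2), "above_left": (6, 0)}
)
print(best, near, mv_delta((5, -2), best))  # prints (4, -2) (6, 0) (1, 0)
```

With the above and left neighbors both moving by (4, -2), a block moving by (5, -2) needs only a residual of (1, 0).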
Advanced Reference Selection and VP9 Candidates
In VP9 (the more advanced codec in libvpx), motion vector prediction is further refined. The encoder and the decoder build the same list of “candidate” motion vectors from spatial neighbors and from co-located blocks in the previous frame, so the list itself never has to be transmitted.
The first two distinct candidates in this list become the predictors used by the NEARESTMV and NEARMV inter modes (VP8 has the same two modes, derived from its neighbor vote):
* Nearest: the first, most likely candidate.
* Near: the second most likely candidate.
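If the current block’s motion vector matches one of these candidates exactly, the encoder signals only the inter mode (NEARESTMV or NEARMV), a short entropy-coded symbol, and transmits no coordinate residual at all. A ZEROMV mode covers stationary blocks the same way, while NEWMV is the only mode that carries an explicit residual.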
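A minimal sketch of the mode decision, assuming the nearest and near candidates are already known. A real encoder chooses among modes by rate-distortion cost rather than by exact matching alone, and measuring the NEWMV residual against the nearest candidate is a simplifying assumption; `choose_mode` is a hypothetical helper.

```python
ZERO = (0, 0)


def choose_mode(mv, nearest, near):
    """Return (mode, residual); residual is None when no vector data is sent.

    Simplified: checks for exact matches in a fixed order instead of
    comparing rate-distortion costs as a real encoder would.
    """
    if mv == nearest:
        return "NEARESTMV", None
    if mv == near:
        return "NEARMV", None
    if mv == ZERO:
        return "ZEROMV", None
    # Assumption: the residual is measured against the nearest candidate.
    residual = (mv[0] - nearest[0], mv[1] - nearest[1])
    return "NEWMV", residual


for mv in [(8, -4), (0, 16), (0, 0), (10, -4)]:
    print(mv, choose_mode(mv, nearest=(8, -4), near=(0, 16)))
```

Only the last vector, which matches no candidate, pays for a residual, and even that residual, (2, 0), is much smaller than the vector itself.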
Entropy Coding of Residuals
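Residuals are then compressed with libvpx's boolean entropy coder, a binary arithmetic coder that codes each yes/no decision against a probability expressed in 1/256 units. A decision that takes its expected value costs well under one bit, while an unlikely one costs several bits, so frequently occurring values end up with short encodings. To use this coder, each residual is first broken into a short sequence of binary decisions, such as whether each component is nonzero, its sign, and its magnitude.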
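In VP9, a residual is split into a joint type saying which components are nonzero, then, for each nonzero component, a sign, a magnitude class, integer bits, a quarter-pel fraction, and an eighth-pel bit. The sketch below shows that split, assuming VP9's layout: components in 1/8-pel units, 11 magnitude classes, and a class-0 range of 16 eighth-pel steps. The constants and the helper names (`mv_joint`, `split_component`, `join_component`) are assumptions for illustration and should be checked against the VP9 specification.

```python
MV_CLASS_10 = 10  # largest magnitude class (classes run 0..10)
CLASS0_SIZE = 2   # 1 << CLASS0_BITS, with CLASS0_BITS = 1


def mv_joint(row, col):
    """0: both zero, 1: only col nonzero, 2: only row nonzero, 3: both nonzero."""
    if row == 0:
        return 0 if col == 0 else 1
    return 2 if col == 0 else 3


def class_base(c):
    """Smallest (magnitude - 1) that falls into class c."""
    return CLASS0_SIZE << (c + 2) if c else 0


def split_component(comp):
    """Split a nonzero residual component (1/8-pel units) into coded fields."""
    assert comp != 0, "zero components are signalled by the joint, not coded"
    sign = 1 if comp < 0 else 0
    z = abs(comp) - 1
    if z >= CLASS0_SIZE * 4096:
        c = MV_CLASS_10
    else:
        # floor(log2(z >> 3)), or class 0 when z >> 3 is 0 or 1
        c = max(0, (z >> 3).bit_length() - 1)
    offset = z - class_base(c)
    d = offset >> 3         # integer-pel bits (1 bit in class 0, c bits otherwise)
    fr = (offset >> 1) & 3  # quarter-pel fraction
    hp = offset & 1         # eighth-pel (high-precision) bit
    return sign, c, d, fr, hp


def join_component(sign, c, d, fr, hp):
    """Inverse of split_component, as a decoder would rebuild the value."""
    mag = class_base(c) + ((d << 3) | (fr << 1) | hp) + 1
    return -mag if sign else mag


print(mv_joint(0, -37), split_component(-37))  # prints 1 (1, 2, 0, 2, 0)
```

Small residuals land in class 0 or 1 and need only a few decisions, which is where most of the savings come from.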
Because residuals cluster around zero, each of these decisions has its own probability, tuned to that skewed distribution. The probabilities are not fixed: a frame header can carry explicit updates, and VP9 can additionally adapt them after each frame from counts of the symbols actually coded, so later frames start from a model that reflects recent motion statistics.
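The adaptation idea can be sketched as blending each stored probability toward the frequency observed in the frame just coded, with a blending weight that grows with the number of observations and then saturates. The saturation count (20) and maximum weight (128 of 256) are assumed to mirror VP9's backward adaptation, and `get_prob`, `adapt_prob`, and `bit_cost` are hypothetical names. `bit_cost` gives the ideal cost of one binary decision, which shows why a better-matched probability saves bits.

```python
import math

COUNT_SAT = 20           # assumed: observations beyond this add no extra weight
MAX_UPDATE_FACTOR = 128  # assumed: at most half of the old model is replaced


def get_prob(n0, total):
    """Observed probability of a 0 bit in 1/256 units, clipped to [1, 255]."""
    p = (n0 * 256 + total // 2) // total
    return min(255, max(1, p))


def adapt_prob(pre_prob, n0, n1):
    """Blend the previous probability toward this frame's observed counts."""
    total = n0 + n1
    if total == 0:
        return pre_prob  # symbol never coded: keep the old model
    factor = MAX_UPDATE_FACTOR * min(total, COUNT_SAT) // COUNT_SAT
    observed = get_prob(n0, total)
    # Weighted average in 1/256 units, rounded to nearest.
    return (pre_prob * (256 - factor) + observed * factor + 128) >> 8


def bit_cost(bit, prob):
    """Ideal cost in bits of coding `bit` when P(bit == 0) = prob / 256."""
    p0 = prob / 256
    return -math.log2(p0 if bit == 0 else 1 - p0)


p = adapt_prob(128, 20, 0)
print(p, round(bit_cost(0, 128), 3), round(bit_cost(0, p), 3))  # prints 192 1.0 0.415
```

Starting from an even 128/256 and observing twenty zeros moves the probability of a zero to 192/256, which cuts the ideal cost of the next zero from 1 bit to about 0.415 bits.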
Fractional Pixel Precision and Sub-Pel Filters
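libvpx supports sub-pixel motion vectors: VP8 vectors have quarter-pixel precision, and VP9 can use one-eighth-pixel precision when a frame enables high-precision motion vectors. Finer precision can improve prediction quality, but it also widens the range of values a residual can take, which can inflate the bitrate.
VP9 limits this cost in two ways. The fractional part of each component is coded separately from the integer part, with its own probabilities, and the extra eighth-pixel bit can be switched off for a whole frame. Even when it is enabled, that bit is coded only when the reference vector is small; otherwise vectors stay at quarter-pixel precision and the bit is omitted. The interpolation filter (regular eight-tap, smooth, sharp, or bilinear in VP9), signaled at the frame level or per block, does not shrink the vector itself. It determines how pixels at fractional positions are reconstructed, so a well-chosen filter lets a given vector produce a better prediction.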
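The precision gate can be sketched as below, assuming 1/8-pel units and a threshold of eight full pixels per component for the reference vector. The threshold, the round-toward-zero rule, and the names `use_mv_hp` and `lower_mv_precision` are modeled on VP9's behavior but are assumptions for this example.

```python
COMPANDED_MVREF_THRESH = 8  # assumed threshold, in full pixels


def use_mv_hp(ref_row, ref_col):
    """Code the eighth-pel bit only when both reference components are small.

    Inputs are in 1/8-pel units, so `>> 3` converts them to full pixels.
    """
    return ((abs(ref_row) >> 3) < COMPANDED_MVREF_THRESH
            and (abs(ref_col) >> 3) < COMPANDED_MVREF_THRESH)


def lower_mv_precision(row, col, allow_hp):
    """Round odd (eighth-pel) components toward zero when hp is not in use."""
    if allow_hp and use_mv_hp(row, col):
        return row, col
    if row & 1:
        row += -1 if row > 0 else 1
    if col & 1:
        col += -1 if col > 0 else 1
    return row, col


print(lower_mv_precision(65, -3, True), lower_mv_precision(5, -3, True))  # prints (64, -2) (5, -3)
```

A vector of (65, -3) eighth-pels exceeds the threshold in its row component, so both components are rounded to even (quarter-pel) values and no eighth-pel bit is spent; the small vector (5, -3) keeps full precision.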