How libvpx Performs Entropy Coding with Boolean Coder
This article explores how the libvpx codec library—the
reference implementation for the VP8 and VP9 video formats—utilizes a
Boolean arithmetic coder for entropy coding. We will examine the
mechanics of this binary coder, including how it represents
probabilities, divides the coding interval, performs renormalization,
and processes non-binary syntax elements through binarization.
The Role of the Boolean Coder in libvpx
Entropy coding is the final stage in the video compression pipeline,
responsible for losslessly compressing syntax elements such as motion
vectors, transform coefficients, and macroblock modes. Instead of using
traditional Huffman coding, libvpx uses a customized binary
arithmetic coder known as the Boolean coder.
This coder operates exclusively on binary decisions (bits, or “bins”) rather than multi-symbol alphabets. By restricting the input to boolean values (0 or 1), the coder significantly simplifies the mathematical calculations required for arithmetic coding, replacing complex divisions with fast bitwise shifts and additions.
Probability Representation and Contexts
The Boolean coder in libvpx relies on state-based
probability models to predict the likelihood of a 0 or a 1.
- 8-bit Probabilities: Probabilities are represented as 8-bit unsigned integers ranging from 1 to 255. The value represents the probability of the symbol being 0, scaled to a denominator of 256. For example, a probability value of 128 represents a 50/50 chance (\(128/256\)).
- Context Modeling: To maximize compression, the coder uses context-adaptive modeling. The probability of a bin is determined by the values of previously coded, neighboring syntax elements.
- Static vs. Dynamic Updates: Unlike some arithmetic
coders that update probability states after every single coded bit,
libvpx(particularly in VP8) often keeps probabilities static throughout a frame or updates them at designated intervals. This design choice prevents serialization bottlenecks and allows for highly parallelized decoding.
The Encoding Algorithm
The Boolean coder maintains two primary registers during the encoding
process: 1. range: Represents the width of the current
coding interval (initialized to 255). 2. low (or
value in the decoder): Represents the bottom of the current
interval.
For every bin processed, the coder performs the following steps:
1. Interval Subdivision
The coder splits the current range into two
sub-intervals based on the probability (\(P\)) of the current bin being 0. The split
point is calculated using integer math:
\[\text{split} = 1 + \left( \frac{(\text{range} - 1) \times P}{256} \right)\]
In practice, this is implemented using a fast bitwise shift:
split = 1 + (((range - 1) * P) >> 8).
2. Range and Low Update
Depending on whether the actual bin being encoded is 0 or 1, the
registers are updated: * If the bin is 0: The new
range becomes split. The value of
low remains unchanged. * If the bin is 1:
The new range becomes range - split, and
low is incremented by split.
3. Renormalization
As symbols are encoded, the range progressively shrinks.
To prevent numerical underflow and maintain precision, the registers
must be renormalized whenever the range falls below 128
(the MSB of the 8-bit register becomes 0).
During renormalization: * The range is shifted left by 1
bit (range <<= 1). * The low register is
shifted left by 1 bit (low <<= 1). * As
low shifts, bits are output to the final compressed
bitstream.
Binarization of Non-Binary Syntax Elements
Because the Boolean coder only processes binary inputs, any syntax element that can take on more than two values must undergo binarization before entropy coding.
libvpx converts multi-symbol values into a series of
binary decisions using tree structures (such as Huffman-like binary
trees) or unary coding. The encoder then passes each branch of the tree
as a individual bin to the Boolean coder, using a unique probability
context for each node of the decision tree. This allows the coder to
maintain high compression efficiency even when dealing with complex,
multi-valued data like transform coefficients.