Understanding the vpx_image_t Structure in libvpx
This article explains the purpose and functionality of the
vpx_image_t structure within the libvpx library, the
official codec SDK for the VP8 and VP9 video formats. You will learn how
this structure serves as a descriptor for raw, uncompressed video
frames, how it manages memory layout, and its crucial role in both the
encoding and decoding pipelines.
The Core Purpose of vpx_image_t
In libvpx, vpx_image_t is the primary data structure
used to represent raw, uncompressed video frames in memory. Because
video codecs must ingest raw image data to produce compressed bitstreams
(encoding) and output raw image data from compressed bitstreams
(decoding), a standardized way to describe these frames is
necessary.
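The vpx_image_t structure acts as a wrapper or descriptor. It does not
always own the underlying pixel memory: an image created with
vpx_img_alloc() owns a buffer allocated by the library, while one set up
with vpx_img_wrap() only points at memory the caller owns. In both cases
it provides the metadata that libvpx needs to correctly read, write, and
interpret the layout of the raw pixel data.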
Key Components of the Structure
To handle various video formats and memory layouts,
vpx_image_t contains several critical members:
- Format (fmt): The pixel format of the image, given as a value of the vpx_img_fmt_t enum. It determines the plane layout and the chroma subsampling. Common formats include VPX_IMG_FMT_I420 (standard YUV 4:2:0), VPX_IMG_FMT_I422, and VPX_IMG_FMT_I444.
- Stored dimensions (w and h): The width and height of the image as stored in memory, in pixels. These may be rounded up for alignment, so they can be larger than the visible picture.
- Display dimensions (d_w and d_h): The intended display width and height. This allows the codec to handle padded frames where the stored size is larger than the actual visible area.
- Planes (planes): An array of four pointers, each pointing to the start of the pixel data for one plane. For YUV video, planes[0] points to the Y (luminance) plane, planes[1] to the U (chrominance) plane, and planes[2] to the V plane; planes[3] is reserved for an optional alpha plane.
- Stride (stride): An array of integers, one per plane, giving the "stride" or "pitch" of that plane. The stride is the number of bytes from the start of one row of pixels to the start of the next row. It can be larger than the row's visible width, because memory alignment often requires padding bytes at the end of each row.
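The relationship between planes, stride, and the chroma subsampling can be made concrete with a small sketch. The demo_image type and the demo_* helpers below are hypothetical stand-ins that mirror only the fields listed above; they are not part of libvpx. With the real library you would include vpx/vpx_image.h and read the same fields from a vpx_image_t. The sketch assumes an 8-bit 4:2:0 format, so each sample is one byte and the chroma planes are half the luma size, rounded up.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the fields described above; NOT the real vpx_image_t. */
typedef struct {
  unsigned int d_w, d_h;    /* display width and height in pixels */
  unsigned char *planes[4]; /* Y, U, V, optional alpha */
  int stride[4];            /* bytes from one row to the next, per plane */
} demo_image;

/* Width of a plane in samples for 4:2:0: chroma is halved, rounding up. */
static unsigned int demo_plane_width(const demo_image *img, int plane) {
  return plane == 0 ? img->d_w : (img->d_w + 1) / 2;
}

/* Height of a plane in rows for 4:2:0: chroma is halved, rounding up. */
static unsigned int demo_plane_height(const demo_image *img, int plane) {
  return plane == 0 ? img->d_h : (img->d_h + 1) / 2;
}

/* Address of sample (x, y): rows are stride bytes apart, not width bytes. */
static unsigned char *demo_sample(const demo_image *img, int plane,
                                  unsigned int x, unsigned int y) {
  assert(x < demo_plane_width(img, plane));
  assert(y < demo_plane_height(img, plane));
  return img->planes[plane] + (size_t)y * (size_t)img->stride[plane] + x;
}
```

Indexing with y * width instead of y * stride is the usual mistake here; it appears to work until a frame arrives whose rows carry padding.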
Role in the Video Pipeline
The vpx_image_t structure is utilized in both directions
of the video processing pipeline:
1. The Encoding Process
When compressing video, you must pass raw video frames to the VP8 or
VP9 encoder.
- Allocation: You can allocate memory for a frame using vpx_img_alloc(), which automatically allocates the required buffer space and populates the vpx_image_t struct. Alternatively, if you already have a raw frame in memory, you can use vpx_img_wrap() to map the vpx_image_t descriptor to your existing buffer.
- Submission: Once populated with raw YUV data, the vpx_image_t struct is passed directly to the vpx_codec_encode() function.
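To illustrate what "wrapping" an existing buffer means, here is a conceptual sketch that points a descriptor at a caller-owned I420 frame without copying or allocating anything. It is not libvpx's implementation: demo_image, demo_i420_size, and demo_wrap_i420 are hypothetical names, and the sketch assumes an 8-bit frame stored tightly packed (Y, then U, then V, with no row padding). The real vpx_img_wrap() also takes a stride alignment, so its computed layout can differ.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the descriptor fields; NOT the real vpx_image_t. */
typedef struct {
  unsigned int d_w, d_h;
  unsigned char *planes[4];
  int stride[4];
} demo_image;

/* Bytes needed for a tightly packed 8-bit I420 frame of w x h pixels. */
static size_t demo_i420_size(unsigned int w, unsigned int h) {
  size_t cw = (w + 1) / 2, ch = (h + 1) / 2; /* chroma plane size */
  return (size_t)w * h + 2 * cw * ch;
}

/* Point the descriptor at buf. The caller keeps ownership of buf, which
 * must hold at least demo_i420_size(w, h) bytes. */
static void demo_wrap_i420(demo_image *img, unsigned char *buf,
                           unsigned int w, unsigned int h) {
  unsigned int cw = (w + 1) / 2, ch = (h + 1) / 2;
  assert(img != NULL && buf != NULL);
  img->d_w = w;
  img->d_h = h;
  img->planes[0] = buf;                                /* Y */
  img->planes[1] = buf + (size_t)w * h;                /* U follows Y */
  img->planes[2] = img->planes[1] + (size_t)cw * ch;   /* V follows U */
  img->planes[3] = NULL;                               /* no alpha */
  img->stride[0] = (int)w;
  img->stride[1] = (int)cw;
  img->stride[2] = (int)cw;
  img->stride[3] = 0;
}
```

Because the descriptor only stores pointers, the buffer must stay valid for as long as the image is in use, for example until the encoder has consumed the frame.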
2. The Decoding Process
When decompressing a VP8 or VP9 bitstream, the decoder reconstructs
the raw video frames.
- Retrieval: After calling vpx_codec_decode() to process compressed data, you call vpx_codec_get_frame() in a loop until it returns NULL, which signals that no more frames are available.
- Output: Each successful call returns a pointer to a vpx_image_t structure populated by the decoder. You can then read the planes and stride information from this structure to copy or render the decoded YUV frame.
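Copying a decoded plane out of the descriptor has to go row by row, because the stride may be larger than the visible width. The following is a minimal sketch of that copy, again using a hypothetical demo_image stand-in rather than the real vpx_image_t, and assuming 8-bit samples and a non-negative stride. The demo_copy_plane name and its parameters are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the descriptor fields; NOT the real vpx_image_t. */
typedef struct {
  unsigned int d_w, d_h;
  unsigned char *planes[4];
  int stride[4];
} demo_image;

/* Copy width x height samples of one plane into dst as a tightly packed
 * block, dropping any padding at the end of each source row.
 * Returns the number of bytes written to dst. */
static size_t demo_copy_plane(const demo_image *img, int plane,
                              unsigned int width, unsigned int height,
                              unsigned char *dst) {
  const unsigned char *src = img->planes[plane];
  assert(src != NULL && dst != NULL);
  assert(img->stride[plane] >= (int)width);
  for (unsigned int y = 0; y < height; ++y) {
    memcpy(dst + (size_t)y * width, src, width); /* visible samples only */
    src += img->stride[plane];                   /* advance by stride */
  }
  return (size_t)width * height;
}
```

For a 4:2:0 frame you would call this once for the Y plane with the display size, and once each for U and V with the halved (rounded-up) chroma size.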