Pass Raw Video Frames to libvpx Encoder API
This article explains how to pass raw video frames into the libvpx
encoder API for VP8 and VP9 video compression. It covers preparing the
raw pixel data using the vpx_image_t structure, managing
memory allocation, and passing the frames into the encoder using the
core API functions.
To pass raw video frames into the libvpx encoder, you must format the
input data into a structure the library understands. The libvpx API uses
the vpx_image_t structure to represent uncompressed video
frames, which are typically in a YUV color format such as YUV420p
(represented as VPX_IMG_FMT_I420 in the API).
1. Allocating and Initializing the Image Structure
Before sending a frame to the encoder, you must wrap your raw pixel
data in a vpx_image_t struct. There are two primary ways to
do this depending on how you manage your memory:
- Allocation by libvpx: If you want libvpx to manage
the memory buffer, use the vpx_img_alloc function. This allocates
memory for the plane buffers based on the specified color format,
width, and height. Release the buffer with vpx_img_free when you are
done with it.
- Wrapping existing memory: If you already have the
raw frame in an existing memory buffer (for example, from a camera
capture or a decoder), use vpx_img_wrap. This points the
vpx_image_t structure at your existing buffer without allocating new
memory or copying any pixel data. The buffer remains yours to free.
2. Populating the Plane Pointers and Strides
A raw YUV frame consists of multiple planes (Luma ‘Y’, and Chroma ‘U’
and ‘V’). The planes array and the stride array inside the
vpx_image_t struct describe where each plane lives. Both
vpx_img_alloc and vpx_img_wrap fill these arrays in for a single
contiguous buffer. If your planes sit in separate buffers or use their
own row padding, assign the entries yourself:
- planes[VPX_PLANE_Y], planes[VPX_PLANE_U], and planes[VPX_PLANE_V]
must point to the start of the respective pixel data for each channel.
- stride[VPX_PLANE_Y], stride[VPX_PLANE_U], and stride[VPX_PLANE_V]
must store the line stride (the number of bytes from the start of one
row to the start of the next) for each plane.
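To make the role of the stride concrete, here is a small sketch that copies a tightly packed I420 frame into planes whose rows may be padded. The function name copy_packed_i420 is hypothetical and it deliberately uses plain arrays rather than libvpx types. Its first two parameters mirror the planes and stride members of vpx_image_t, with indices 0, 1, and 2 matching VPX_PLANE_Y, VPX_PLANE_U, and VPX_PLANE_V.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a packed I420 frame (Y plane, then U, then V, no row padding)
 * into three destination planes that may have padded rows.
 * planes[p] is the start of plane p; stride[p] is its row pitch in bytes. */
void copy_packed_i420(unsigned char *const planes[], const int stride[],
                      const unsigned char *src,
                      unsigned int width, unsigned int height) {
    for (int p = 0; p < 3; ++p) {
        /* I420 chroma is subsampled by 2 in both directions, rounding up. */
        const unsigned int w = (p == 0) ? width : (width + 1) / 2;
        const unsigned int h = (p == 0) ? height : (height + 1) / 2;
        assert(stride[p] >= (int)w);
        for (unsigned int row = 0; row < h; ++row) {
            /* Advance the destination by the stride, the source by the width. */
            memcpy(planes[p] + (size_t)row * (size_t)stride[p], src, w);
            src += w;
        }
    }
}
```

With an image obtained from vpx_img_alloc, a call would look like copy_packed_i420(img.planes, img.stride, frame, img.d_w, img.d_h), where frame is your own packed buffer.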
3. Encoding the Frame
Once the vpx_image_t structure is populated, you pass it
to the encoder using the vpx_codec_encode function. The key
arguments for this function are:
- Codec Context: A pointer to your initialized
vpx_codec_ctx_t structure.
- Raw Image: A pointer to your populated
vpx_image_t struct. To flush the encoder at the end of a stream, pass
NULL for this argument.
- Presentation Timestamp (PTS): A timestamp indicating when the frame
should be displayed, measured in timebase units.
- Duration: How long the frame should be displayed, in the same
timebase units.
- Flags: Control flags, such as forcing a keyframe
using VPX_EFLAG_FORCE_KF.
- Deadline / Quality: A parameter controlling the
speed-to-quality trade-off (e.g.,
VPX_DL_GOOD_QUALITY, VPX_DL_REALTIME, or VPX_DL_BEST_QUALITY).
4. Retrieving the Compressed Packets
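After calling vpx_codec_encode, the compressed data is
retrieved by calling vpx_codec_get_cx_data in a loop until it returns
NULL. Each call returns a pointer to a vpx_codec_cx_pkt_t structure.
Packets of kind VPX_CODEC_CX_FRAME_PKT contain the compressed frame
data, which can then be written to a container file or transmitted
over a network. Copy or write out the data before the next call to
vpx_codec_encode, because the packet memory belongs to the encoder.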