Pass Raw Video Frames to libvpx Encoder API

This article explains how to pass raw video frames into the libvpx encoder API for VP8 and VP9 video compression. It covers preparing the raw pixel data using the vpx_image_t structure, managing memory allocation, and passing the frames into the encoder using the core API functions.

To pass raw video frames into the libvpx encoder, you must format the input data into a structure the library understands. The libvpx API uses the vpx_image_t structure to represent uncompressed video frames, which are typically in a YUV color format such as YUV420p (represented as VPX_IMG_FMT_I420 in the API).

1. Allocating and Initializing the Image Structure

Before sending a frame to the encoder, you must wrap your raw pixel data in a vpx_image_t struct. There are two primary ways to do this depending on how you manage your memory:

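Allocation by libvpx: If you want libvpx to manage the memory buffer, use the vpx_img_alloc function. It allocates a single buffer large enough for all planes based on the specified color format, width, height, and alignment, and sets the plane pointers and strides for you. Release the image with vpx_img_free when you are done.
Wrapping existing memory: If you already have the raw frame in an existing memory buffer (for example, from a camera capture or a decoder), use vpx_img_wrap. It takes a single pointer to a contiguous buffer that already holds the planes in the layout of the chosen format, and derives the plane pointers and strides from it without allocating new memory or copying pixel data. The buffer remains yours and must stay valid for as long as the image is in use.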
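Because vpx_img_wrap is handed one contiguous buffer, it helps to know where each plane sits inside it. The following is a minimal sketch of that arithmetic, assuming 8-bit I420, even dimensions, and tightly packed rows (a stride alignment of 1). The names i420_layout and i420_packed_layout are hypothetical helpers for illustration, not part of libvpx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not part of libvpx): byte offsets and strides of
   the three planes inside one contiguous, tightly packed I420 buffer. */
typedef struct {
    size_t y_offset, u_offset, v_offset; /* start of each plane in the buffer */
    size_t y_stride, uv_stride;          /* bytes per row of each plane */
    size_t total_size;                   /* bytes needed for the whole frame */
} i420_layout;

static i420_layout i420_packed_layout(unsigned int width, unsigned int height) {
    i420_layout l;
    /* Assumption: even dimensions, so each chroma plane is exactly
       half the width and half the height of the luma plane. */
    assert(width % 2 == 0 && height % 2 == 0);
    l.y_stride = width;
    l.uv_stride = width / 2;
    l.y_offset = 0;
    l.u_offset = (size_t)width * height;                    /* U follows Y */
    l.v_offset = l.u_offset + l.uv_stride * (height / 2);   /* V follows U */
    l.total_size = l.v_offset + l.uv_stride * (height / 2); /* w * h * 3 / 2 */
    return l;
}
```

Under those assumptions, a buffer of total_size bytes laid out this way is what a call such as vpx_img_wrap(&img, VPX_IMG_FMT_I420, width, height, 1, buffer) would expect to receive as its last argument.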

2. Populating the Plane Pointers and Strides

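A raw YUV frame consists of multiple planes (Luma ‘Y’, and Chroma ‘U’ and ‘V’), described by the planes array and the stride array inside the vpx_image_t struct. Both vpx_img_alloc and vpx_img_wrap fill in these arrays for you, so you usually only read or write pixel data through them. Set them by hand only when your planes live in separate buffers or use strides that vpx_img_wrap cannot express. In every case, the following must hold: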

planes[VPX_PLANE_Y], planes[VPX_PLANE_U], and planes[VPX_PLANE_V] must point to the start of the respective pixel data for each channel.
stride[VPX_PLANE_Y], stride[VPX_PLANE_U], and stride[VPX_PLANE_V] must store the line stride (the number of bytes from the start of one row to the start of the next) for each plane.
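A stride can be larger than the visible row width, so pixel data is best moved one row at a time rather than with a single bulk copy. The sketch below shows one way to do that for a single plane. copy_plane is a hypothetical helper, not a libvpx function, and it assumes non-negative strides and 8-bit samples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libvpx): copy `rows` rows of `row_bytes`
   bytes each between two planes whose strides may differ. Bytes between
   row_bytes and the stride (row padding) are left untouched. */
static void copy_plane(unsigned char *dst, int dst_stride,
                       const unsigned char *src, int src_stride,
                       size_t row_bytes, unsigned int rows) {
    /* Assumption: strides are non-negative and at least one row wide. */
    assert(dst_stride >= 0 && src_stride >= 0);
    assert((size_t)dst_stride >= row_bytes && (size_t)src_stride >= row_bytes);
    for (unsigned int r = 0; r < rows; ++r) {
        memcpy(dst + (size_t)r * (size_t)dst_stride,
               src + (size_t)r * (size_t)src_stride,
               row_bytes);
    }
}
```

For an 8-bit I420 image of w by h pixels, the Y plane would be copied with row_bytes = w and rows = h, using planes[VPX_PLANE_Y] and stride[VPX_PLANE_Y] as the destination. The U and V planes would each use (w + 1) / 2 bytes per row and (h + 1) / 2 rows.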

3. Encoding the Frame

Once the vpx_image_t structure is populated, you pass it to the encoder using the vpx_codec_encode function. The key arguments for this function are:

Codec Context: A pointer to your initialized vpx_codec_ctx_t structure.
Raw Image: A pointer to your populated vpx_image_t struct. To flush the encoder at the end of a stream, pass NULL for this argument.
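Presentation Timestamp (PTS): A timestamp indicating when the frame should be displayed, measured in units of the timebase set in the g_timebase field of the encoder configuration (vpx_codec_enc_cfg_t).
Duration: How long the frame should be displayed, in the same timebase units.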
Flags: Control flags for this frame, or 0 for none. For example, VPX_EFLAG_FORCE_KF forces a keyframe.
Deadline / Quality: The maximum time the encoder may spend on this frame, in microseconds, which controls the speed-to-quality trade-off. The predefined values are VPX_DL_REALTIME, VPX_DL_GOOD_QUALITY, and VPX_DL_BEST_QUALITY.
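Since the PTS and duration are expressed in timebase units, the values depend on both the timebase and the frame rate. The following sketch shows the arithmetic for a constant frame rate. frame_pts is a hypothetical helper, not part of libvpx, and its parameters simply mirror the numerator and denominator of a timebase (seconds per tick) and of a frame rate (frames per second).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of libvpx): PTS of frame `n` in timebase
   ticks, for a timebase of tb_num/tb_den seconds per tick and a constant
   frame rate of fps_num/fps_den frames per second. */
static int64_t frame_pts(int64_t n, int tb_num, int tb_den,
                         int fps_num, int fps_den) {
    assert(tb_num > 0 && tb_den > 0 && fps_num > 0 && fps_den > 0);
    /* time of frame n in seconds = n * fps_den / fps_num
       ticks = seconds * tb_den / tb_num
       Multiply first and divide once (rounding down) so that rounding
       errors do not accumulate from frame to frame. */
    return (n * fps_den * tb_den) / ((int64_t)fps_num * tb_num);
}
```

With a timebase of 1/30 and a rate of 30 fps, frame n simply gets PTS n and a duration of 1. With a 1/90000 timebase, a 30 fps frame lasts 3000 ticks and a 30000/1001 fps frame lasts 3003 ticks. The duration of frame n can be taken as frame_pts(n + 1, ...) minus frame_pts(n, ...).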

4. Retrieving the Compressed Packets

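After each call to vpx_codec_encode, retrieve the compressed data by calling vpx_codec_get_cx_data in a loop. The function takes the codec context and a vpx_codec_iter_t iterator that must be set to NULL before the first call. It returns a pointer to a vpx_codec_cx_pkt_t structure on each call, and NULL once no more packets are available. A single encode call can yield zero, one, or several packets. For packets whose kind is VPX_CODEC_CX_FRAME_PKT, the compressed frame is in data.frame.buf with its length in data.frame.sz, and can then be written to a container file or transmitted over a network.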