4.5.3. Memory-to-memory Stateless Video Decoder Interface¶
A stateless decoder is a decoder that works without retaining any kind of state between processed frames. This means that each frame is decoded independently of any previous and future frames, and that the client is responsible for maintaining the decoding state and providing it to the decoder with each decoding request. This is in contrast to the stateful video decoder interface, where the hardware and driver maintain the decoding state and all the client has to do is to provide the raw encoded stream and dequeue decoded frames in display order.
This section describes how user-space (“the client”) is expected to communicate with stateless decoders in order to successfully decode an encoded stream. Compared to stateful codecs, the decoder/client sequence is simpler, but the cost of this simplicity is extra complexity in the client which is responsible for maintaining a consistent decoding state.
Stateless decoders make use of the Request API. A stateless
decoder must expose the V4L2_BUF_CAP_SUPPORTS_REQUESTS
capability on its
OUTPUT
queue when VIDIOC_REQBUFS()
or VIDIOC_CREATE_BUFS()
are invoked.
Depending on the encoded formats supported by the decoder, a single decoded
frame may be the result of several decode requests (for instance, H.264 streams
with multiple slices per frame). Decoders that support such formats must also
expose the V4L2_BUF_CAP_SUPPORTS_M2M_HOLD_CAPTURE_BUF
capability on their
OUTPUT
queue.
4.5.3.1. Querying capabilities¶
- To enumerate the set of coded formats supported by the decoder, the client
calls
VIDIOC_ENUM_FMT()
on theOUTPUT
queue.- The driver must always return the full set of supported
OUTPUT
formats, irrespective of the format currently set on theCAPTURE
queue. - Simultaneously, the driver must restrain the set of values returned by codec-specific capability controls (such as H.264 profiles) to the set actually supported by the hardware.
- The driver must always return the full set of supported
- To enumerate the set of supported raw formats, the client calls
VIDIOC_ENUM_FMT()
on theCAPTURE
queue.- The driver must return only the formats supported for the format currently
active on the
OUTPUT
queue. - Depending on the currently set
OUTPUT
format, the set of supported raw formats may depend on the value of some codec-dependent controls. The client is responsible for making sure that these controls are set before querying theCAPTURE
queue. Failure to do so will result in the default values for these controls being used, and a returned set of formats that may not be usable for the media the client is trying to decode.
- The driver must return only the formats supported for the format currently
active on the
- The client may use
VIDIOC_ENUM_FRAMESIZES()
to detect supported resolutions for a given format, passing desired pixel format inv4l2_frmsizeenum
’spixel_format
. - Supported profiles and levels for the current
OUTPUT
format, if applicable, may be queried using their respective controls viaVIDIOC_QUERYCTRL()
.
4.5.3.2. Initialization¶
Set the coded format on the
OUTPUT
queue viaVIDIOC_S_FMT()
.Required fields:
type
a
V4L2_BUF_TYPE_*
enum appropriate forOUTPUT
.pixelformat
a coded pixel format.
width
,height
coded width and height parsed from the stream.
- other fields
follow standard semantics.
Note
Changing the
OUTPUT
format may change the currently setCAPTURE
format. The driver will derive a newCAPTURE
format from theOUTPUT
format being set, including resolution, colorimetry parameters, etc. If the client needs a specificCAPTURE
format, it must adjust it afterwards.Call
VIDIOC_S_EXT_CTRLS()
to set all the controls (parsed headers, etc.) required by theOUTPUT
format to enumerate theCAPTURE
formats.Call
VIDIOC_G_FMT()
forCAPTURE
queue to get the format for the destination buffers parsed/decoded from the bytestream.Required fields:
type
a
V4L2_BUF_TYPE_*
enum appropriate forCAPTURE
.
Returned fields:
width
,height
frame buffer resolution for the decoded frames.
pixelformat
pixel format for decoded frames.
num_planes
(for _MPLANEtype
only)number of planes for pixelformat.
sizeimage
,bytesperline
as per standard semantics; matching frame buffer format.
Note
The value of
pixelformat
may be any pixel format supported for theOUTPUT
format, based on the hardware capabilities. It is suggested that the driver chooses the preferred/optimal format for the current configuration. For example, a YUV format may be preferred over an RGB format, if an additional conversion step would be required for RGB.[optional] Enumerate
CAPTURE
formats viaVIDIOC_ENUM_FMT()
on theCAPTURE
queue. The client may use this ioctl to discover which alternative raw formats are supported for the currentOUTPUT
format and select one of them viaVIDIOC_S_FMT()
.Note
The driver will return only formats supported for the currently selected
OUTPUT
format and currently set controls, even if more formats may be supported by the decoder in general.For example, a decoder may support YUV and RGB formats for resolutions 1920x1088 and lower, but only YUV for higher resolutions (due to hardware limitations). After setting a resolution of 1920x1088 or lower as the
OUTPUT
format,VIDIOC_ENUM_FMT()
may return a set of YUV and RGB pixel formats, but after setting a resolution higher than 1920x1088, the driver will not return RGB pixel formats, since they are unsupported for this resolution.[optional] Choose a different
CAPTURE
format than suggested viaVIDIOC_S_FMT()
onCAPTURE
queue. It is possible for the client to choose a different format than selected/suggested by the driver inVIDIOC_G_FMT()
.Required fields:
type
a
V4L2_BUF_TYPE_*
enum appropriate forCAPTURE
.pixelformat
a raw pixel format.
width
,height
frame buffer resolution of the decoded stream; typically unchanged from what was returned with
VIDIOC_G_FMT()
, but it may be different if the hardware supports composition and/or scaling.
After performing this step, the client must perform step 3 again in order to obtain up-to-date information about the buffers size and layout.
Allocate source (bytestream) buffers via
VIDIOC_REQBUFS()
onOUTPUT
queue.Required fields:
count
requested number of buffers to allocate; greater than zero.
type
a
V4L2_BUF_TYPE_*
enum appropriate forOUTPUT
.memory
follows standard semantics.
Return fields:
count
actual number of buffers allocated.
If required, the driver will adjust
count
to be equal or bigger to the minimum of required number ofOUTPUT
buffers for the given format and requested count. The client must check this value after the ioctl returns to get the actual number of buffers allocated.
Allocate destination (raw format) buffers via
VIDIOC_REQBUFS()
on theCAPTURE
queue.Required fields:
count
requested number of buffers to allocate; greater than zero. The client is responsible for deducing the minimum number of buffers required for the stream to be properly decoded (taking e.g. reference frames into account) and pass an equal or bigger number.
type
a
V4L2_BUF_TYPE_*
enum appropriate forCAPTURE
.memory
follows standard semantics.
V4L2_MEMORY_USERPTR
is not supported forCAPTURE
buffers.
Return fields:
count
adjusted to allocated number of buffers, in case the codec requires more buffers than requested.
The driver must adjust count to the minimum of required number of
CAPTURE
buffers for the current format, stream configuration and requested count. The client must check this value after the ioctl returns to get the number of buffers allocated.
- Allocate requests (likely one per
OUTPUT
buffer) via MEDIA_IOC_REQUEST_ALLOC()
on the media device.
- Allocate requests (likely one per
- Start streaming on both
OUTPUT
andCAPTURE
queues via VIDIOC_STREAMON()
.
- Start streaming on both
4.5.3.3. Decoding¶
For each frame, the client is responsible for submitting at least one request to which the following is attached:
- The amount of encoded data expected by the codec for its current
configuration, as a buffer submitted to the
OUTPUT
queue. Typically, this corresponds to one frame worth of encoded data, but some formats may allow (or require) different amounts per unit. - All the metadata needed to decode the submitted encoded data, in the form of controls relevant to the format being decoded.
The amount of data and contents of the source OUTPUT
buffer, as well as the
controls that must be set on the request, depend on the active coded pixel
format and might be affected by codec-specific extended controls, as stated in
documentation of each format.
If there is a possibility that the decoded frame will require one or more
decode requests after the current one in order to be produced, then the client
must set the V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
flag on the OUTPUT
buffer. This will result in the (potentially partially) decoded CAPTURE
buffer not being made available for dequeueing, and reused for the next decode
request if the timestamp of the next OUTPUT
buffer has not changed.
A typical frame would thus be decoded using the following sequence:
Queue an
OUTPUT
buffer containing one unit of encoded bytestream data for the decoding request, usingVIDIOC_QBUF()
.Required fields:
index
index of the buffer being queued.
type
type of the buffer.
bytesused
number of bytes taken by the encoded data frame in the buffer.
flags
the
V4L2_BUF_FLAG_REQUEST_FD
flag must be set. Additionally, if we are not sure that the current decode request is the last one needed to produce a fully decoded frame, thenV4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
must also be set.request_fd
must be set to the file descriptor of the decoding request.
timestamp
must be set to a unique value per frame. This value will be propagated into the decoded frame’s buffer and can also be used to use this frame as the reference of another. If using multiple decode requests per frame, then the timestamps of all the
OUTPUT
buffers for a given frame must be identical. If the timestamp changes, then the currently heldCAPTURE
buffer will be made available for dequeuing and the current request will work on a newCAPTURE
buffer.
Set the codec-specific controls for the decoding request, using
VIDIOC_S_EXT_CTRLS()
.Required fields:
which
must be
V4L2_CTRL_WHICH_REQUEST_VAL
.request_fd
must be set to the file descriptor of the decoding request.
- other fields
other fields are set as usual when setting controls. The
controls
array must contain all the codec-specific controls required to decode a frame.
Note
It is possible to specify the controls in different invocations of
VIDIOC_S_EXT_CTRLS()
, or to overwrite a previously set control, as long asrequest_fd
andwhich
are properly set. The controls state at the moment of request submission is the one that will be considered.Note
The order in which steps 1 and 2 take place is interchangeable.
Submit the request by invoking
MEDIA_REQUEST_IOC_QUEUE()
on the request FD.If the request is submitted without an
OUTPUT
buffer, or if some of the required controls are missing from the request, thenMEDIA_REQUEST_IOC_QUEUE()
will return-ENOENT
. If more than oneOUTPUT
buffer is queued, then it will return-EINVAL
.MEDIA_REQUEST_IOC_QUEUE()
returning non-zero means that noCAPTURE
buffer will be produced for this request.
CAPTURE
buffers must not be part of the request, and are queued
independently. They are returned in decode order (i.e. the same order as coded
frames were submitted to the OUTPUT
queue).
Runtime decoding errors are signaled by the dequeued CAPTURE
buffers
carrying the V4L2_BUF_FLAG_ERROR
flag. If a decoded reference frame has an
error, then all following decoded frames that refer to it also have the
V4L2_BUF_FLAG_ERROR
flag set, although the decoder will still try to
produce (likely corrupted) frames.
4.5.3.4. Buffer management while decoding¶
Contrary to stateful decoders, a stateless decoder does not perform any kind of
buffer management: it only guarantees that dequeued CAPTURE
buffers can be
used by the client for as long as they are not queued again. “Used” here
encompasses using the buffer for compositing or display.
A dequeued capture buffer can also be used as the reference frame of another buffer.
A frame is specified as reference by converting its timestamp into nanoseconds,
and storing it into the relevant member of a codec-dependent control structure.
The v4l2_timeval_to_ns()
function must be used to perform that
conversion. The timestamp of a frame can be used to reference it as soon as all
its units of encoded data are successfully submitted to the OUTPUT
queue.
A decoded buffer containing a reference frame must not be reused as a decoding
target until all the frames referencing it have been decoded. The safest way to
achieve this is to refrain from queueing a reference buffer until all the
decoded frames referencing it have been dequeued. However, if the driver can
guarantee that buffers queued to the CAPTURE
queue are processed in queued
order, then user-space can take advantage of this guarantee and queue a
reference buffer when the following conditions are met:
- All the requests for frames affected by the reference frame have been queued, and
- A sufficient number of
CAPTURE
buffers to cover all the decoded referencing frames have been queued.
When queuing a decoding request, the driver will increase the reference count of all the resources associated with reference frames. This means that the client can e.g. close the DMABUF file descriptors of reference frame buffers if it won’t need them afterwards.
4.5.3.5. Seeking¶
In order to seek, the client just needs to submit requests using input buffers corresponding to the new stream position. It must however be aware that resolution may have changed and follow the dynamic resolution change sequence in that case. Also depending on the codec used, picture parameters (e.g. SPS/PPS for H.264) may have changed and the client is responsible for making sure that a valid state is sent to the decoder.
The client is then free to ignore any returned CAPTURE
buffer that comes
from the pre-seek position.
4.5.3.6. Pausing¶
In order to pause, the client can just cease queuing buffers onto the OUTPUT
queue. Without source bytestream data, there is no data to process and the codec
will remain idle.
4.5.3.7. Dynamic resolution change¶
If the client detects a resolution change in the stream, it will need to perform the initialization sequence again with the new resolution:
- If the last submitted request resulted in a
CAPTURE
buffer being held by the use of theV4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
flag, then the last frame is not available on theCAPTURE
queue. In this case, aV4L2_DEC_CMD_FLUSH
command shall be sent. This will make the driver dequeue the heldCAPTURE
buffer. - Wait until all submitted requests have completed and dequeue the corresponding output buffers.
- Call
VIDIOC_STREAMOFF()
on both theOUTPUT
andCAPTURE
queues. - Free all
CAPTURE
buffers by callingVIDIOC_REQBUFS()
on theCAPTURE
queue with a buffer count of zero. - Perform the initialization sequence again (minus the allocation of
OUTPUT
buffers), with the new resolution set on theOUTPUT
queue. Note that due to resolution constraints, a different format may need to be picked on theCAPTURE
queue.
4.5.3.8. Drain¶
If the last submitted request resulted in a CAPTURE
buffer being
held by the use of the V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
flag, then the
last frame is not available on the CAPTURE
queue. In this case, a
V4L2_DEC_CMD_FLUSH
command shall be sent. This will make the driver
dequeue the held CAPTURE
buffer.
After that, in order to drain the stream on a stateless decoder, the client just needs to wait until all the submitted requests are completed.