Video Color Space
=================

By default, ``spdl.io`` decodes video frames to **RGB24** so they can be fed
directly into models that expect RGB tensors.  When this default is not what
you want, ``spdl.io`` can also decode into the codec's native YUV color space
or into other pixel formats.  This is useful when:

- You want to skip the YUV → RGB conversion and let your model (or downstream
  GPU code) consume YUV directly.
- You have a model trained on YUV inputs (e.g. NV12 from a hardware decoder).
- You want to reduce memory traffic.  ``yuv420p`` is roughly half the size of
  ``rgb24`` for the same resolution.

This page covers the supported pixel formats, the resulting tensor shapes,
and how to control which one ``spdl.io`` produces.

.. image:: ../_static/data/io_color_space_overview.png

The ``filter_desc`` argument
----------------------------

Color-space conversion in ``spdl.io`` is controlled by the
``filter_desc`` argument passed to :py:func:`spdl.io.load_video`,
:py:func:`spdl.io.decode_packets`, and related functions.
There are three modes:

1. **Unset (default).** ``spdl.io`` builds a filter graph that converts the
   output to ``rgb24``.  This is the easy path for models that expect RGB.

   .. code-block:: python

      import spdl.io

      # Decoded to rgb24 — shape (num_frames, H, W, 3), dtype uint8.
      buffer = spdl.io.load_video("video.mp4")

2. **Explicit pixel format.** Use :py:func:`spdl.io.get_video_filter_desc` to
   pick a different ``pix_fmt``.

   .. code-block:: python

      filter_desc = spdl.io.get_video_filter_desc(pix_fmt="yuv420p")
      buffer = spdl.io.load_video("video.mp4", filter_desc=filter_desc)

3. **Disabled (``filter_desc=None``).** No filter graph is constructed and no
   pixel-format conversion happens.  The buffer keeps the codec's native pixel
   format — for most codecs this is ``yuv420p``.  This is the cheapest path
   when you do not need RGB.

   .. code-block:: python

      buffer = spdl.io.load_video("video.mp4", filter_desc=None)

Supported pixel formats
-----------------------

The following pixel formats are recognised by ``spdl.io`` when converting
decoded frames into a buffer.  See :py:func:`spdl.io.get_video_filter_desc`
for the parameter reference.

.. list-table::
   :header-rows: 1
   :widths: 18 14 18 30 20

   * - ``pix_fmt``
     - Layout
     - Channels
     - Buffer shape (per video)
     - Notes
   * - ``rgb24``
     - Interleaved
     - 3
     - ``(N, H, W, 3)``
     - Default. NHWC, ``uint8``.
   * - ``rgba``
     - Interleaved
     - 4
     - ``(N, H, W, 4)``
     - NHWC, ``uint8``. With alpha.
   * - ``gray8``
     - Interleaved
     - 1
     - ``(N, H, W, 1)``
     - Single luma channel, ``uint8``.
   * - ``yuv444p`` / ``yuvj444p``
     - Planar
     - 3
     - ``(N, 3, H, W)``
     - NCHW. Three full-resolution planes (Y, U, V).
       ``yuvj`` variants use full range (0–255); ``yuv`` uses limited range
       (16–235 for Y).
   * - ``yuv420p`` / ``yuvj420p``
     - Planar (packed)
     - 1
     - ``(N, 1, H + H/2, W)``
     - Y plane on top (``H × W``), U and V planes (each ``H/2 × W/2``)
       packed underneath.
   * - ``yuv422p`` / ``yuvj422p``
     - Planar (packed)
     - 1
     - ``(N, 1, 2H, W)``
     - Y plane on top, U and V planes (each ``H × W/2``) packed underneath.
   * - ``nv12``
     - Semi-planar (packed)
     - 1
     - ``(N, 1, H + H/2, W)``
     - Y plane on top, U/V interleaved underneath. The native format of many
       hardware decoders (e.g. NVDEC).

In every case, ``N`` is the number of decoded frames, ``H`` and ``W`` are
the frame dimensions, and the dtype is ``uint8``.

Examples
--------

The figures below illustrate each pixel format on the same source frame.  The
left panel is the canonical RGB rendering (decoded with the default
``filter_desc``); the remaining panels are the raw planes that ``spdl.io``
returns, plotted as grayscale images at their actual resolution.  The buffer
shape is shown in the title.

RGB24 (default)
~~~~~~~~~~~~~~~

Three interleaved channels stored in NHWC order — the simplest layout, and
what most PyTorch / NumPy code expects.

.. code-block:: python

   buffer = spdl.io.load_video("video.mp4")
   array = spdl.io.to_numpy(buffer)
   # array.shape == (num_frames, H, W, 3)

.. image:: ../_static/data/io_color_space_rgb24.png

RGBA
~~~~

Same as RGB24 but with an alpha channel.  Useful for sources that carry
transparency.

.. code-block:: python

   filter_desc = spdl.io.get_video_filter_desc(pix_fmt="rgba")
   buffer = spdl.io.load_video("video.mp4", filter_desc=filter_desc)
   # array.shape == (num_frames, H, W, 4)

.. image:: ../_static/data/io_color_space_rgba.png

Grayscale (``gray8``)
~~~~~~~~~~~~~~~~~~~~~

A single luma channel.

.. code-block:: python

   filter_desc = spdl.io.get_video_filter_desc(pix_fmt="gray8")
   buffer = spdl.io.load_video("video.mp4", filter_desc=filter_desc)
   # array.shape == (num_frames, H, W, 1)

.. image:: ../_static/data/io_color_space_gray8.png

YUV 4:4:4 — full chroma resolution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

True planar YUV: three independent planes at full resolution, in NCHW order.

.. code-block:: python

   filter_desc = spdl.io.get_video_filter_desc(pix_fmt="yuv444p")
   buffer = spdl.io.load_video("video.mp4", filter_desc=filter_desc)
   # array.shape == (num_frames, 3, H, W)

.. image:: ../_static/data/io_color_space_yuv444p.png

YUV 4:2:0 — codec-native chroma subsampling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The native pixel format for most H.264 / H.265 / VP9 video.  The chroma
planes are subsampled by 2× in both dimensions and packed below the luma
plane in a single ``(H + H/2) × W`` buffer.  Total size is 1.5× ``H × W``,
half the size of ``rgb24``.

.. code-block:: python

   filter_desc = spdl.io.get_video_filter_desc(pix_fmt="yuv420p")
   buffer = spdl.io.load_video("video.mp4", filter_desc=filter_desc)
   # array.shape == (num_frames, 1, H + H // 2, W)

.. image:: ../_static/data/io_color_space_yuv420p.png

YUV 4:2:2 — chroma subsampled horizontally
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Chroma subsampled by 2× horizontally only.  The packed buffer is
``2H × W`` — Y on top, U and V each ``H × W/2`` packed beneath.

.. code-block:: python

   filter_desc = spdl.io.get_video_filter_desc(pix_fmt="yuv422p")
   buffer = spdl.io.load_video("video.mp4", filter_desc=filter_desc)
   # array.shape == (num_frames, 1, 2 * H, W)

.. image:: ../_static/data/io_color_space_yuv422p.png

NV12 — semi-planar YUV 4:2:0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

NV12 has the same total size and Y layout as ``yuv420p`` but interleaves U
and V samples (``UVUVUV...``) in a single chroma plane.  This is the format
emitted by many hardware decoders (including NVDEC); decoding directly into
NV12 avoids a deinterleave step.

.. code-block:: python

   filter_desc = spdl.io.get_video_filter_desc(pix_fmt="nv12")
   buffer = spdl.io.load_video("video.mp4", filter_desc=filter_desc)
   # array.shape == (num_frames, 1, H + H // 2, W)

.. image:: ../_static/data/io_color_space_nv12.png

Skipping the conversion entirely
--------------------------------

Passing ``filter_desc=None`` disables the filter graph altogether.  The
returned buffer keeps whatever pixel format the decoder produces — for
software H.264 / H.265 / VP9 decoders this is almost always ``yuv420p``.

.. code-block:: python

   # No filtering, no color conversion, no scaling.
   buffer = spdl.io.load_video("video.mp4", filter_desc=None)

Use this mode when:

- You want maximum decoding throughput and do not need RGB.
- You will do color conversion later on the GPU (often cheaper than on the
  CPU).
- You want to be sure ``spdl.io`` is not silently inserting a ``scale`` or
  ``format`` filter.

When ``filter_desc`` is ``None``, other filter-graph features
(``scale_width`` / ``scale_height``, ``crop_width`` / ``crop_height``,
``num_frames`` padding, ``timestamp`` trimming, etc.) are also disabled —
those are all built on top of the filter graph.  If you need any of them,
pass an explicit ``filter_desc`` instead.

See also
--------

- :doc:`filtering` — full guide to constructing filter descriptions.
- :doc:`ffmpeg_cheatsheet` — the ``pix_fmt`` row sits in the
  ``get_video_filter_desc`` table.
- :py:func:`spdl.io.get_video_filter_desc` — Python API reference.