Media Encoding

This section explains how to encode media data (audio and video) using SPDL. Encoding converts array data into media file formats such as MP4, WAV, or PNG.

Note

This section covers audio and video encoding. To encode an image, use the video encoding workflow with a single frame: given one frame of array data and a suitable configuration, the output is saved as an image file (e.g., PNG, JPEG). See spdl.io.save_image() for a convenient wrapper.

The Encoding Process

Media encoding in SPDL follows a four-stage process, of which the filtering stage is optional:

  1. Create Reference Frames: Reinterpret array data as frame objects without copying

  2. Filter Frames (Optional): Apply transformations like scaling or color correction

  3. Encode Frames: Compress frames into packets

  4. Mux Packets: Write packets to an output file

The following diagram illustrates this process:

flowchart LR
    a[Array Data] --> |create_reference_frame| f[Frame]
    f --> |Optional: FilterGraph| f2[Filtered Frame]
    f2 -.-> f
    f --> |Encoder| p[Packet]
    p --> |Muxer| file[Output File]

Creating Reference Frames

The first step in encoding is to create reference frames from your array data. This step reinterprets contiguous array data as frame objects that the encoder can consume, without copying the data. The resulting frame objects hold metadata (format, dimensions, timestamps) and reference the original array’s memory, so no sample or pixel data is duplicated.
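
To illustrate what referencing without copying implies, here is a minimal sketch that uses Python's built-in memoryview as a stand-in for a frame object. It is an analogy only, not SPDL's implementation: it shows that a view observes later changes to the underlying buffer, and that a strided slice is no longer contiguous.

```python
# Analogy only: memoryview stands in for a frame that references array memory.
buffer = bytearray(range(8))        # pretend this is the array data
view = memoryview(buffer)           # references the same memory, no copy

buffer[0] = 255                     # mutate the original after creating the view
first_byte_seen_by_view = view[0]   # the view observes the change

strided = view[::2]                 # every other byte: a non-contiguous view
is_full_view_contiguous = view.c_contiguous
is_strided_contiguous = strided.c_contiguous

print(first_byte_seen_by_view, is_full_view_contiguous, is_strided_contiguous)
```

Under the same assumption, the practical consequence is that the source array should be kept alive and unmodified until encoding finishes, and that non-contiguous slices should be made contiguous first (for NumPy, with np.ascontiguousarray).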

SPDL provides two functions for creating reference frames, one for audio and one for video.

Creating Audio Frames

Use spdl.io.create_reference_audio_frame() to create audio frames from array data:

import numpy as np
import spdl.io

sample_rate = 44100
num_channels = 2
duration = 3

# Create audio data (3 seconds of stereo audio)
shape = (sample_rate * duration, num_channels)
audio_data = np.random.randint(-32768, 32768, size=shape, dtype=np.int16)  # upper bound is exclusive

# Create audio frame
frames = spdl.io.create_reference_audio_frame(
    array=audio_data,
    sample_fmt="s16",      # 16-bit signed integer
    sample_rate=sample_rate,
    pts=0,                 # Presentation timestamp
)

For detailed parameter descriptions, see spdl.io.create_reference_audio_frame().
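
The sample_fmt strings follow FFmpeg naming, where a trailing "p" (as in "fltp") denotes a planar layout and its absence (as in "s16") an interleaved one. The sketch below uses plain Python lists to show the difference. The helper name is hypothetical, and the assumption that planar data is shaped (num_channels, num_samples), the transpose of the interleaved shape used in the example above, should be checked against the API reference.

```python
def interleaved_to_planar(samples):
    """Convert [[L0, R0], [L1, R1], ...] into [[L0, L1, ...], [R0, R1, ...]]."""
    if not samples:
        return []
    num_channels = len(samples[0])
    return [[frame[c] for frame in samples] for c in range(num_channels)]

interleaved = [[1, -1], [2, -2], [3, -3]]    # shape (3 samples, 2 channels)
planar = interleaved_to_planar(interleaved)  # shape (2 channels, 3 samples)
print(planar)
```

With NumPy, the same rearrangement is np.ascontiguousarray(audio_data.T).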

Creating Video Frames

Use spdl.io.create_reference_video_frame() to create video frames from array data:

import numpy as np
import spdl.io

height, width = 240, 320
frame_rate = (30000, 1001)  # ~29.97 fps
num_frames = 90

# Create video data (90 frames of RGB video)
shape = (num_frames, height, width, 3)
video_data = np.random.randint(0, 256, size=shape, dtype=np.uint8)  # upper bound is exclusive

# Create video frames
frames = spdl.io.create_reference_video_frame(
    array=video_data,
    pix_fmt="rgb24",
    frame_rate=frame_rate,
    pts=0,
)

For detailed parameter descriptions, see spdl.io.create_reference_video_frame().
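
The frame_rate tuple is a rational number, and the filter example in this section uses its inverse as the time base. Assuming pts counts frames in that time base (an assumption, though it matches the batched encoding example, where pts is the frame index), the presentation time of a frame can be computed exactly with fractions.Fraction. The helper name is illustrative.

```python
from fractions import Fraction

def pts_to_seconds(pts, frame_rate):
    """Convert a frame-index pts to seconds, assuming time base = 1 / frame_rate."""
    numerator, denominator = frame_rate
    time_base = Fraction(denominator, numerator)  # seconds per frame
    return pts * time_base

frame_rate = (30000, 1001)  # ~29.97 fps
# 90 frames at ~29.97 fps last slightly longer than 3 seconds.
end_of_clip = pts_to_seconds(90, frame_rate)
print(end_of_clip, float(end_of_clip))
```

Using exact fractions avoids the rounding drift that accumulates when 1001/30000 is repeatedly added as a float.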

Using Encoders

Encoders convert frame data into packets using codecs such as H.264, AAC, or PCM. Lossy codecs such as H.264 and AAC trade some fidelity for a smaller output, while PCM stores samples uncompressed. Encoders are created through a spdl.io.Muxer object and configured with parameters such as bit rate, compression level, and codec-specific settings.

Audio Encoding

Here’s a complete example of encoding audio to a WAV file:

import numpy as np
import spdl.io

sample_rate = 44100
duration = 3
num_channels = 2

# Create audio data
shape = (sample_rate * duration, num_channels)
audio_data = np.random.randint(-32768, 32768, size=shape, dtype=np.int16)  # upper bound is exclusive

# Create muxer and encoder
muxer = spdl.io.Muxer("output.wav")
encoder = muxer.add_encode_stream(
    config=spdl.io.audio_encode_config(
        num_channels=num_channels,
        sample_fmt="s16",
        sample_rate=sample_rate,
    ),
    encoder="pcm_s16le",  # Optional: specify encoder
)

# Encode and write
with muxer.open():
    # Create frames
    frames = spdl.io.create_reference_audio_frame(
        array=audio_data,
        sample_fmt="s16",
        sample_rate=sample_rate,
        pts=0,
    )

    # Encode frames
    if (packets := encoder.encode(frames)) is not None:
        muxer.write(0, packets)

    # Flush encoder
    if (packets := encoder.flush()) is not None:
        muxer.write(0, packets)
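
One way to sanity-check a PCM WAV file such as output.wav is Python's standard wave module, which reads the header without involving SPDL. The sketch below writes a small stand-in file so that it runs on its own; in practice you would point describe_wav (a hypothetical helper) at the file produced above.

```python
import os
import tempfile
import wave

def describe_wav(path):
    """Return (num_channels, bytes_per_sample, sample_rate, num_samples)."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels(),
            wav.getsampwidth(),
            wav.getframerate(),
            wav.getnframes(),
        )

# Stand-in for output.wav: one second of silent 16-bit stereo audio.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)          # 2 bytes = 16-bit samples
    wav.setframerate(44100)
    wav.writeframes(b"\x00" * (44100 * 2 * 2))

info = describe_wav(demo_path)
print(info)
```

For the three-second example above, the expected result would be (2, 2, 44100, 132300), assuming every sample reached the file.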

Video Encoding

Here’s a complete example of encoding video to an MP4 file:

import numpy as np
import spdl.io

height, width = 240, 320
frame_rate = (30000, 1001)
duration = 3
batch_size = 32

num_frames = int(frame_rate[0] / frame_rate[1] * duration)
shape = (num_frames, height, width, 3)
video_data = np.random.randint(0, 256, size=shape, dtype=np.uint8)  # upper bound is exclusive

# Create muxer and encoder
muxer = spdl.io.Muxer("output.mp4")
encoder = muxer.add_encode_stream(
    config=spdl.io.video_encode_config(
        height=height,
        width=width,
        pix_fmt="rgb24",
        frame_rate=frame_rate,
    ),
)

# Encode and write in batches
with muxer.open():
    for start in range(0, num_frames, batch_size):
        # Create frames for this batch
        frames = spdl.io.create_reference_video_frame(
            array=video_data[start:start + batch_size, ...],
            pix_fmt="rgb24",
            frame_rate=frame_rate,
            pts=start,
        )

        # Encode frames
        if (packets := encoder.encode(frames)) is not None:
            muxer.write(0, packets)

    # Flush encoder
    if (packets := encoder.flush()) is not None:
        muxer.write(0, packets)
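
The batching arithmetic in the loop above can be checked in isolation. This sketch, which uses a hypothetical helper name, lists the (start, stop) slice bounds for each batch; the start index doubles as the pts passed to create_reference_video_frame in the example.

```python
def batch_bounds(num_frames, batch_size):
    """Return (start, stop) pairs covering range(num_frames) in order."""
    return [
        (start, min(start + batch_size, num_frames))
        for start in range(0, num_frames, batch_size)
    ]

frame_rate = (30000, 1001)
duration = 3
num_frames = int(frame_rate[0] / frame_rate[1] * duration)  # truncates 89.91 to 89

bounds = batch_bounds(num_frames, batch_size=32)
print(num_frames, bounds)
```

The last batch is shorter than batch_size. NumPy slicing clamps the upper bound, so video_data[64:96] yields the remaining 25 frames without an explicit min().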

Using the Muxer

The muxer is the final stage that writes encoded packets to an output file. It handles the container format (e.g., MP4, WAV, MKV) and ensures packets are properly interleaved and timestamped. The muxer can write multiple streams (audio, video, subtitles) into a single file.
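
To make "interleaved" concrete, the following sketch merges two packet lists by presentation time. It is a conceptual illustration only, not a description of how SPDL or the underlying muxer is implemented; the tuples, names, and packet spacing are made up for the example.

```python
import heapq
from fractions import Fraction

def interleave(*streams):
    """Merge per-stream (seconds, label) lists into one time-ordered list."""
    return list(heapq.merge(*streams, key=lambda packet: packet[0]))

# Video packets every 1/30 s; audio packets every 1024 samples at 44.1 kHz.
video = [(Fraction(i, 30), f"video{i}") for i in range(3)]
audio = [(Fraction(i * 1024, 44100), f"audio{i}") for i in range(3)]

order = [label for _, label in interleave(video, audio)]
print(order)
```

Each input list must already be sorted by time, which is the case when packets come out of an encoder in presentation order.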

Basic Usage

import spdl.io

# Create muxer for output file
muxer = spdl.io.Muxer("output.mp4")

# Add encoding stream(s)
encoder = muxer.add_encode_stream(
    config=spdl.io.video_encode_config(
        height=240,
        width=320,
        pix_fmt="rgb24",
        frame_rate=(30, 1),
    ),
)

# Open muxer and write data
with muxer.open():
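    # `packets` is the result of encoder.encode() or encoder.flush(), as in the
    # examples above; 0 is the index of the first stream added to the muxer.
    muxer.write(0, packets)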

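When used as a context manager, the muxer flushes and closes the output file on exiting the with block. The examples in this section still call encoder.flush() explicitly before the block ends, so that packets buffered in the encoder are written.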

Multiple Streams (Audio + Video)

You can write both audio and video streams to a single file:

import numpy as np
import spdl.io

# Create audio and video data
audio_data = np.random.randint(-32768, 32768, size=(44100 * 3, 2), dtype=np.int16)
video_data = np.random.randint(0, 256, size=(90, 240, 320, 3), dtype=np.uint8)

# Create muxer with both audio and video streams
muxer = spdl.io.Muxer("output.mp4")
audio_encoder = muxer.add_encode_stream(
    config=spdl.io.audio_encode_config(
        num_channels=2, sample_rate=44100, sample_fmt="s16"
    ),
    encoder="aac",
)
video_encoder = muxer.add_encode_stream(
    config=spdl.io.video_encode_config(
        height=240, width=320, pix_fmt="rgb24", frame_rate=(30, 1)
    ),
)

with muxer.open():
    # Write audio to stream 0
    audio_frames = spdl.io.create_reference_audio_frame(
        array=audio_data, sample_fmt="s16", sample_rate=44100, pts=0
    )
    if (packets := audio_encoder.encode(audio_frames)) is not None:
        muxer.write(0, packets)

    # Write video to stream 1
    video_frames = spdl.io.create_reference_video_frame(
        array=video_data, pix_fmt="rgb24", frame_rate=(30, 1), pts=0
    )
    if (packets := video_encoder.encode(video_frames)) is not None:
        muxer.write(1, packets)

    # Flush both encoders
    if (packets := audio_encoder.flush()) is not None:
        muxer.write(0, packets)
    if (packets := video_encoder.flush()) is not None:
        muxer.write(1, packets)
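
The encode, write, and flush steps repeat for every stream, so they can be factored into a small helper. The sketch below is not part of SPDL; it is a hypothetical convenience function that relies only on the encoder.encode(), encoder.flush(), and muxer.write() calls used in the examples above.

```python
def encode_and_write(muxer, encoder, stream_index, frame_batches):
    """Encode each batch of frames, write the packets, then flush the encoder.

    Returns the number of muxer.write() calls made.
    """
    num_writes = 0
    for frames in frame_batches:
        # encode() can return None, as the checks in the examples imply.
        if (packets := encoder.encode(frames)) is not None:
            muxer.write(stream_index, packets)
            num_writes += 1
    # flush() returns whatever the encoder still holds, if anything.
    if (packets := encoder.flush()) is not None:
        muxer.write(stream_index, packets)
        num_writes += 1
    return num_writes
```

Inside the with muxer.open(): block, the multi-stream example would then reduce to encode_and_write(muxer, audio_encoder, 0, [audio_frames]) followed by encode_and_write(muxer, video_encoder, 1, [video_frames]). Note that this flushes the audio encoder before video encoding starts, which differs slightly from the ordering in the example.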

Remuxing (Copying Streams)

You can also remux (copy) streams without re-encoding:

import spdl.io

# Open source file
demuxer = spdl.io.Demuxer("input.mp4")

# Create output muxer
muxer = spdl.io.Muxer("output.mp4")
muxer.add_remux_stream(demuxer.video_codec)

# Copy packets
with muxer.open():
    for packets in demuxer.streaming_demux(duration=1):
        muxer.write(0, packets)

Customizing Encoders

SPDL provides configuration functions to customize encoding behavior.

Video Encode Configuration

Use spdl.io.video_encode_config() to customize video encoding:

import spdl.io

config = spdl.io.video_encode_config(
    height=1080,
    width=1920,
    pix_fmt="yuv420p",
    frame_rate=(30, 1),
    bit_rate=5000000,           # 5 Mbps
    gop_size=30,                # Frames per group of pictures (keyframe interval)
    max_b_frames=2,             # Max consecutive B-frames
    compression_level=5,        # Codec-dependent effort/size trade-off
    colorspace="bt709",         # YUV matrix coefficients
    color_primaries="bt709",    # Chromaticity of the RGB primaries
    color_trc="bt709",          # Transfer characteristics
)

muxer = spdl.io.Muxer("output.mp4")
encoder = muxer.add_encode_stream(config=config)

For detailed parameter descriptions, see spdl.io.video_encode_config().
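
Two of these settings translate directly into numbers you can estimate before encoding. The sketch below is back-of-the-envelope arithmetic with illustrative helper names; the actual output size depends on the codec, its rate control, and container overhead.

```python
from fractions import Fraction

def estimated_size_bytes(bit_rate, duration_seconds):
    """Approximate stream size for a target bit rate (bits per second)."""
    return bit_rate * duration_seconds // 8

def keyframe_interval_seconds(gop_size, frame_rate):
    """Longest gap between keyframes, in seconds, for a given GOP size."""
    numerator, denominator = frame_rate
    return Fraction(gop_size * denominator, numerator)

size = estimated_size_bytes(5_000_000, 60)         # one minute at 5 Mbps
interval = keyframe_interval_seconds(30, (30, 1))  # gop_size=30 at 30 fps
print(size, interval)
```

With the configuration above, one minute of video targets roughly 37.5 MB, with a keyframe at least once per second.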

Audio Encode Configuration

Use spdl.io.audio_encode_config() to customize audio encoding:

import spdl.io

config = spdl.io.audio_encode_config(
    num_channels=2,
    sample_rate=48000,
    sample_fmt="fltp",
    bit_rate=192000,        # 192 kbps
    compression_level=5,
)

muxer = spdl.io.Muxer("output.aac")
encoder = muxer.add_encode_stream(config=config)

For detailed parameter descriptions, see spdl.io.audio_encode_config().
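
For a sense of scale, the target bit rate can be compared with the bit rate of the same audio stored uncompressed. The sketch below assumes 16-bit PCM as the baseline; the function name is illustrative.

```python
def pcm_bit_rate(sample_rate, num_channels, bits_per_sample=16):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate * num_channels * bits_per_sample

uncompressed = pcm_bit_rate(48000, 2)   # 16-bit stereo at 48 kHz
target = 192_000                        # bit_rate from the config above
ratio = uncompressed / target
print(uncompressed, ratio)
```

Under this assumption, a 192 kbps target is about one eighth of the uncompressed bit rate.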

Applying Filters to Reference Frames

You can apply filters to reference frames before encoding using spdl.io.FilterGraph. This is useful for preprocessing (e.g., scaling, color correction, audio normalization).

import numpy as np
import spdl.io

height, width = 240, 320
frame_rate = (30, 1)
video_data = np.random.randint(0, 256, size=(90, height, width, 3), dtype=np.uint8)

# Create reference frames
frames = spdl.io.create_reference_video_frame(
    array=video_data, pix_fmt="rgb24", frame_rate=frame_rate, pts=0
)

# Apply scaling filter
filter_desc = (
    f"buffer=width={width}:height={height}:pix_fmt=rgb24:"
    f"time_base={frame_rate[1]}/{frame_rate[0]}:sar=1/1,"
    "scale=640:480,buffersink"
)
filter_graph = spdl.io.FilterGraph(filter_desc)
filter_graph.add_frames(frames)
filter_graph.flush()
filtered_frames = filter_graph.get_frames()

# Encode filtered frames
muxer = spdl.io.Muxer("output.mp4")
encoder = muxer.add_encode_stream(
    config=spdl.io.video_encode_config(
        height=480, width=640, pix_fmt="rgb24", frame_rate=frame_rate
    )
)
with muxer.open():
    if (packets := encoder.encode(filtered_frames)) is not None:
        muxer.write(0, packets)
    if (packets := encoder.flush()) is not None:
        muxer.write(0, packets)

For more details on filtering, including helper functions like spdl.io.get_buffer_desc() and spdl.io.get_abuffer_desc(), see Filter Graphs and Advanced Filter Graphs.
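
The filter description in the example above is an ordinary string in FFmpeg's filter-graph syntax, so it can be assembled by a small function. The helper below is a hand-rolled, hypothetical sketch for illustration; it is not spdl.io.get_buffer_desc(), whose signature is not shown in this section.

```python
def scale_filter_desc(width, height, pix_fmt, frame_rate, out_width, out_height):
    """Build a 'buffer -> scale -> buffersink' description string."""
    numerator, denominator = frame_rate
    source = (
        f"buffer=width={width}:height={height}:pix_fmt={pix_fmt}:"
        f"time_base={denominator}/{numerator}:sar=1/1"
    )
    return f"{source},scale={out_width}:{out_height},buffersink"

desc = scale_filter_desc(320, 240, "rgb24", (30, 1), 640, 480)
print(desc)
```

The resulting string matches the one built inline in the example, and could be passed to spdl.io.FilterGraph in the same way.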

See Also