Filter Graphs¶

This section explains how to use filters to customize the decoding process in SPDL. Filters allow you to transform media data during decoding, which can be more efficient than post-processing.

FFmpeg Filters Overview¶

FFmpeg filters are processing operations that transform audio or video frames. Filters can perform operations like:

Format conversion (YUV to RGB, sample format changes)
Resizing and scaling
Cropping
Rotation and flipping
Color adjustments
Audio resampling
And many more…

For a complete list of available filters, see the FFmpeg Filters Documentation.

Filter Descriptions¶

In SPDL, you can specify the filtering with a string that follows FFmpeg’s filter graph description syntex.

Basic syntax:

filter1=param1=value1:param2=value2,filter2,filter3=param=value

Filters are separated by commas (,), and parameters within a filter are separated by colons (:) using key=value format.

Example filter descriptions:

# Scale video to 256x256 and convert to RGB
"scale=256:256,format=rgb24"

# Resample audio to 16kHz mono
"aresample=16000,pan=mono|c0=c0"

# Crop, rotate, and flip
"crop=224:224:0:0,rotate=30*PI/180,hflip"

Helper Functions¶

SPDL provides helper functions to construct filter descriptions for common use cases. These functions handle the complexity of filter syntax and ensure correct parameter formatting.

get_audio_filter_desc¶

spdl.io.get_audio_filter_desc() generates filter descriptions for audio processing.

Parameters:

sample_rate: Target sample rate (Hz)
num_channels: Target number of channels
sample_fmt: Target sample format (e.g., "fltp", "s16p")
num_frames: Exact number of frames to output (pads or trims as needed)
timestamp: Time window to extract (tuple of start and end times)
filter_desc: Additional custom filters to apply

Example:

import spdl.io

# Resample to 16kHz, convert to mono, 16-bit integer
filter_desc = spdl.io.get_audio_filter_desc(
    sample_rate=16_000,
    num_channels=1,
    sample_fmt="s16p"
)
# Result: "aformat=channel_layouts=1c,aresample=16000,aformat=sample_fmts=s16p"

# Extract 5 seconds starting at 10 seconds
filter_desc = spdl.io.get_audio_filter_desc(
    timestamp=(10.0, 15.0),
    sample_rate=16_000
)

# Use in decoding
packets = spdl.io.demux_audio("audio.mp3")
frames = spdl.io.decode_packets(packets, filter_desc=filter_desc)

get_video_filter_desc¶

spdl.io.get_video_filter_desc() generates filter descriptions for video/image processing.

Parameters:

frame_rate: Target frame rate (frames per second or tuple of numerator/denominator)
scale_width, scale_height: Target dimensions for scaling
scale_algo: Scaling algorithm ("bilinear", "bicubic", "lanczos", etc.)
scale_mode: How to handle aspect ratio ("stretch", "pad", "crop")
crop_width, crop_height: Dimensions for center cropping
pix_fmt: Target pixel format (e.g., "rgb24", "bgr24", "yuv420p", "nv12", etc.)
num_frames: Exact number of frames to output
pad_mode: How to pad if fewer frames than requested ("black", "repeat_last")
timestamp: Time window to extract
filter_desc: Additional custom filters to apply

Example:

import spdl.io

# Scale to 256x256, convert to RGB
filter_desc = spdl.io.get_video_filter_desc(
    scale_width=256,
    scale_height=256,
    pix_fmt="rgb24"
)

# Scale with padding to preserve aspect ratio
filter_desc = spdl.io.get_video_filter_desc(
    scale_width=256,
    scale_height=256,
    scale_mode="pad",
    pix_fmt="rgb24"
)

# Extract 30 frames at 30fps, scale and crop
filter_desc = spdl.io.get_video_filter_desc(
    frame_rate=30,
    scale_width=256,
    scale_height=256,
    crop_width=224,
    crop_height=224,
    num_frames=30,
    pix_fmt="rgb24"
)

# Use in decoding
packets = spdl.io.demux_video("video.mp4")
frames = spdl.io.decode_packets(packets, filter_desc=filter_desc)

get_filter_desc¶

spdl.io.get_filter_desc() is a convenience function that automatically selects the appropriate helper based on the packet type.

Example:

import spdl.io

# Automatically uses get_video_filter_desc
packets = spdl.io.demux_video("video.mp4")
filter_desc = spdl.io.get_filter_desc(
    packets,
    scale_width=256,
    scale_height=256
)

# Automatically uses get_audio_filter_desc
packets = spdl.io.demux_audio("audio.mp3")
filter_desc = spdl.io.get_filter_desc(
    packets,
    sample_rate=16_000
)

Custom Filter Descriptions¶

For advanced use cases, you can write custom filter descriptions or combine them with the helper functions.

Writing Custom Filters¶

You can write filter descriptions manually as strings:

import spdl.io

# Custom filter: horizontal flip, rotate 45 degrees, scale
custom_filter = "hflip,rotate=45*PI/180,scale=256:256"

packets = spdl.io.demux_video("video.mp4")
frames = spdl.io.decode_packets(packets, filter_desc=custom_filter)

Combining with Helper Functions¶

The helper functions accept a filter_desc parameter to insert custom filters:

import spdl.io

# Add custom filters before format conversion
filter_desc = spdl.io.get_video_filter_desc(
    filter_desc="hflip,vflip",  # Custom filters applied first
    scale_width=256,
    scale_height=256,
    pix_fmt="rgb24"  # Format conversion applied last
)

# Result: "hflip,vflip,scale=256:256,format=rgb24"

Important: When using timestamp parameter with custom filters, ensure your custom filters don’t interfere with the trimming filters that SPDL adds automatically.

Common Filter Examples¶

Video Filters¶

Horizontal and vertical flip:

filter_desc = "hflip,vflip"

Rotation:

# Rotate 90 degrees clockwise
filter_desc = "rotate=90*PI/180"

# Rotate with variable angle
angle_rad = 45 * 3.14159 / 180
filter_desc = f"rotate={angle_rad}"

Brightness and contrast:

# Increase brightness by 0.1, contrast by 1.2
filter_desc = "eq=brightness=0.1:contrast=1.2"

Gaussian blur:

filter_desc = "gblur=sigma=2"

Random crop:

import random

# Note: crop_x and crop_y are not supported by get_video_filter_desc.
# Cropping is always center-based. For custom crop positions,
# use get_filter_desc with a custom crop filter.

x_pos = random.random()  # 0.0 to 1.0
y_pos = random.random()

# Center crop only (no positional control)
filter_desc = spdl.io.get_video_filter_desc(
    scale_width=256,
    scale_height=256,
    crop_width=224,
    crop_height=224,
    pix_fmt="rgb24"
)

Audio Filters¶

Volume adjustment:

# Increase volume by 50%
filter_desc = "volume=1.5"

High-pass filter:

# Remove frequencies below 200 Hz
filter_desc = "highpass=f=200"

Tempo change:

# Speed up by 1.5x
filter_desc = "atempo=1.5"

Data Augmentation with Filters¶

Filters are particularly useful for data augmentation during training. Applying filters to decoded frames before buffer conversion can be more memory-efficient than applying the same operations after creating a contiguous array.

Random Augmentation Example¶

Filters can be generated with random parameters for data augmentation. The following example shows how to change augmentation values dynamically.

import random
import spdl.io

def get_random_augmentation_filter():
    """Generate a random augmentation filter for images."""
    filters = []

    # Random horizontal flip (50% chance)
    if random.random() < 0.5:
        filters.append("hflip")

    # Random vertical flip (50% chance)
    if random.random() < 0.5:
        filters.append("vflip")

    # Random rotation (-30 to +30 degrees)
    angle = (random.random() * 60 - 30) * 3.14159 / 180
    filters.append(f"rotate={angle:.3f}")

    # Combine custom filters with standard preprocessing
    custom_filters = ",".join(filters)

    # Note: For custom crop positions, include crop filter in filter_desc
    # get_video_filter_desc only supports center cropping via crop_width/crop_height
    return spdl.io.get_video_filter_desc(
        filter_desc=custom_filters,
        scale_width=256,
        scale_height=256,
        crop_width=224,
        crop_height=224,
        pix_fmt="rgb24"
    )

def load_image_with_augmentation(image_path):
    """Load and augment an image."""
    packets = spdl.io.demux_image(image_path)
    filter_desc = get_random_augmentation_filter()
    frames = spdl.io.decode_packets(packets, filter_desc=filter_desc)
    buffer = spdl.io.convert_frames(frames)
    return spdl.io.to_torch(buffer)

# Use in training loop
for image_path in training_images:
    augmented_image = load_image_with_augmentation(image_path)
    # augmented_image.shape: (224, 224, 3)

The following is an example filter descriptions generated and the resulting images.

"hflip,rotate=angle=-0.05,scale=256:256,crop=224:224:x=0.18*(iw-ow):y=0.17*(ih-oh)"
"hflip,vflip,rotate=angle=-0.37,scale=256:256,crop=224:224:x=0.09*(iw-ow):y=0.96*(ih-oh)"
"rotate=angle=0.33,scale=256:256,crop=224:224:x=0.58*(iw-ow):y=0.57*(ih-oh)"
"hflip,vflip,rotate=angle=0.30,scale=256:256,crop=224:224:x=0.80*(iw-ow):y=0.35*(ih-oh)"
"hflip,vflip,rotate=angle=0.02,scale=256:256,crop=224:224:x=0.01*(iw-ow):y=0.25*(ih-oh)"
"vflip,rotate=angle=0.35,scale=256:256,crop=224:224:x=0.42*(iw-ow):y=0.69*(ih-oh)"
"hflip,rotate=angle=0.22,scale=256:256,crop=224:224:x=0.10*(iw-ow):y=0.03*(ih-oh)"
"hflip,rotate=angle=-0.18,scale=256:256,crop=224:224:x=0.65*(iw-ow):y=0.31*(ih-oh)"
"rotate=angle=-0.13,scale=256:256,crop=224:224:x=0.37*(iw-ow):y=0.75*(ih-oh)"
"hflip,vflip,rotate=angle=0.01,scale=256:256,crop=224:224:x=0.27*(iw-ow):y=0.84*(ih-oh)"
"hflip,rotate=angle=-0.31,scale=256:256,crop=224:224:x=0.43*(iw-ow):y=0.92*(ih-oh)"
"hflip,rotate=angle=-0.27,scale=256:256,crop=224:224:x=0.96*(iw-ow):y=0.92*(ih-oh)"
"vflip,rotate=angle=-0.28,scale=256:256,crop=224:224:x=0.61*(iw-ow):y=0.04*(ih-oh)"
"hflip,vflip,rotate=angle=0.08,scale=256:256,crop=224:224:x=0.84*(iw-ow):y=0.57*(ih-oh)"
"hflip,vflip,rotate=angle=0.41,scale=256:256,crop=224:224:x=0.24*(iw-ow):y=0.92*(ih-oh)"
"hflip,rotate=angle=-0.02,scale=256:256,crop=224:224:x=0.47*(iw-ow):y=0.87*(ih-oh)"
"hflip,rotate=angle=-0.15,scale=256:256,crop=224:224:x=0.73*(iw-ow):y=0.30*(ih-oh)"
"vflip,rotate=angle=-0.13,scale=256:256,crop=224:224:x=0.91*(iw-ow):y=0.85*(ih-oh)"
"vflip,rotate=angle=0.28,scale=256:256,crop=224:224:x=0.62*(iw-ow):y=0.02*(ih-oh)"
"rotate=angle=0.24,scale=256:256,crop=224:224:x=0.85*(iw-ow):y=0.61*(ih-oh)"
"vflip,rotate=angle=-0.52,scale=256:256,crop=224:224:x=0.61*(iw-ow):y=0.59*(ih-oh)"
"vflip,rotate=angle=0.06,scale=256:256,crop=224:224:x=0.08*(iw-ow):y=0.04*(ih-oh)"
"hflip,rotate=angle=0.50,scale=256:256,crop=224:224:x=0.23*(iw-ow):y=0.42*(ih-oh)"
"vflip,rotate=angle=0.18,scale=256:256,crop=224:224:x=0.54*(iw-ow):y=0.34*(ih-oh)

../_images/io_preprocessing_random_aug.png

Benefits of Filters for Augmentation¶

Performance benefits:

Fewer pixels to process: Many video/image files use YUV420 format, which has half the pixels of RGB. Applying augmentation before RGB conversion processes fewer pixels.
Single-pass processing: Filters are applied during decoding in a single pass, avoiding multiple array operations.
Memory efficiency: No need to allocate intermediate arrays for each augmentation step.
Hardware acceleration: Some filters can use hardware acceleration when available.

Example comparison:

# Efficient: Augmentation during decoding (on YUV420 data)
filter_desc = spdl.io.get_video_filter_desc(
    filter_desc="hflip,rotate=0.5,scale=256:256",
    crop_width=224,
    crop_height=224,
    pix_fmt="rgb24"  # Convert to RGB at the end
)
buffer = spdl.io.load_image("image.jpg", filter_desc=filter_desc)

# Less efficient: Post-processing (on RGB24 data)
buffer = spdl.io.load_image("image.jpg")  # Decode to RGB first
array = spdl.io.to_numpy(buffer)
# Now apply transformations on the larger RGB array
array = flip_horizontal(array)
array = rotate(array, 0.5)
array = resize(array, (256, 256))
array = crop(array, 224, 224)