Advanced Filter Graphs

This section covers advanced filter graph usage, including complex graphs with multiple inputs and outputs, and direct use of the spdl.io.FilterGraph class for fine-grained control.

When to Use FilterGraph Directly

The high-level functions (load_audio(), load_video(), etc.) and decode_packets() handle filtering automatically using simple linear filter chains.

Use FilterGraph directly when you need:

  • Multiple inputs: Combining multiple media streams (e.g., side-by-side video comparison)

  • Multiple outputs: Splitting one stream into multiple processed versions

  • Streaming processing: Processing media in chunks without loading everything into memory

  • Complex filter topologies: Non-linear filter graphs with branches and merges

  • Fine-grained control: Manual control over when frames are added and retrieved

FilterGraph Basics

The FilterGraph class provides a low-level interface to FFmpeg’s filter graph system.

Basic Workflow

  1. Create a filter graph with a filter description

  2. Add frames to input nodes using add_frames()

  3. Get frames from output nodes using get_frames()

  4. Flush the graph when done using flush()

Input and Output Nodes

Unlike simple filter chains, complex filter graphs require explicit input and output nodes:

  • Input nodes: buffer (video/image) or abuffer (audio)

  • Output nodes: buffersink (video/image) or abuffersink (audio)

The get_buffer_desc() (video/image) and get_abuffer_desc() (audio) helper functions construct the input node descriptions from a codec; the output nodes are appended literally as buffersink or abuffersink.
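As a string-only illustration of that structure, the sketch below builds a complete linear graph description. The buffer arguments here are hypothetical stand-ins for what get_buffer_desc() would derive from a real codec; only the buffer/buffersink shape is the point:

```python
# Hypothetical stand-in for spdl.io.get_buffer_desc(codec); the real helper
# derives these arguments (size, pixel format, time base) from the codec.
buffer_desc = "buffer=video_size=320x240:pix_fmt=yuv420p:time_base=1/30000:pixel_aspect=1/1"

# Input node -> processing filters -> output node, joined into one chain.
filter_desc = f"{buffer_desc},scale=256:256,format=rgb24,buffersink"
print(filter_desc)
```

The audio equivalent starts from get_abuffer_desc() and ends in abuffersink.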

Simple FilterGraph Example

Here’s a basic example that uses FilterGraph to rescale frames and convert them to RGB:

import numpy as np

import spdl.io

# Load source
demuxer = spdl.io.Demuxer("video.mp4")
codec = demuxer.video_codec
decoder = spdl.io.Decoder(codec, filter_desc=None)

# Create filter graph with explicit input/output nodes
buffer_desc = spdl.io.get_buffer_desc(codec)
filter_desc = f"{buffer_desc},scale=256:256,format=rgb24,buffersink"

filter_graph = spdl.io.FilterGraph(filter_desc)
print(filter_graph)  # Print graph structure

# Process frames
buffers = []
for packets in demuxer.streaming_demux(duration=1):
    frames = decoder.decode(packets)

    # Add frames to filter graph
    filter_graph.add_frames(frames)

    # Get filtered frames
    filtered_frames = filter_graph.get_frames()
    if filtered_frames is not None:
        buffer = spdl.io.convert_frames(filtered_frames)
        buffers.append(spdl.io.to_numpy(buffer))

# Flush remaining frames
if (frames := decoder.flush()) is not None:
    filter_graph.add_frames(frames)

filter_graph.flush()

if (frames := filter_graph.get_frames()) is not None:
    buffer = spdl.io.convert_frames(frames)
    buffers.append(spdl.io.to_numpy(buffer))

# Combine all buffers
result = np.concatenate(buffers)

Multiple Input Graphs

Complex filter graphs can accept multiple input streams. This is useful for:

  • Side-by-side video comparison

  • Video overlays

  • Audio mixing

  • Picture-in-picture effects

Labeling Input Nodes

To use multiple inputs, label each input node with a unique name:

# Create two input nodes with labels
buffer0 = spdl.io.get_buffer_desc(codec, label="in0")
buffer1 = spdl.io.get_buffer_desc(codec, label="in1")

# Construct filter graph that stacks videos vertically
filter_desc = f"{buffer0} [in0];{buffer1} [in1];[in0] [in1] vstack,buffersink"

The syntax breakdown:

  • buffer@in0=... - Input node named “in0”

  • [in0] - Label for the output of this node

  • [in0] [in1] vstack - Stack the two labeled streams

  • buffersink - Output node
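Putting those pieces together programmatically looks like the following. The two buffer descriptions are hypothetical stand-ins for what get_buffer_desc(codec, label=...) returns:

```python
# Hypothetical stand-ins for spdl.io.get_buffer_desc(codec, label="in0"/"in1")
buf0 = "buffer@in0=video_size=320x240:pix_fmt=yuv420p:time_base=1/30000:pixel_aspect=1/1"
buf1 = "buffer@in1=video_size=320x240:pix_fmt=yuv420p:time_base=1/30000:pixel_aspect=1/1"

# Three chains, separated by ';':
#   1) input node "in0", with its output labeled [in0]
#   2) input node "in1", with its output labeled [in1]
#   3) vstack consumes both labels and feeds buffersink
filter_desc = f"{buf0} [in0];{buf1} [in1];[in0] [in1] vstack,buffersink"
print(filter_desc)
```

Note that `;` separates independent chains, while `,` links consecutive filters within a chain.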

Side-by-Side Video Example

import numpy as np

import spdl.io

demuxer = spdl.io.Demuxer("video.mp4")
codec = demuxer.video_codec
decoder = spdl.io.Decoder(codec, filter_desc=None)

# Create filter graph with two inputs stacked vertically
buf0 = spdl.io.get_buffer_desc(codec, label="in0")
buf1 = spdl.io.get_buffer_desc(codec, label="in1")
filter_desc = f"{buf0} [in0];{buf1} [in1];[in0] [in1] vstack,buffersink"

filter_graph = spdl.io.FilterGraph(filter_desc)

buffers = []
for packets in demuxer.streaming_demux(duration=1):
    frames = decoder.decode(packets)

    # Feed the same frames to both inputs (clone one copy, since each add consumes the frames)
    filter_graph.add_frames(frames.clone(), key="buffer@in0")
    filter_graph.add_frames(frames, key="buffer@in1")

    # Get stacked output
    filtered_frames = filter_graph.get_frames()
    if filtered_frames is not None:
        buffer = spdl.io.convert_frames(filtered_frames)
        buffers.append(spdl.io.to_numpy(buffer))

# Flush
if (frames := decoder.flush()) is not None:
    filter_graph.add_frames(frames.clone(), key="buffer@in0")
    filter_graph.add_frames(frames, key="buffer@in1")

filter_graph.flush()

if (frames := filter_graph.get_frames()) is not None:
    buffer = spdl.io.convert_frames(frames)
    buffers.append(spdl.io.to_numpy(buffer))

result = np.concatenate(buffers)
# result now contains frames stacked vertically (double height)

Common Multi-Input Filters

Horizontal stack (side-by-side):

filter_desc = f"{buf0} [in0];{buf1} [in1];[in0] [in1] hstack,buffersink"

Vertical stack (top-bottom):

filter_desc = f"{buf0} [in0];{buf1} [in1];[in0] [in1] vstack,buffersink"

Overlay (picture-in-picture):

# Overlay a scaled-down second video in the bottom-right corner, 10 px from the edges
filter_desc = ";".join(
    [
        f"{buf0} [main]",
        f"{buf1} [pip]",
        "[pip] scale=96:72 [pip_scaled]",
        "[main][pip_scaled] overlay=x=W-w-10:y=H-h-10 [overlaid]",
        "[overlaid] format=rgb24,buffersink",
    ]
)

Blend:

# Blend two videos with 50% opacity each
filter_desc = f"{buf0} [in0];{buf1} [in1];[in0] [in1] blend=all_mode=average,buffersink"

Multiple Output Graphs

Filter graphs can produce multiple output streams. This is useful for:

  • Generating multiple resolutions simultaneously

  • Creating different augmented versions

  • Extracting different features from the same source

Labeling Output Nodes

To use multiple outputs, label each output node:

filter_desc = ";".join([
    f"{spdl.io.get_buffer_desc(codec)} [in]",
    "[in] split [out0][out1]",
    "[out0] buffersink@out0",
    "[out1] buffersink@out1",
])

The syntax breakdown:

  • [in] split [out0][out1] - Split input into two streams

  • buffersink@out0 - Output node named “out0”

  • buffersink@out1 - Output node named “out1”
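When the number of branches is not fixed in advance, the description can be generated. Here is a string-only sketch, with a hypothetical input description standing in for get_buffer_desc():

```python
def multi_output_desc(input_desc: str, n: int) -> str:
    """Build a description that splits one input into n named buffersink outputs."""
    labels = "".join(f"[out{i}]" for i in range(n))
    chains = [f"{input_desc} [in]", f"[in] split={n} {labels}"]
    chains += [f"[out{i}] buffersink@out{i}" for i in range(n)]
    return ";".join(chains)

# Hypothetical input description in place of spdl.io.get_buffer_desc(codec)
desc = multi_output_desc(
    "buffer=video_size=320x240:pix_fmt=yuv420p:time_base=1/30000:pixel_aspect=1/1", 3
)
print(desc)
```

FFmpeg's split filter produces two outputs by default; `split=N` requests N copies.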

Multi-Resolution Output Example

import numpy as np

import spdl.io

demuxer = spdl.io.Demuxer("video.mp4")
codec = demuxer.video_codec
decoder = spdl.io.Decoder(codec, filter_desc=None)

# Create filter graph with two outputs at different resolutions
filter_desc = ";".join([
    f"{spdl.io.get_buffer_desc(codec)} [in]",
    "[in] split [tmp0][tmp1]",
    "[tmp0] scale=256:256 [out0]",
    "[tmp1] scale=128:128 [out1]",
    "[out0] buffersink@out0",
    "[out1] buffersink@out1",
])

filter_graph = spdl.io.FilterGraph(filter_desc)

buffers_256, buffers_128 = [], []

for packets in demuxer.streaming_demux(duration=1):
    frames = decoder.decode(packets)
    filter_graph.add_frames(frames)

    # Get frames from first output (256x256)
    frames_256 = filter_graph.get_frames(key="buffersink@out0")
    if frames_256 is not None:
        buffer = spdl.io.convert_frames(frames_256)
        buffers_256.append(spdl.io.to_numpy(buffer))

    # Get frames from second output (128x128)
    frames_128 = filter_graph.get_frames(key="buffersink@out1")
    if frames_128 is not None:
        buffer = spdl.io.convert_frames(frames_128)
        buffers_128.append(spdl.io.to_numpy(buffer))

# Flush
if (frames := decoder.flush()) is not None:
    filter_graph.add_frames(frames)

filter_graph.flush()

if (frames := filter_graph.get_frames(key="buffersink@out0")) is not None:
    buffer = spdl.io.convert_frames(frames)
    buffers_256.append(spdl.io.to_numpy(buffer))

if (frames := filter_graph.get_frames(key="buffersink@out1")) is not None:
    buffer = spdl.io.convert_frames(frames)
    buffers_128.append(spdl.io.to_numpy(buffer))

result_256 = np.concatenate(buffers_256)  # Shape: (N, 256, 256, C)
result_128 = np.concatenate(buffers_128)  # Shape: (N, 128, 128, C)

Common Multi-Output Patterns

Different augmentations:

filter_desc = ";".join([
    f"{spdl.io.get_buffer_desc(codec)} [in]",
    "[in] split [tmp0][tmp1]",
    "[tmp0] hflip [out0]",
    "[tmp1] vflip [out1]",
    "[out0] buffersink@out0",
    "[out1] buffersink@out1",
])

Different color spaces:

filter_desc = ";".join([
    f"{spdl.io.get_buffer_desc(codec)} [in]",
    "[in] split [tmp0][tmp1]",
    "[tmp0] format=rgb24 [out0]",
    "[tmp1] format=gray [out1]",
    "[out0] buffersink@out0",
    "[out1] buffersink@out1",
])

Multimedia Filters

FFmpeg provides multimedia filters that can convert between audio and video streams.

Audio to Video Visualization

The showwaves filter converts audio waveforms to video:

import numpy as np

import spdl.io

demuxer = spdl.io.Demuxer("audio.mp3")
codec = demuxer.audio_codec
decoder = spdl.io.Decoder(codec, filter_desc=None)

# Create filter graph: audio input -> video output
abuffer_desc = spdl.io.get_abuffer_desc(codec)
filter_desc = f"{abuffer_desc},showwaves,buffersink"

filter_graph = spdl.io.FilterGraph(filter_desc)

video_buffers = []
for packets in demuxer.streaming_demux(duration=1):
    audio_frames = decoder.decode(packets)

    # Add audio frames
    filter_graph.add_frames(audio_frames)

    # Get video frames
    video_frames = filter_graph.get_frames()
    if video_frames is not None:
        buffer = spdl.io.convert_frames(video_frames)
        video_buffers.append(spdl.io.to_numpy(buffer))

# Flush
if (frames := decoder.flush()) is not None:
    filter_graph.add_frames(frames)

filter_graph.flush()

if (frames := filter_graph.get_frames()) is not None:
    buffer = spdl.io.convert_frames(frames)
    video_buffers.append(spdl.io.to_numpy(buffer))

video_result = np.concatenate(video_buffers)
# video_result contains visualization of audio waveform

Other Multimedia Filters

showspectrum - Audio spectrum visualization:

filter_desc = f"{abuffer_desc},showspectrum,buffersink"

showfreqs - Frequency visualization:

filter_desc = f"{abuffer_desc},showfreqs,buffersink"

avectorscope - Stereo audio vectorscope:

filter_desc = f"{abuffer_desc},avectorscope,buffersink"
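All of these visualization filters share the same audio-in/video-out shape, so the graph descriptions differ only by the filter name. A string-only sketch, where abuffer_desc is a hypothetical stand-in for get_abuffer_desc():

```python
# Hypothetical stand-in for spdl.io.get_abuffer_desc(codec)
abuffer_desc = "abuffer=sample_rate=44100:sample_fmt=fltp:channel_layout=stereo"

def visualization_desc(vis_filter: str) -> str:
    # vis_filter may also carry options, e.g. "showwaves=s=640x240:mode=line"
    return f"{abuffer_desc},{vis_filter},buffersink"

for name in ("showwaves", "showspectrum", "showfreqs", "avectorscope"):
    print(visualization_desc(name))
```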

Complex Graph Examples

Example 1: Multi-Input with Different Processing

Process two video streams differently and combine them:

import numpy as np

import spdl.io

demuxer = spdl.io.Demuxer("video.mp4")
codec = demuxer.video_codec
decoder = spdl.io.Decoder(codec, filter_desc=None)

# Create complex filter: apply different effects to each input
buf0 = spdl.io.get_buffer_desc(codec, label="in0")
buf1 = spdl.io.get_buffer_desc(codec, label="in1")

filter_desc = ";".join([
    f"{buf0} [in0]",
    f"{buf1} [in1]",
    "[in0] hflip,scale=320:240 [left]",
    "[in1] vflip,scale=320:240 [right]",
    "[left][right] hstack",
    "buffersink"
])

filter_graph = spdl.io.FilterGraph(filter_desc)

buffers = []
for packets in demuxer.streaming_demux(duration=1):
    frames = decoder.decode(packets)

    filter_graph.add_frames(frames.clone(), key="buffer@in0")
    filter_graph.add_frames(frames, key="buffer@in1")

    filtered_frames = filter_graph.get_frames()
    if filtered_frames is not None:
        buffer = spdl.io.convert_frames(filtered_frames)
        buffers.append(spdl.io.to_numpy(buffer))

# Flush
if (frames := decoder.flush()) is not None:
    filter_graph.add_frames(frames.clone(), key="buffer@in0")
    filter_graph.add_frames(frames, key="buffer@in1")

filter_graph.flush()

if (frames := filter_graph.get_frames()) is not None:
    buffer = spdl.io.convert_frames(frames)
    buffers.append(spdl.io.to_numpy(buffer))

result = np.concatenate(buffers)
# Result: horizontally stacked video with left side flipped horizontally,
# right side flipped vertically

Example 2: Multi-Output with Branching

Create a thumbnail grid from a single video:

import numpy as np

import spdl.io

demuxer = spdl.io.Demuxer("video.mp4")
codec = demuxer.video_codec
decoder = spdl.io.Decoder(codec, filter_desc=None)

# Create 2x2 grid of thumbnails with different effects
filter_desc = ";".join([
    f"{spdl.io.get_buffer_desc(codec)} [in]",
    "[in] split=4 [tmp0][tmp1][tmp2][tmp3]",
    "[tmp0] scale=160:120 [tl]",
    "[tmp1] scale=160:120,hflip [tr]",
    "[tmp2] scale=160:120,vflip [bl]",
    "[tmp3] scale=160:120,hflip,vflip [br]",
    "[tl][tr] hstack [top]",
    "[bl][br] hstack [bottom]",
    "[top][bottom] vstack",
    "buffersink"
])

filter_graph = spdl.io.FilterGraph(filter_desc)

buffers = []
for packets in demuxer.streaming_demux(duration=1):
    frames = decoder.decode(packets)
    filter_graph.add_frames(frames)

    filtered_frames = filter_graph.get_frames()
    if filtered_frames is not None:
        buffer = spdl.io.convert_frames(filtered_frames)
        buffers.append(spdl.io.to_numpy(buffer))

# Flush
if (frames := decoder.flush()) is not None:
    filter_graph.add_frames(frames)

filter_graph.flush()

if (frames := filter_graph.get_frames()) is not None:
    buffer = spdl.io.convert_frames(frames)
    buffers.append(spdl.io.to_numpy(buffer))

result = np.concatenate(buffers)
# Result: 320x240 video showing 2x2 grid of the same video with different flips

Debugging Filter Graphs

Visualizing Graph Structure

The FilterGraph class provides a string representation showing the graph structure:

filter_graph = spdl.io.FilterGraph(filter_desc)
print(filter_graph)

This outputs a text diagram showing:

  • All nodes in the graph

  • Connections between nodes

  • Data formats at each connection

Example output:

+-----------------+
| Parsed_buffer_0 |default--[320x240 1:1 yuv420p]--Parsed_scale_1:default
|    (buffer)     |
+-----------------+

                                                       +-----------------+
Parsed_buffer_0:default--[320x240 1:1 yuv420p]--default| Parsed_scale_1  |default--[256x256 1:1 yuv420p]--Parsed_buffersink_2:default
                                                       |    (scale)      |
                                                       +-----------------+