Advanced Filter Graphs
This section covers advanced filter graph usage, including complex graphs with multiple inputs and outputs,
and direct use of the spdl.io.FilterGraph class for fine-grained control.
When to Use FilterGraph Directly
The high-level functions (load_audio(), load_video(), etc.) and
decode_packets() handle filtering automatically using simple linear filter chains.
Use FilterGraph directly when you need:
Multiple inputs: Combining multiple media streams (e.g., side-by-side video comparison)
Multiple outputs: Splitting one stream into multiple processed versions
Streaming processing: Processing media in chunks without loading everything into memory
Complex filter topologies: Non-linear filter graphs with branches and merges
Fine-grained control: Manual control over when frames are added and retrieved
FilterGraph Basics
The FilterGraph class provides a low-level interface to FFmpeg’s filter graph system.
Basic Workflow
1. Create a filter graph with a filter description
2. Add frames to input nodes using add_frames()
3. Get frames from output nodes using get_frames()
4. Flush the graph when done using flush()
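The workflow above can be wrapped in a small generator. This is a minimal sketch rather than part of SPDL: run_filter_graph is a hypothetical helper that relies only on the add_frames(), get_frames(), and flush() methods named above, and it assumes, as the examples in this section do, that get_frames() returns None when no output is ready. EchoGraph is a stand-in object used purely to show the call order without a media file.

```python
from typing import Any, Iterable, Iterator


def run_filter_graph(graph: Any, frame_batches: Iterable[Any]) -> Iterator[Any]:
    """Feed batches into a single-input, single-output graph and yield results."""
    for frames in frame_batches:
        graph.add_frames(frames)  # push frames into the input node
        out = graph.get_frames()  # pull whatever the graph has ready
        if out is not None:
            yield out
    graph.flush()  # signal end of stream
    out = graph.get_frames()  # collect frames still buffered inside the graph
    if out is not None:
        yield out


class EchoGraph:
    """Stand-in (NOT an SPDL class): holds one batch back until the next arrives."""

    def __init__(self) -> None:
        self._pending: list[Any] = []
        self._flushed = False

    def add_frames(self, frames: Any) -> None:
        self._pending.append(frames)

    def get_frames(self) -> Any:
        # Release a batch only when a newer one is queued, or after flush().
        if len(self._pending) > 1 or (self._flushed and self._pending):
            return self._pending.pop(0)
        return None

    def flush(self) -> None:
        self._flushed = True


outputs = list(run_filter_graph(EchoGraph(), ["batch0", "batch1", "batch2"]))
print(outputs)
```

The final get_frames() call after flush() matters here: the stand-in, like many real filters, holds data back internally, so skipping that call would drop the tail of the stream.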
Input and Output Nodes
Unlike simple filter chains, complex filter graphs require explicit input and output nodes:
Input nodes: buffer (video/image) or abuffer (audio)
Output nodes: buffersink (video/image) or abuffersink (audio)
Helper functions construct these nodes:
spdl.io.get_buffer_desc() - Create video/image input node
spdl.io.get_abuffer_desc() - Create audio input node
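As a sketch of how these pieces fit together, a linear graph is simply the input node description, the processing filters, and the output node joined by commas. linear_graph_desc is a hypothetical helper, and the buffer=... string below is only a placeholder written in FFmpeg's buffer filter syntax; in real code that string would come from spdl.io.get_buffer_desc().

```python
from typing import Sequence


def linear_graph_desc(
    input_desc: str, filters: Sequence[str], sink: str = "buffersink"
) -> str:
    """Join an input node, processing filters, and an output node into one chain."""
    if sink not in ("buffersink", "abuffersink"):
        raise ValueError(f"unexpected output node: {sink!r}")
    return ",".join([input_desc, *filters, sink])


# Placeholder input description (assumed values, for illustration only).
placeholder = "buffer=video_size=320x240:pix_fmt=yuv420p:time_base=1/30:pixel_aspect=1/1"
desc = linear_graph_desc(placeholder, ["scale=256:256", "format=rgb24"])
print(desc)
```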
Simple FilterGraph Example
Here’s a basic example that uses FilterGraph to scale frames to 256x256 and convert them to RGB:
import numpy as np
import spdl.io
# Load source
demuxer = spdl.io.Demuxer("video.mp4")
codec = demuxer.video_codec
decoder = spdl.io.Decoder(codec, filter_desc=None)
# Create filter graph with explicit input/output nodes
buffer_desc = spdl.io.get_buffer_desc(codec)
filter_desc = f"{buffer_desc},scale=256:256,format=rgb24,buffersink"
filter_graph = spdl.io.FilterGraph(filter_desc)
print(filter_graph)  # Print graph structure
# Process frames
buffers = []
for packets in demuxer.streaming_demux(duration=1):
    frames = decoder.decode(packets)
    # Add frames to filter graph
    filter_graph.add_frames(frames)
    # Get filtered frames
    filtered_frames = filter_graph.get_frames()
    if filtered_frames is not None:
        buffer = spdl.io.convert_frames(filtered_frames)
        buffers.append(spdl.io.to_numpy(buffer))
# Flush remaining frames
if (frames := decoder.flush()) is not None:
    filter_graph.add_frames(frames)
filter_graph.flush()
if (frames := filter_graph.get_frames()) is not None:
    buffer = spdl.io.convert_frames(frames)
    buffers.append(spdl.io.to_numpy(buffer))
# Combine all buffers
result = np.concatenate(buffers)
Multiple Input Graphs
Complex filter graphs can accept multiple input streams. This is useful for:
Side-by-side video comparison
Video overlays
Audio mixing
Picture-in-picture effects
Labeling Input Nodes
To use multiple inputs, label each input node with a unique name:
# Create two input nodes with labels
buffer0 = spdl.io.get_buffer_desc(codec, label="in0")
buffer1 = spdl.io.get_buffer_desc(codec, label="in1")
# Construct filter graph that stacks videos vertically
filter_desc = f"{buffer0} [in0];{buffer1} [in1],[in0] [in1] vstack,buffersink"
The syntax breakdown:
buffer@in0=... - Input node named “in0”
[in0] - Label for the output of this node
[in0] [in1] vstack - Stack the two labeled streams
buffersink - Output node
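The same pattern extends to any number of inputs. The sketch below is a hypothetical helper, not an SPDL API: stack_graph_desc takes input descriptions that already carry their buffer@<label> names (as get_buffer_desc() produces when given label=...), wires them into hstack or vstack, and returns the keys to pass to add_frames(). The buffer@in0=... strings are placeholders. It separates every chain with a semicolon and relies on the inputs=N option of FFmpeg's stack filters.

```python
from typing import Sequence


def stack_graph_desc(
    input_descs: Sequence[str], labels: Sequence[str], direction: str = "vstack"
) -> tuple[str, list[str]]:
    """Build a stacking graph; return (filter description, add_frames keys)."""
    if direction not in ("hstack", "vstack"):
        raise ValueError("direction must be 'hstack' or 'vstack'")
    if len(input_descs) != len(labels) or len(labels) < 2:
        raise ValueError("need at least two inputs, one label per input")
    # One chain per input node, each ending in its link label.
    chains = [f"{desc} [{label}]" for desc, label in zip(input_descs, labels)]
    # The final chain consumes every label and feeds the single output node.
    consumed = "".join(f"[{label}]" for label in labels)
    chains.append(f"{consumed} {direction}=inputs={len(labels)},buffersink")
    keys = [f"buffer@{label}" for label in labels]
    return ";".join(chains), keys


# Placeholder node descriptions (illustrative, not real get_buffer_desc output).
descs = ["buffer@in0=video_size=320x240", "buffer@in1=video_size=320x240"]
desc, keys = stack_graph_desc(descs, ["in0", "in1"], "hstack")
print(desc)
print(keys)
```

Returning the keys alongside the description keeps the node names and the add_frames(..., key=...) calls from drifting apart when labels change.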
Side-by-Side Video Example
import numpy as np
import spdl.io
demuxer = spdl.io.Demuxer("video.mp4")
codec = demuxer.video_codec
decoder = spdl.io.Decoder(codec, filter_desc=None)
# Create filter graph with two inputs stacked vertically
buf0 = spdl.io.get_buffer_desc(codec, label="in0")
buf1 = spdl.io.get_buffer_desc(codec, label="in1")
filter_desc = f"{buf0} [in0];{buf1} [in1],[in0] [in1] vstack,buffersink"
filter_graph = spdl.io.FilterGraph(filter_desc)
buffers = []
for packets in demuxer.streaming_demux(duration=1):
    frames = decoder.decode(packets)
    # Feed the same frames to both inputs; one input receives a clone
    filter_graph.add_frames(frames.clone(), key="buffer@in0")
    filter_graph.add_frames(frames, key="buffer@in1")
    # Get stacked output
    filtered_frames = filter_graph.get_frames()
    if filtered_frames is not None:
        buffer = spdl.io.convert_frames(filtered_frames)
        buffers.append(spdl.io.to_numpy(buffer))
# Flush
if (frames := decoder.flush()) is not None:
    filter_graph.add_frames(frames.clone(), key="buffer@in0")
    filter_graph.add_frames(frames, key="buffer@in1")
filter_graph.flush()
if (frames := filter_graph.get_frames()) is not None:
    buffer = spdl.io.convert_frames(frames)
    buffers.append(spdl.io.to_numpy(buffer))
result = np.concatenate(buffers)
# result now contains frames stacked vertically (double height)
Common Multi-Input Filters
Horizontal stack (side-by-side):
filter_desc = f"{buf0} [in0];{buf1} [in1],[in0] [in1] hstack,buffersink"
Vertical stack (top-bottom):
filter_desc = f"{buf0} [in0];{buf1} [in1],[in0] [in1] vstack,buffersink"
Overlay (picture-in-picture):
# Overlay the second video, scaled to 96x72, in the bottom-right corner
# of the first with a 10-pixel margin
filter_desc = ";".join(
    [
        f"{buf0} [main]",
        f"{buf1} [pip]",
        "[pip] scale=96:72 [pip_scaled]",
        "[main][pip_scaled] overlay=x=W-w-10:y=H-h-10 [overlaid]",
        "[overlaid] format=rgb24,buffersink",
    ]
)
Blend:
# Blend two videos with 50% opacity each
filter_desc = f"{buf0} [in0];{buf1} [in1],[in0] [in1] blend=all_mode=average,buffersink"
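In the overlay expression above, W and H are the width and height of the main input and w and h are those of the overlaid input, so x=W-w-10:y=H-h-10 pins the inset to the bottom-right corner rather than the top-left. A short worked example, assuming a 320x240 main video (an illustrative size; the real one depends on the source) and the 96x72 inset from the graph:

```python
def overlay_position(main_size, inset_size, margin=10):
    """Top-left corner of an inset anchored to the bottom-right of the main frame."""
    main_w, main_h = main_size
    inset_w, inset_h = inset_size
    x = main_w - inset_w - margin  # mirrors overlay's x=W-w-10
    y = main_h - inset_h - margin  # mirrors overlay's y=H-h-10
    if x < 0 or y < 0:
        raise ValueError("inset plus margin does not fit inside the main frame")
    return x, y


x, y = overlay_position((320, 240), (96, 72))
print(x, y)  # 214 158
```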
Multiple Output Graphs
Filter graphs can produce multiple output streams. This is useful for:
Generating multiple resolutions simultaneously
Creating different augmented versions
Extracting different features from the same source
Labeling Output Nodes
To use multiple outputs, label each output node:
filter_desc = ";".join([
    f"{spdl.io.get_buffer_desc(codec)} [in]",
    "[in] split [out0][out1]",
    "[out0] buffersink@out0",
    "[out1] buffersink@out1",
])
The syntax breakdown:
[in] split [out0][out1] - Split input into two streams
buffersink@out0 - Output node named “out0”
buffersink@out1 - Output node named “out1”
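For more than two outputs, writing each chain by hand gets repetitive. The helper below is a hypothetical sketch, not an SPDL API: it fans one input out to N named sinks with split=N and returns the matching get_frames() keys. It attaches each branch's filters directly to its sink instead of using an intermediate [outN] label, which is an equivalent but slightly shorter layout than the one shown above. The buffer=... string is a placeholder.

```python
from typing import Sequence


def split_graph_desc(
    input_desc: str, branches: Sequence[str]
) -> tuple[str, list[str]]:
    """Fan one input out to len(branches) named outputs.

    Each entry in `branches` is a filter chain such as "scale=128:128";
    an empty string means the branch passes frames through unchanged.
    """
    n = len(branches)
    if n < 2:
        raise ValueError("split needs at least two branches")
    tmp = [f"[tmp{i}]" for i in range(n)]
    chains = [f"{input_desc} [in]", f"[in] split={n} {''.join(tmp)}"]
    keys = []
    for i, branch in enumerate(branches):
        sink = f"buffersink@out{i}"
        # Prepend the branch's filters, if any, to its named output node.
        chain = f"{branch},{sink}" if branch else sink
        chains.append(f"{tmp[i]} {chain}")
        keys.append(sink)
    return ";".join(chains), keys


# Placeholder input description (illustrative only).
desc, keys = split_graph_desc("buffer=video_size=320x240", ["scale=256:256", ""])
print(desc)
print(keys)
```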
Multi-Resolution Output Example
import numpy as np
import spdl.io
demuxer = spdl.io.Demuxer("video.mp4")
codec = demuxer.video_codec
decoder = spdl.io.Decoder(codec, filter_desc=None)
# Create filter graph with two outputs at different resolutions
filter_desc = ";".join([
    f"{spdl.io.get_buffer_desc(codec)} [in]",
    "[in] split [tmp0][tmp1]",
    "[tmp0] scale=256:256 [out0]",
    "[tmp1] scale=128:128 [out1]",
    "[out0] buffersink@out0",
    "[out1] buffersink@out1",
])
filter_graph = spdl.io.FilterGraph(filter_desc)
buffers_256, buffers_128 = [], []
for packets in demuxer.streaming_demux(duration=1):
    frames = decoder.decode(packets)
    filter_graph.add_frames(frames)
    # Get frames from first output (256x256)
    frames_256 = filter_graph.get_frames(key="buffersink@out0")
    if frames_256 is not None:
        buffer = spdl.io.convert_frames(frames_256)
        buffers_256.append(spdl.io.to_numpy(buffer))
    # Get frames from second output (128x128)
    frames_128 = filter_graph.get_frames(key="buffersink@out1")
    if frames_128 is not None:
        buffer = spdl.io.convert_frames(frames_128)
        buffers_128.append(spdl.io.to_numpy(buffer))
# Flush
if (frames := decoder.flush()) is not None:
    filter_graph.add_frames(frames)
filter_graph.flush()
if (frames := filter_graph.get_frames(key="buffersink@out0")) is not None:
    buffer = spdl.io.convert_frames(frames)
    buffers_256.append(spdl.io.to_numpy(buffer))
if (frames := filter_graph.get_frames(key="buffersink@out1")) is not None:
    buffer = spdl.io.convert_frames(frames)
    buffers_128.append(spdl.io.to_numpy(buffer))
result_256 = np.concatenate(buffers_256)  # Shape: (N, 256, 256, C)
result_128 = np.concatenate(buffers_128)  # Shape: (N, 128, 128, C)
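The per-output retrieval in the example above repeats for every sink, both inside the loop and after the flush. One way to factor it out is sketched below. drain_outputs is a hypothetical helper that assumes only the get_frames(key=...) call shown above, and TwoSinkStub is a stand-in object (not an SPDL class) so the flow can be followed without a media file.

```python
from collections import defaultdict
from typing import Any, Iterable


def drain_outputs(graph: Any, keys: Iterable[str], collected: dict) -> None:
    """Pull pending frames from each named output and append them per key."""
    for key in keys:
        frames = graph.get_frames(key=key)
        if frames is not None:
            collected[key].append(frames)


class TwoSinkStub:
    """Stand-in (NOT an SPDL class) that copies every batch to two outputs."""

    def __init__(self) -> None:
        self._queues = {"buffersink@out0": [], "buffersink@out1": []}

    def add_frames(self, frames: Any) -> None:
        for queue in self._queues.values():
            queue.append(frames)

    def get_frames(self, key: str) -> Any:
        queue = self._queues[key]
        return queue.pop(0) if queue else None


keys = ["buffersink@out0", "buffersink@out1"]
collected = defaultdict(list)
graph = TwoSinkStub()
for batch in ["batch0", "batch1"]:
    graph.add_frames(batch)
    drain_outputs(graph, keys, collected)
drain_outputs(graph, keys, collected)  # nothing pending: collected is unchanged
print(dict(collected))
```

With a real FilterGraph, the same helper would be called once per decode iteration and once more after flush(), and each collected entry would then go through convert_frames() as in the example above.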
Common Multi-Output Patterns
Different augmentations:
filter_desc = ";".join([
    f"{spdl.io.get_buffer_desc(codec)} [in]",
    "[in] split [tmp0][tmp1]",
    "[tmp0] hflip [out0]",
    "[tmp1] vflip [out1]",
    "[out0] buffersink@out0",
    "[out1] buffersink@out1",
])
Different color spaces:
filter_desc = ";".join([
    f"{spdl.io.get_buffer_desc(codec)} [in]",
    "[in] split [tmp0][tmp1]",
    "[tmp0] format=rgb24 [out0]",
    "[tmp1] format=gray [out1]",
    "[out0] buffersink@out0",
    "[out1] buffersink@out1",
])
Multimedia Filters
FFmpeg provides multimedia filters that can convert between audio and video streams.
Audio to Video Visualization
The showwaves filter converts audio waveforms to video:
import numpy as np
import spdl.io
demuxer = spdl.io.Demuxer("audio.mp3")
codec = demuxer.audio_codec
decoder = spdl.io.Decoder(codec, filter_desc=None)
# Create filter graph: audio input -> video output
abuffer_desc = spdl.io.get_abuffer_desc(codec)
filter_desc = f"{abuffer_desc},showwaves,buffersink"
filter_graph = spdl.io.FilterGraph(filter_desc)
video_buffers = []
for packets in demuxer.streaming_demux(duration=1):
    audio_frames = decoder.decode(packets)
    # Add audio frames
    filter_graph.add_frames(audio_frames)
    # Get video frames
    video_frames = filter_graph.get_frames()
    if video_frames is not None:
        buffer = spdl.io.convert_frames(video_frames)
        video_buffers.append(spdl.io.to_numpy(buffer))
# Flush
if (frames := decoder.flush()) is not None:
    filter_graph.add_frames(frames)
filter_graph.flush()
if (frames := filter_graph.get_frames()) is not None:
    buffer = spdl.io.convert_frames(frames)
    video_buffers.append(spdl.io.to_numpy(buffer))
video_result = np.concatenate(video_buffers)
# video_result contains visualization of audio waveform
Other Multimedia Filters
showspectrum - Audio spectrum visualization:
filter_desc = f"{abuffer_desc},showspectrum,buffersink"
showfreqs - Frequency visualization:
filter_desc = f"{abuffer_desc},showfreqs,buffersink"
avectorscope - Stereo audio vectorscope:
filter_desc = f"{abuffer_desc},avectorscope,buffersink"
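Each of these filters accepts a size option (short form s) in FFmpeg, which is a convenient way to fix the resolution of the generated video. The helper below is a hypothetical sketch, not an SPDL API, and the abuffer=... string is a placeholder in FFmpeg's abuffer syntax standing in for the output of get_abuffer_desc().

```python
VISUALIZERS = ("showwaves", "showspectrum", "showfreqs", "avectorscope")


def visualization_desc(abuffer_desc: str, name: str, width: int, height: int) -> str:
    """Audio-in, video-out graph with an explicit output size."""
    if name not in VISUALIZERS:
        raise ValueError(f"unsupported visualization filter: {name!r}")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return f"{abuffer_desc},{name}=s={width}x{height},buffersink"


# Placeholder input description (assumed values, for illustration only).
placeholder = (
    "abuffer=time_base=1/44100:sample_rate=44100:"
    "sample_fmt=fltp:channel_layout=stereo"
)
desc = visualization_desc(placeholder, "showwaves", 640, 120)
print(desc)
```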
Complex Graph Examples
Example 1: Multi-Input with Different Processing
Process two video streams differently and combine them:
import numpy as np
import spdl.io
demuxer = spdl.io.Demuxer("video.mp4")
codec = demuxer.video_codec
decoder = spdl.io.Decoder(codec, filter_desc=None)
# Create complex filter: apply different effects to each input
buf0 = spdl.io.get_buffer_desc(codec, label="in0")
buf1 = spdl.io.get_buffer_desc(codec, label="in1")
filter_desc = ";".join([
    f"{buf0} [in0]",
    f"{buf1} [in1]",
    "[in0] hflip,scale=320:240 [left]",
    "[in1] vflip,scale=320:240 [right]",
    # hstack and buffersink share one chain so the stacked output reaches the sink
    "[left][right] hstack,buffersink",
])
filter_graph = spdl.io.FilterGraph(filter_desc)
buffers = []
for packets in demuxer.streaming_demux(duration=1):
    frames = decoder.decode(packets)
    filter_graph.add_frames(frames.clone(), key="buffer@in0")
    filter_graph.add_frames(frames, key="buffer@in1")
    filtered_frames = filter_graph.get_frames()
    if filtered_frames is not None:
        buffer = spdl.io.convert_frames(filtered_frames)
        buffers.append(spdl.io.to_numpy(buffer))
# Flush
if (frames := decoder.flush()) is not None:
    filter_graph.add_frames(frames.clone(), key="buffer@in0")
    filter_graph.add_frames(frames, key="buffer@in1")
filter_graph.flush()
if (frames := filter_graph.get_frames()) is not None:
    buffer = spdl.io.convert_frames(frames)
    buffers.append(spdl.io.to_numpy(buffer))
result = np.concatenate(buffers)
# Result: horizontally stacked video with left side flipped horizontally,
# right side flipped vertically
Example 2: Multi-Output with Branching
Create a thumbnail grid from a single video:
import numpy as np
import spdl.io
demuxer = spdl.io.Demuxer("video.mp4")
codec = demuxer.video_codec
decoder = spdl.io.Decoder(codec, filter_desc=None)
# Create 2x2 grid of thumbnails with different effects
filter_desc = ";".join([
    f"{spdl.io.get_buffer_desc(codec)} [in]",
    "[in] split=4 [tmp0][tmp1][tmp2][tmp3]",
    "[tmp0] scale=160:120 [tl]",
    "[tmp1] scale=160:120,hflip [tr]",
    "[tmp2] scale=160:120,vflip [bl]",
    "[tmp3] scale=160:120,hflip,vflip [br]",
    "[tl][tr] hstack [top]",
    "[bl][br] hstack [bottom]",
    # vstack and buffersink share one chain so the grid reaches the sink
    "[top][bottom] vstack,buffersink",
])
filter_graph = spdl.io.FilterGraph(filter_desc)
buffers = []
for packets in demuxer.streaming_demux(duration=1):
    frames = decoder.decode(packets)
    filter_graph.add_frames(frames)
    filtered_frames = filter_graph.get_frames()
    if filtered_frames is not None:
        buffer = spdl.io.convert_frames(filtered_frames)
        buffers.append(spdl.io.to_numpy(buffer))
# Flush
if (frames := decoder.flush()) is not None:
    filter_graph.add_frames(frames)
filter_graph.flush()
if (frames := filter_graph.get_frames()) is not None:
    buffer = spdl.io.convert_frames(frames)
    buffers.append(spdl.io.to_numpy(buffer))
result = np.concatenate(buffers)
# Result: 320x240 video showing 2x2 grid of the same video with different flips
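The grid layout generalizes to other sizes: each row is an hstack of its tiles, the rows are joined with vstack, and the output measures cols * tile_width by rows * tile_height. The helper below is a hypothetical sketch under those assumptions. For brevity it only scales each tile and omits the per-tile flips used above, and the buffer=... string is a placeholder.

```python
def grid_graph_desc(input_desc, rows, cols, tile_w, tile_h):
    """Tile rows*cols scaled copies of one input.

    Returns (filter description, (output width, output height)).
    """
    if rows < 2 or cols < 2:
        raise ValueError("this sketch assumes at least a 2x2 grid")
    n = rows * cols
    tiles = [f"[t{i}]" for i in range(n)]
    copies = "".join(f"[s{i}]" for i in range(n))
    chains = [f"{input_desc} [in]", f"[in] split={n} {copies}"]
    # Scale every copy to the tile size.
    chains += [f"[s{i}] scale={tile_w}:{tile_h} {tiles[i]}" for i in range(n)]
    # Join the tiles of each row horizontally.
    for r in range(rows):
        row_tiles = "".join(tiles[r * cols:(r + 1) * cols])
        chains.append(f"{row_tiles} hstack=inputs={cols} [row{r}]")
    # Join the rows vertically and send the result to the output node.
    all_rows = "".join(f"[row{r}]" for r in range(rows))
    chains.append(f"{all_rows} vstack=inputs={rows},buffersink")
    return ";".join(chains), (cols * tile_w, rows * tile_h)


# Placeholder input description (illustrative only).
desc, size = grid_graph_desc("buffer=video_size=640x480", 2, 2, 160, 120)
print(size)  # (320, 240)
```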
Debugging Filter Graphs
Visualizing Graph Structure
The FilterGraph class provides a string representation showing the graph structure:
filter_graph = spdl.io.FilterGraph(filter_desc)
print(filter_graph)
This outputs a text diagram showing:
All nodes in the graph
Connections between nodes
Data formats at each connection
Example output:
+-----------------+
| Parsed_buffer_0 |default--[320x240 1:1 yuv420p]--Parsed_scale_1:default
|    (buffer)     |
+-----------------+

                                                       +----------------+
Parsed_buffer_0:default--[320x240 1:1 yuv420p]--default| Parsed_scale_1 |default--[256x256 1:1 yuv420p]--Parsed_buffersink_2:default
                                                       |    (scale)     |
                                                       +----------------+
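Many graph construction errors come down to a mistyped link label, which can be caught before the graph is built. The check below is a heuristic sketch, not an SPDL or FFmpeg facility: it assumes square brackets appear in the description only as link labels, and that every label is produced once and consumed once, which holds for the graphs in this section.

```python
import re
from collections import Counter


def unbalanced_labels(filter_desc: str) -> dict[str, int]:
    """Return link labels that do not appear exactly twice in the description."""
    counts = Counter(re.findall(r"\[([^\[\]]+)\]", filter_desc))
    return {label: n for label, n in counts.items() if n != 2}


# Placeholder descriptions (illustrative only); `bad` consumes [c] instead of [b].
good = "buffer=video_size=320x240 [in];[in] split [a][b];[a] buffersink@out0;[b] buffersink@out1"
bad = "buffer=video_size=320x240 [in];[in] split [a][b];[a] buffersink@out0;[c] buffersink@out1"
print(unbalanced_labels(good))  # {}
print(unbalanced_labels(bad))   # {'b': 1, 'c': 1}
```

An empty result does not prove the graph is valid; it only rules out one common class of typo before FFmpeg parses the description.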