Benchmark video

This example measures the performance of video decoding with different concurrency settings.

It benchmarks spdl.io.load_video() across multiple dimensions:

  • Various video resolutions (SD: 640x480, HD: 1920x1080, 4K: 3840x2160)

  • Different worker thread counts (1, 2, 4, 8) for parallel video processing

  • Different decoder thread counts (1, 2, 4) for FFmpeg’s internal threading

The benchmark evaluates how throughput changes with:

  1. Worker-level concurrency: Number of videos processed concurrently (via num_workers)

  2. Decoder-level concurrency: FFmpeg’s internal threading (via decoder_options={"threads": "X"})
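
The two concurrency levels are swept together with the three resolutions, so the full grid has 3 x 4 x 3 = 36 combinations. A minimal sketch of how that grid can be enumerated, using only the values listed above (the variable names are illustrative and not part of the script's public interface):

```python
from itertools import product

resolutions = ["SD", "HD", "4K"]
worker_counts = [1, 2, 4, 8]
decoder_thread_counts = [1, 2, 4]

# Every (resolution, num_workers, decoder_threads) combination that is measured.
grid = list(product(resolutions, worker_counts, decoder_thread_counts))
print(len(grid))  # 36

# Upper bound on simultaneously active decoder threads for each combination,
# assuming every worker is decoding at the same time.
max_decoder_threads = max(workers * threads for _, workers, threads in grid)
print(max_decoder_threads)  # 32
```

The second number is why the two settings interact: at 8 workers with 4 decoder threads each, up to 32 decoder threads may compete for the cores that `numactl` binds the process to.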

Example

$ numactl --membind 0 --cpubind 0 python benchmark_video.py --output video_benchmark_results.csv
# Plot results
$ python benchmark_video_plot.py --input video_benchmark_results.csv --output video_benchmark_plot.png

Result

In many cases, when decoding H.264 videos, using 2 threads gives good performance.

../_static/data/example-benchmark-video.png

Source

# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

# pyre-strict

"""This example measures the performance of video decoding with different concurrency settings.

It benchmarks :py:func:`spdl.io.load_video` across multiple dimensions:

- Various video resolutions (SD: 640x480, HD: 1920x1080, 4K: 3840x2160)
- Different worker thread counts (1, 2, 4, 8) for parallel video processing
- Different decoder thread counts (1, 2, 4) for FFmpeg's internal threading

The benchmark evaluates how throughput changes with:

1. **Worker-level concurrency**: Number of videos processed concurrently (via ``num_workers``)
2. **Decoder-level concurrency**: FFmpeg's internal threading (via ``decoder_options={"threads": "X"}``)

**Example**

.. code-block:: shell

   $ numactl --membind 0 --cpubind 0 python benchmark_video.py --output video_benchmark_results.csv
   # Plot results
   $ python benchmark_video_plot.py --input video_benchmark_results.csv --output video_benchmark_plot.png

**Result**

In many cases, when decoding H.264 videos, using 2 threads gives good performance.

.. image:: ../../_static/data/example-benchmark-video.png

"""

__all__ = [
    "BenchmarkConfig",
    "create_video_data",
    "load_video_with_config",
    "main",
]

import argparse
import os
import subprocess
import tempfile
from dataclasses import dataclass

import spdl.io

try:
    from examples.benchmark_utils import (  # pyre-ignore[21]
        BenchmarkResult,
        BenchmarkRunner,
        ExecutorType,
        get_default_result_path,
        save_results_to_csv,
    )
except ImportError:
    from spdl.examples.benchmark_utils import (
        BenchmarkResult,
        BenchmarkRunner,
        ExecutorType,
        get_default_result_path,
        save_results_to_csv,
    )


DEFAULT_RESULT_PATH: str = get_default_result_path(__file__)


@dataclass(frozen=True)
class BenchmarkConfig:
    """BenchmarkConfig()

    Configuration for a single video decoding benchmark run."""

    resolution: str
    """Video resolution label (e.g., "SD", "HD", "4K")"""

    width: int
    """Video width in pixels"""

    height: int
    """Video height in pixels"""

    duration_seconds: float
    """Duration of the video in seconds"""

    num_workers: int
    """Number of concurrent worker threads"""

    decoder_threads: int
    """Number of FFmpeg decoder threads"""

    iterations: int
    """Number of iterations per run"""

    num_runs: int
    """Number of runs for statistical analysis"""


def create_video_data(
    width: int = 1920,
    height: int = 1080,
    duration_seconds: float = 5.0,
    fps: int = 30,
) -> bytes:
    """Create a mock H.264 video file in memory for benchmarking.

    Args:
        width: Video width in pixels
        height: Video height in pixels
        duration_seconds: Duration of video in seconds
        fps: Frames per second

    Returns:
        Video file as bytes (H.264 encoded in MP4 container)
    """
    with tempfile.NamedTemporaryFile(suffix=".mp4") as tmp_file:
        output_path = tmp_file.name

        cmd = [
            "ffmpeg",
            "-f",
            "lavfi",
            "-i",
            f"testsrc=duration={duration_seconds}:size={width}x{height}:rate={fps}",
            "-c:v",
            "libx264",
            "-preset",
            "ultrafast",
            "-pix_fmt",
            "yuv420p",
            "-y",
            output_path,
        ]

        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )

        with open(output_path, "rb") as f:
            video_data = f.read()

        return video_data


def load_video_with_config(
    video_data: bytes, decoder_threads: int
) -> spdl.io.CPUBuffer:
    """Load video data using spdl.io.load_video with specified decoder threads.

    Args:
        video_data: Video file data as bytes
        decoder_threads: Number of threads for FFmpeg decoder

    Returns:
        Decoded video frames as CPUBuffer
    """
    decode_config = spdl.io.decode_config(
        decoder_options={"threads": str(decoder_threads)}
    )
    return spdl.io.load_video(video_data, decode_config=decode_config)


def _parse_args() -> argparse.Namespace:
    """Parse command line arguments for the benchmark script.

    Returns:
        Parsed command line arguments
    """
    parser = argparse.ArgumentParser(description="Benchmark video decoding performance")
    parser.add_argument(
        "--output",
        type=lambda p: os.path.realpath(p),
        default=DEFAULT_RESULT_PATH,
        help="Output file path.",
    )
    return parser.parse_args()


def main() -> None:
    """Run comprehensive benchmark suite for video decoding performance.

    Benchmarks video decoding across different resolutions (SD, HD, 4K),
    worker thread counts (1, 2, 4, 8), and decoder thread counts (1, 2, 4).
    """
    args = _parse_args()

    video_configs = [
        ("SD", 640, 480, 5.0),
        ("HD", 1920, 1080, 5.0),
        ("4K", 3840, 2160, 5.0),
    ]

    worker_counts = [1, 2, 4, 8]
    decoder_thread_counts = [1, 2, 4]

    results: list[BenchmarkResult[BenchmarkConfig]] = []

    for resolution, width, height, duration in video_configs:
        print(f"\nCreating {resolution} video ({width}x{height}, {duration}s)...")
        video_data = create_video_data(
            width=width, height=height, duration_seconds=duration
        )
        print(f"Video size: {len(video_data) / 1024 / 1024:.2f} MB")

        print(f"\n{resolution} ({width}x{height})")
        print("Workers,Decoder Threads,QPS,CI Lower,CI Upper,CPU %")

        for num_workers in worker_counts:
            with BenchmarkRunner(
                executor_type=ExecutorType.THREAD,
                num_workers=num_workers,
            ) as runner:
                for decoder_threads in decoder_thread_counts:
                    config = BenchmarkConfig(
                        resolution=resolution,
                        width=width,
                        height=height,
                        duration_seconds=duration,
                        num_workers=num_workers,
                        decoder_threads=decoder_threads,
                        iterations=num_workers * 2,
                        num_runs=5,
                    )

                    result, output = runner.run(
                        config,
                        lambda data=video_data,
                        threads=decoder_threads: load_video_with_config(data, threads),
                        config.iterations,
                        num_runs=config.num_runs,
                    )

                    results.append(result)

                    print(
                        f"{num_workers},{decoder_threads},"
                        f"{result.qps:.2f},{result.ci_lower:.2f},{result.ci_upper:.2f},"
                        f"{result.cpu_percent:.1f}"
                    )

    save_results_to_csv(results, args.output)
    print(
        f"\nBenchmark complete. To generate plots, run:\n"
        f"python benchmark_video_plot.py --input {args.output} "
        f"--output {args.output.replace('.csv', '.png')}"
    )


if __name__ == "__main__":
    main()

API Reference

Functions

create_video_data(width: int = 1920, height: int = 1080, duration_seconds: float = 5.0, fps: int = 30) -> bytes

Create a mock H.264 video file in memory for benchmarking.

Parameters:
  • width – Video width in pixels

  • height – Video height in pixels

  • duration_seconds – Duration of video in seconds

  • fps – Frames per second

Returns:

Video file as bytes (H.264 encoded in MP4 container)
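
The synthetic clip is produced by FFmpeg's `testsrc` source encoded with `libx264`. The sketch below rebuilds the same argument list as the source listing without running it; `build_ffmpeg_cmd` is a hypothetical helper introduced here for illustration, and actually executing the command assumes an `ffmpeg` binary with libx264 support on `PATH`.

```python
def build_ffmpeg_cmd(
    output_path: str,
    width: int = 1920,
    height: int = 1080,
    duration_seconds: float = 5.0,
    fps: int = 30,
) -> list[str]:
    """Hypothetical helper: return the ffmpeg arguments used to make a test clip."""
    # lavfi "testsrc" generates a synthetic test pattern, so no input file is needed.
    source = f"testsrc=duration={duration_seconds}:size={width}x{height}:rate={fps}"
    return [
        "ffmpeg",
        "-f", "lavfi",
        "-i", source,
        "-c:v", "libx264",
        "-preset", "ultrafast",  # favors encoding speed over file size
        "-pix_fmt", "yuv420p",
        "-y",  # overwrite the output file if it already exists
        output_path,
    ]


cmd = build_ffmpeg_cmd("sd.mp4", width=640, height=480)
print(" ".join(cmd))
```

With the defaults of 5.0 seconds at 30 frames per second, each generated clip should contain about 150 frames.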

load_video_with_config(video_data: bytes, decoder_threads: int) -> CPUBuffer

Load video data using spdl.io.load_video with specified decoder threads.

Parameters:
  • video_data – Video file data as bytes

  • decoder_threads – Number of threads for FFmpeg decoder

Returns:

Decoded video frames as CPUBuffer
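
To illustrate how the worker-level setting wraps this function, here is a sketch that submits the same task to a thread pool several times. It is not the implementation of `BenchmarkRunner`, whose internals are not shown on this page; `fake_decode` is a stand-in for `load_video_with_config` so that the snippet runs without SPDL or FFmpeg, and `run_concurrently` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def fake_decode(video_data: bytes, decoder_threads: int) -> int:
    """Stand-in for load_video_with_config; returns the payload size."""
    return len(video_data)


def run_concurrently(
    video_data: bytes, num_workers: int, decoder_threads: int, iterations: int
) -> list[int]:
    """Run `iterations` decode calls on a pool of `num_workers` threads."""
    # Bind the arguments up front so each submitted task is self-contained.
    task = partial(fake_decode, video_data, decoder_threads)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(task) for _ in range(iterations)]
        return [future.result() for future in futures]


outputs = run_concurrently(
    b"\x00" * 1024, num_workers=4, decoder_threads=2, iterations=8
)
print(len(outputs))  # 8
```

Binding the arguments with `partial` plays the same role as the default-argument lambda in `main()`: it fixes the values at the time the task is created rather than when it runs.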

main() -> None

Run comprehensive benchmark suite for video decoding performance.

Benchmarks video decoding across different resolutions (SD, HD, 4K), worker thread counts (1, 2, 4, 8), and decoder thread counts (1, 2, 4).
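
In the source listing, `main()` sets `iterations=num_workers * 2` and `num_runs=5`, so the amount of work per measurement grows with the worker count. The sketch below tallies that workload. It assumes each run performs exactly `iterations` decode calls with no warm-up, which this page does not confirm.

```python
worker_counts = [1, 2, 4, 8]
decoder_thread_counts = [1, 2, 4]
num_runs = 5

# Iterations per run scale with the number of workers, as in main().
iterations_by_workers = {n: n * 2 for n in worker_counts}
print(iterations_by_workers)  # {1: 2, 2: 4, 4: 8, 8: 16}

# Assumption: one decode call per iteration, repeated for every run and
# every decoder thread setting.
decodes_per_resolution = sum(
    iterations_by_workers[n] * num_runs for n in worker_counts
) * len(decoder_thread_counts)
print(decodes_per_resolution)  # 450
```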

Classes

class BenchmarkConfig

Configuration for a single video decoding benchmark run.

decoder_threads: int

Number of FFmpeg decoder threads

duration_seconds: float

Duration of the video in seconds

height: int

Video height in pixels

iterations: int

Number of iterations per run

num_runs: int

Number of runs for statistical analysis

num_workers: int

Number of concurrent worker threads

resolution: str

Video resolution label (e.g., “SD”, “HD”, “4K”)

width: int

Video width in pixels
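
Because the class is declared with `@dataclass(frozen=True)`, instances are immutable and hashable. The sketch below repeats the field list from the source listing so that it runs on its own, and uses `dataclasses.replace` to derive a variant configuration; the specific values are illustrative.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class BenchmarkConfig:
    resolution: str
    width: int
    height: int
    duration_seconds: float
    num_workers: int
    decoder_threads: int
    iterations: int
    num_runs: int


base = BenchmarkConfig(
    resolution="HD",
    width=1920,
    height=1080,
    duration_seconds=5.0,
    num_workers=4,
    decoder_threads=2,
    iterations=8,
    num_runs=5,
)

# Derive a new config instead of mutating the existing one.
variant = replace(base, decoder_threads=4)

try:
    base.num_workers = 8  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    print("BenchmarkConfig instances are immutable")

# Frozen dataclasses are hashable, so configs can serve as dictionary keys.
labels = {base: "baseline", variant: "more decoder threads"}
print(len(labels))  # 2
```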