Benchmark wav¶
This example measures the performance of loading WAV audio.
It compares three different approaches for loading WAV files:
spdl.io.load_wav()
: Fast native WAV parser optimized for simple PCM formats

spdl.io.load_audio()
: General-purpose audio loader using FFmpeg backend

soundfile (libsndfile)
: Popular third-party audio I/O library
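In code, the three approaches look roughly like this (a minimal sketch using the loader calls from the source below; "sample.wav" is a placeholder for any PCM WAV file):

import io

import soundfile as sf
import spdl.io

with open("sample.wav", "rb") as f:  # placeholder input file
    wav_bytes = f.read()

# Native WAV parser: reinterprets the PCM payload as an array.
arr_wav = spdl.io.to_numpy(spdl.io.load_wav(wav_bytes))

# Generic FFmpeg-based loader (filter_desc=None, as used by this benchmark).
arr_audio = spdl.io.to_numpy(spdl.io.load_audio(wav_bytes, filter_desc=None))

# soundfile needs a file-like object, hence the io.BytesIO wrapper.
arr_sf, sample_rate = sf.read(io.BytesIO(wav_bytes), dtype="int16")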
The benchmark suite evaluates performance across multiple dimensions:
Various audio configurations (sample rates, channels, bit depths, durations)
Different thread counts (1, 2, 4, 8, 16) to measure parallel scaling
Statistical analysis with 95% confidence intervals using Student’s t-distribution
Queries per second (QPS) as the primary performance metric
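The confidence intervals are computed from repeated measurement sets, as in the benchmark() function in the source below. In essence (the qps_samples values here are hypothetical):

import numpy as np
import scipy.stats

qps_samples = [980.2, 1010.5, 995.1, 1002.8, 990.7]  # hypothetical per-set QPS

qps_mean = np.mean(qps_samples)
qps_std = np.std(qps_samples, ddof=1)  # sample standard deviation

# 95% CI around the mean, Student's t with len(qps_samples) - 1 degrees of freedom
ci_lower, ci_upper = scipy.stats.t.interval(
    0.95,
    len(qps_samples) - 1,
    loc=qps_mean,
    scale=qps_std / np.sqrt(len(qps_samples)),
)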
Example
$ python benchmark_wav.py --plot --output results.png
Result
The following plot shows the QPS (measured by the number of files processed) of each function with different audio durations.
[Plot: QPS of spdl.io.load_wav, spdl.io.load_audio, and soundfile vs. number of threads, for each audio duration]
spdl.io.load_wav() is much faster than the others because all it does is reinterpret the input byte string as an array, so its performance stays the same even for longer audio durations.
Since parsing the WAV header is nearly instant, spdl.io.load_wav() spends most of its time creating the NumPy array. That step requires the GIL, so the performance does not scale with multi-threading. (This performance pattern is much the same as that of spdl.io.load_npz.)
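To see why the cost is independent of duration, consider this conceptual sketch (a hypothetical illustration of the idea, not SPDL's actual implementation), which assumes the canonical 44-byte WAV header and 16-bit PCM:

import numpy as np

def reinterpret_pcm16(wav_bytes: bytes, num_channels: int) -> np.ndarray:
    # Skip the header without copying, then view the payload as int16.
    payload = memoryview(wav_bytes)[44:]  # assumes the canonical 44-byte header
    samples = np.frombuffer(payload, dtype=np.int16)  # zero-copy view
    # Constructing the array object still requires holding the GIL,
    # which is what limits multi-threaded scaling.
    return samples.reshape(-1, num_channels)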
The following is the same plot without load_wav.
[Plot: the same comparison without spdl.io.load_wav]
soundfile (libsndfile) has to process the data iteratively (through io.BytesIO) because it does not support loading directly from a byte string, so it takes longer to process longer audio data.
The performance trend (a single thread being the fastest) suggests that it does not release the GIL for the majority of the time.
The spdl.io.load_audio() function (the generic FFmpeg-based implementation) does a lot of work, so its overall performance is not as good, but it scales with multi-threading because it releases the GIL almost entirely.
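This scaling behavior can be spot-checked with a plain thread pool, independently of the full harness below. A minimal sketch (wav_bytes again stands in for real WAV file contents):

import time
from concurrent.futures import ThreadPoolExecutor

import spdl.io

def timed_qps(func, iterations: int, num_threads: int) -> float:
    # Run func `iterations` times on a pool and return queries per second.
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        t0 = time.perf_counter()
        futures = [executor.submit(func) for _ in range(iterations)]
        for future in futures:
            future.result()
        return iterations / (time.perf_counter() - t0)

# A callable that releases the GIL (like the FFmpeg-based loader) should show
# QPS growing with the thread count; a GIL-bound one stays flat or regresses.
for n in (1, 2, 4, 8):
    qps = timed_qps(lambda: spdl.io.load_audio(wav_bytes, filter_desc=None), 100 * n, n)
    print(f"{n} threads: {qps:.1f} QPS")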
Source¶
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

# pyre-strict

"""This example measures the performance of loading WAV audio.

It compares three different approaches for loading WAV files:

- :py:func:`spdl.io.load_wav`: Fast native WAV parser optimized for simple PCM formats
- :py:func:`spdl.io.load_audio`: General-purpose audio loader using FFmpeg backend
- ``soundfile`` (``libsndfile``): Popular third-party audio I/O library

The benchmark suite evaluates performance across multiple dimensions:

- Various audio configurations (sample rates, channels, bit depths, durations)
- Different thread counts (1, 2, 4, 8, 16) to measure parallel scaling
- Statistical analysis with 95% confidence intervals using Student's t-distribution
- Queries per second (QPS) as the primary performance metric

**Example**

.. code-block:: shell

   $ python benchmark_wav.py --plot --output results.png

**Result**

The following plot shows the QPS (measured by the number of files processed) of each
function with different audio durations.

.. image:: ../../_static/data/example-benchmark-wav.webp

:py:func:`spdl.io.load_wav` is much faster than the others because all it does is
reinterpret the input byte string as an array, so its performance stays the same
even for longer audio durations.

Since parsing the WAV header is nearly instant, the ``spdl.io.load_wav`` function
spends most of its time creating the NumPy array.
That step requires the GIL, so the performance does not scale with multi-threading.
(This performance pattern is much the same as that of
:ref:`spdl.io.load_npz <data-format>`.)

The following is the same plot without ``load_wav``.

.. image:: ../../_static/data/example-benchmark-wav-2.webp

``soundfile`` (``libsndfile``) has to process the data iteratively (through
``io.BytesIO``) because it does not support loading directly from a byte string,
so it takes longer to process longer audio data.
The performance trend (a single thread being the fastest) suggests that
it does not release the GIL for the majority of the time.

The :py:func:`spdl.io.load_audio` function (the generic FFmpeg-based implementation)
does a lot of work, so its overall performance is not as good, but it scales with
multi-threading because it releases the GIL almost entirely.
"""

__all__ = [
    "BenchmarkResult",
    "BenchmarkConfig",
    "create_wav_data",
    "load_sf",
    "load_spdl_audio",
    "load_spdl_wav",
    "benchmark",
    "run_benchmark_suite",
    "plot_benchmark_results",
    "main",
]

import argparse
import io
import struct
import time
import wave
from collections.abc import Callable
from concurrent.futures import as_completed, ThreadPoolExecutor
from dataclasses import dataclass

import numpy as np
import scipy.stats
import soundfile as sf
import spdl.io
from numpy.typing import NDArray


def create_wav_data(
    sample_rate: int = 44100,
    num_channels: int = 2,
    bits_per_sample: int = 16,
    duration_seconds: float = 1.0,
) -> tuple[bytes, NDArray]:
    """Create a WAV file in memory for benchmarking.

    Args:
        sample_rate: Sample rate in Hz
        num_channels: Number of audio channels
        bits_per_sample: Bits per sample (16 or 32)
        duration_seconds: Duration of audio in seconds

    Returns:
        Tuple of (WAV file as bytes, audio samples array)
    """
    num_samples = int(sample_rate * duration_seconds)

    # Map the bit depth to the corresponding NumPy sample type
    dtype_map = {
        16: np.int16,
        32: np.int32,
    }
    dtype = dtype_map[bits_per_sample]

    # Create audio samples with a simple sine wave pattern
    samples = np.zeros((num_samples, num_channels), dtype=dtype)
    for channel_idx in range(num_channels):
        frequency = 440.0 + (channel_idx * 110.0)  # A4 and harmonics
        t = np.linspace(0, duration_seconds, num_samples)
        sine_wave = np.sin(2 * np.pi * frequency * t)

        if bits_per_sample == 16:
            samples[:, channel_idx] = (sine_wave * 32767).astype(dtype)
        elif bits_per_sample == 32:
            samples[:, channel_idx] = (sine_wave * 2147483647).astype(dtype)

    # Use Python's built-in wave module to write WAV file to memory buffer
    wav_buffer = io.BytesIO()
    with wave.open(wav_buffer, "wb") as wav_file:
        wav_file.setnchannels(num_channels)
        wav_file.setsampwidth(bits_per_sample // 8)
        wav_file.setframerate(sample_rate)

        # Convert samples to bytes
        if bits_per_sample == 16:
            format_char = "h"  # signed short
        elif bits_per_sample == 32:
            format_char = "i"  # signed int
        else:
            raise ValueError(f"Unsupported bits_per_sample: {bits_per_sample}")

        # Interleave channels and pack to bytes
        frames = b""
        for frame_idx in range(num_samples):
            for channel_idx in range(num_channels):
                frames += struct.pack(format_char, int(samples[frame_idx, channel_idx]))

        wav_file.writeframes(frames)

    wav_data = wav_buffer.getvalue()

    return wav_data, samples


def load_sf(wav_data: bytes) -> NDArray:
    """Load WAV data using the soundfile library.

    Args:
        wav_data: WAV file data as bytes

    Returns:
        Audio samples array as int16 numpy array
    """
    audio_file = io.BytesIO(wav_data)
    data, _ = sf.read(audio_file, dtype="int16")
    return data


def load_spdl_audio(wav_data: bytes) -> NDArray:
    """Load WAV data using the :py:func:`spdl.io.load_audio` function.

    Args:
        wav_data: WAV file data as bytes

    Returns:
        Audio samples array as numpy array
    """
    return spdl.io.to_numpy(spdl.io.load_audio(wav_data, filter_desc=None))


def load_spdl_wav(wav_data: bytes) -> NDArray:
    """Load WAV data using the :py:func:`spdl.io.load_wav` function.

    Args:
        wav_data: WAV file data as bytes

    Returns:
        Audio samples array as numpy array
    """
    return spdl.io.to_numpy(spdl.io.load_wav(wav_data))


@dataclass(frozen=True)
class BenchmarkResult:
    """Results from a single benchmark run."""

    duration: float
    qps: float
    ci_lower: float
    ci_upper: float
    num_threads: int
    function_name: str
    duration_seconds: float


def benchmark(
    name: str,
    func: Callable[[], NDArray],
    iterations: int,
    num_threads: int,
    num_sets: int,
    duration_seconds: float,
) -> tuple[BenchmarkResult, NDArray]:
    """Benchmark a function using multiple threads and calculate statistics.

    Executes a warmup phase followed by multiple benchmark sets to compute
    performance metrics including mean queries per second (QPS) and 95%
    confidence intervals using Student's t-distribution.

    Args:
        name: Descriptive name for the benchmark (used in results)
        func: Callable function to benchmark (takes no args, returns NDArray)
        iterations: Total number of function calls per benchmark set
        num_threads: Number of concurrent threads for parallel execution
        num_sets: Number of independent benchmark sets for confidence interval
        duration_seconds: Duration of audio file being processed (for metadata)

    Returns:
        Tuple containing:
            - BenchmarkResult with timing statistics, QPS, confidence intervals
            - Output NDArray from the last function execution
    """

    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        # Warmup
        futures = [executor.submit(func) for _ in range(num_threads * 30)]
        for future in as_completed(futures):
            output = future.result()

        # Run multiple sets for confidence interval
        qps_samples = []
        for _ in range(num_sets):
            t0 = time.perf_counter()
            futures = [executor.submit(func) for _ in range(iterations)]
            for future in as_completed(futures):
                output = future.result()
            elapsed = time.perf_counter() - t0
            qps_samples.append(iterations / elapsed)

    # Calculate mean and 95% confidence interval
    qps_mean = np.mean(qps_samples)
    qps_std = np.std(qps_samples, ddof=1)
    confidence_level = 0.95
    degrees_freedom = num_sets - 1
    confidence_interval = scipy.stats.t.interval(
        confidence_level,
        degrees_freedom,
        loc=qps_mean,
        scale=qps_std / np.sqrt(num_sets),
    )

    duration = 1.0 / qps_mean
    result = BenchmarkResult(
        duration=duration,
        qps=qps_mean,
        ci_lower=float(confidence_interval[0]),
        ci_upper=float(confidence_interval[1]),
        num_threads=num_threads,
        function_name=name,
        duration_seconds=duration_seconds,
    )
    return result, output  # pyre-ignore[61]


def run_benchmark_suite(
    wav_data: bytes,
    ref: NDArray,
    num_threads: int,
    duration_seconds: float,
) -> tuple[BenchmarkResult, BenchmarkResult, BenchmarkResult]:
    """Run benchmarks for all three loaders with the given parameters.

    Args:
        wav_data: WAV file data as bytes
        ref: Reference audio array for validation
        num_threads: Number of threads (use 1 for single-threaded)
        duration_seconds: Duration of audio in seconds

    Returns:
        Tuple of (spdl_wav_result, spdl_audio_result, soundfile_result)
    """
    # load_wav is fast but its performance is unstable, so run more sets
    iterations = 100 * num_threads
    num_sets = 100

    spdl_wav_result, output = benchmark(
        name="spdl.io.load_wav",
        func=lambda: load_spdl_wav(wav_data),
        iterations=iterations,
        num_threads=num_threads,
        num_sets=num_sets,
        duration_seconds=duration_seconds,
    )
    np.testing.assert_array_equal(output, ref)

    # The others are slower but their performance is stable.
    iterations = 10 * num_threads
    num_sets = 5

    spdl_audio_result, output = benchmark(
        name="spdl.io.load_audio",
        func=lambda: load_spdl_audio(wav_data),
        iterations=iterations,
        num_threads=num_threads,
        num_sets=num_sets,
        duration_seconds=duration_seconds,
    )
    np.testing.assert_array_equal(output, ref)
    soundfile_result, output = benchmark(
        name="soundfile",
        func=lambda: load_sf(wav_data),
        iterations=iterations,
        num_threads=num_threads,
        num_sets=num_sets,
        duration_seconds=duration_seconds,
    )
    if output.ndim == 1:
        output = output[:, None]
    np.testing.assert_array_equal(output, ref)

    return spdl_wav_result, spdl_audio_result, soundfile_result


@dataclass(frozen=True)
class BenchmarkConfig:
    """Configuration for audio file parameters used in benchmarking.

    Attributes:
        sample_rate: Audio sample rate in Hz (e.g., 44100 for CD quality)
        num_channels: Number of audio channels (1=mono, 2=stereo, etc.)
        bits_per_sample: Bit depth per sample (16 or 32)
        duration_seconds: Duration of the audio file in seconds
    """

    sample_rate: int
    num_channels: int
    bits_per_sample: int
    duration_seconds: float


def plot_benchmark_results(
    results: list[BenchmarkResult], output_file: str = "benchmark_results.png"
) -> None:
    """Plot benchmark results and save to file.

    Args:
        results: List of BenchmarkResult objects containing benchmark data
        output_file: Output file path for the saved plot
    """
    import matplotlib
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    matplotlib.use("Agg")  # Use non-interactive backend

    data = [
        {
            "num_threads": r.num_threads,
            "qps": r.qps,
            "ci_lower": r.ci_lower,
            "ci_upper": r.ci_upper,
            "function": r.function_name,
            "duration": f"{r.duration_seconds}s",
        }
        for r in results
    ]
    df = pd.DataFrame(data)

    sns.set_theme(style="whitegrid")
    _, ax = plt.subplots(figsize=(12, 6))
    df["label"] = df["function"] + " (" + df["duration"] + ")"
    for label in df["label"].unique():
        subset = df[df["label"] == label].sort_values("num_threads")
        line = ax.plot(
            subset["num_threads"],
            subset["qps"],
            marker="o",
            label=label,
            linewidth=2,
        )

        # Add confidence interval as shaded region
        ax.fill_between(
            subset["num_threads"],
            subset["ci_lower"],
            subset["ci_upper"],
            alpha=0.2,
            color=line[0].get_color(),
        )

    ax.set_xlabel("Number of Threads", fontsize=12)
    ax.set_ylabel("QPS (Queries Per Second)", fontsize=12)
    ax.set_title("WAV Loading Performance Benchmark", fontsize=14, fontweight="bold")
    ax.legend(title="Function", bbox_to_anchor=(1.05, 1), loc="upper left")
    ax.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig(output_file, dpi=300, bbox_inches="tight")
    print(f"Plot saved to {output_file}")


def _parse_args() -> argparse.Namespace:
    """Parse command line arguments for the benchmark script.

    Returns:
        Parsed command line arguments
    """
    parser = argparse.ArgumentParser(description="Benchmark WAV loading performance")
    parser.add_argument(
        "--plot",
        action="store_true",
        help="Generate and save a plot of the benchmark results",
    )
    parser.add_argument(
        "--output",
        type=str,
        default="benchmark_results.png",
        help="Output file path for the plot (default: benchmark_results.png)",
    )
    return parser.parse_args()


def main() -> None:
    """Run comprehensive benchmark suite for WAV loading performance.

    Benchmarks multiple configurations of audio files with different durations,
    comparing spdl.io.load_wav, spdl.io.load_audio, and soundfile libraries
    across various thread counts (1, 2, 4, 8, 16).
    """
    args = _parse_args()

    benchmark_configs = [
        # BenchmarkConfig(sample_rate, num_channels, bits_per_sample, duration_seconds)
        # BenchmarkConfig(8000, 1, 16, 1.0),  # Low quality mono
        # BenchmarkConfig(16000, 1, 16, 1.0),  # Speech quality mono
        # BenchmarkConfig(48000, 2, 16, 1.0),  # High quality stereo
        # BenchmarkConfig(48000, 8, 16, 1.0),  # Multi-channel audio
        BenchmarkConfig(44100, 2, 16, 1.0),  # CD quality stereo
        BenchmarkConfig(44100, 2, 16, 10.0),
        BenchmarkConfig(44100, 2, 16, 60.0),
        # (44100, 2, 24, 1.0, 100),  # 24-bit audio
    ]

    results: list[BenchmarkResult] = []

    for cfg in benchmark_configs:
        print(cfg)
        wav_data, ref = create_wav_data(
            sample_rate=cfg.sample_rate,
            num_channels=cfg.num_channels,
            bits_per_sample=cfg.bits_per_sample,
            duration_seconds=cfg.duration_seconds,
        )
        print(
            f"Threads,"
            f"SPDL WAV QPS ({cfg.duration_seconds} sec),CI Lower, CI Upper,"
            f"SPDL Audio QPS ({cfg.duration_seconds} sec),CI Lower, CI Upper,"
            f"soundfile QPS ({cfg.duration_seconds} sec),CI Lower, CI Upper"
        )
        for num_threads in [1, 2, 4, 8, 16]:
            spdl_wav_result, spdl_audio_result, soundfile_result = run_benchmark_suite(
                wav_data,
                ref,
                num_threads=num_threads,
                duration_seconds=cfg.duration_seconds,
            )
            results.extend([spdl_wav_result, spdl_audio_result, soundfile_result])
            print(
                f"{num_threads},"
                f"{spdl_wav_result.qps:.2f},{spdl_wav_result.ci_lower:.2f},{spdl_wav_result.ci_upper:.2f},"
                f"{spdl_audio_result.qps:.2f},{spdl_audio_result.ci_lower:.2f},{spdl_audio_result.ci_upper:.2f},"
                f"{soundfile_result.qps:.2f},{soundfile_result.ci_lower:.2f},{soundfile_result.ci_upper:.2f}"
            )

    if args.plot:
        plot_benchmark_results(results, args.output)


if __name__ == "__main__":
    main()
Functions¶
- create_wav_data(sample_rate: int = 44100, num_channels: int = 2, bits_per_sample: int = 16, duration_seconds: float = 1.0) → tuple[bytes, NDArray]¶
Create a WAV file in memory for benchmarking.
- Parameters:
sample_rate – Sample rate in Hz
num_channels – Number of audio channels
bits_per_sample – Bits per sample (16 or 32)
duration_seconds – Duration of audio in seconds
- Returns:
Tuple of (WAV file as bytes, audio samples array)
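For example (a hypothetical invocation):

# Ten seconds of CD-quality stereo test audio, generated in memory.
wav_data, reference = create_wav_data(
    sample_rate=44100,
    num_channels=2,
    bits_per_sample=16,
    duration_seconds=10.0,
)
print(len(wav_data))    # size of the in-memory WAV file in bytes
print(reference.shape)  # (441000, 2): frames x channels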
- load_sf(wav_data: bytes) → NDArray¶
Load WAV data using soundfile library.
- Parameters:
wav_data – WAV file data as bytes
- Returns:
Audio samples array as int16 numpy array
- load_spdl_audio(wav_data: bytes) → NDArray¶
Load WAV data using the spdl.io.load_audio() function.
- Parameters:
wav_data – WAV file data as bytes
- Returns:
Audio samples array as numpy array
- load_spdl_wav(wav_data: bytes) → NDArray¶
Load WAV data using the spdl.io.load_wav() function.
- Parameters:
wav_data – WAV file data as bytes
- Returns:
Audio samples array as numpy array
- benchmark(name: str, func: Callable[[], NDArray], iterations: int, num_threads: int, num_sets: int, duration_seconds: float) → tuple[BenchmarkResult, NDArray]¶
Benchmark a function using multiple threads and calculate statistics.
Executes a warmup phase followed by multiple benchmark sets to compute performance metrics including mean queries per second (QPS) and 95% confidence intervals using Student’s t-distribution.
- Parameters:
name – Descriptive name for the benchmark (used in results)
func – Callable function to benchmark (takes no args, returns NDArray)
iterations – Total number of function calls per benchmark set
num_threads – Number of concurrent threads for parallel execution
num_sets – Number of independent benchmark sets for confidence interval
duration_seconds – Duration of audio file being processed (for metadata)
- Returns:
Tuple containing:
- BenchmarkResult with timing statistics, QPS, confidence intervals
- Output NDArray from the last function execution
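For example, mirroring how run_benchmark_suite() drives it (wav_data as produced by create_wav_data()):

result, output = benchmark(
    name="spdl.io.load_wav",
    func=lambda: load_spdl_wav(wav_data),
    iterations=100 * 4,  # scaled with the thread count, as in run_benchmark_suite()
    num_threads=4,
    num_sets=100,
    duration_seconds=10.0,
)
print(f"{result.qps:.1f} QPS (95% CI: [{result.ci_lower:.1f}, {result.ci_upper:.1f}])")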
- run_benchmark_suite(wav_data: bytes, ref: NDArray, num_threads: int, duration_seconds: float) → tuple[BenchmarkResult, BenchmarkResult, BenchmarkResult]¶
Run benchmarks for all three loaders with the given parameters.
- Parameters:
wav_data – WAV file data as bytes
ref – Reference audio array for validation
num_threads – Number of threads (use 1 for single-threaded)
duration_seconds – Duration of audio in seconds
- Returns:
Tuple of (spdl_wav_result, spdl_audio_result, soundfile_result)
Classes¶
- class BenchmarkResult(duration: float, qps: float, ci_lower: float, ci_upper: float, num_threads: int, function_name: str, duration_seconds: float)¶
Results from a single benchmark run.