High-Level Loading Functions

Overview

To load audio/video/image data from a file or in-memory buffer, you can use the following high-level functions:

These functions provide a simple interface that handles the entire decoding pipeline internally. They return a spdl.io.CPUBuffer or spdl.io.CUDABuffer object containing the decoded data as a contiguous memory region.

Basic Usage

Loading Audio

import spdl.io

# Load from file with minimum input
buffer = spdl.io.load_audio("audio.mp3")

# The buffer can be converted to different array types

# NumPy array
array = spdl.io.to_numpy(buffer)  # shape: (time, channel), dtype: float32

# PyTorch tensor
tensor = spdl.io.to_torch(buffer)

# JAX array
jax_array = spdl.io.to_jax(buffer)

# Load from bytes
data: bytes = download_from_remote_storage("audio.mp3")
buffer = spdl.io.load_audio(data)

# Load specific time window
buffer = spdl.io.load_audio("audio.wav", timestamp=(5.0, 10.0))  # 5-10 seconds

Loading Video

import spdl.io

# Load from file with minimum input
buffer = spdl.io.load_video("video.mp4")

# The buffer can be converted to different array types

# NumPy array
array = spdl.io.to_numpy(buffer)  # shape: (time, height, width, channel), dtype: uint8

# PyTorch tensor
tensor = spdl.io.to_torch(buffer)

# JAX array
jax_array = spdl.io.to_jax(buffer)

# Load from URL
buffer = spdl.io.load_video("https://example.com/video.mp4")

# Load specific time window
buffer = spdl.io.load_video("video.mp4", timestamp=(0.0, 5.0))  # First 5 seconds

Loading Image

import spdl.io

# Load from file with minimum input
buffer = spdl.io.load_image("image.jpg")

# The buffer can be converted to different array types

# NumPy array
array = spdl.io.to_numpy(buffer)  # shape: (height, width, channel), dtype: uint8

# PyTorch tensor
tensor = spdl.io.to_torch(buffer)

# JAX array
jax_array = spdl.io.to_jax(buffer)

# Load from bytes
data: bytes = download_from_remote_storage("image.png")
buffer = spdl.io.load_image(data)

Buffer Objects

The buffer objects returned by the loading functions implement standard array interface protocols:

This allows zero-copy conversion to commonly used array classes.

SPDL provides the following conversion functions:

Default Output Formats

By default, the loading functions produce the following formats:

Audio:

  • Sample format: 32-bit floating point (float32)

  • Channel layout: Interleaved (channel-last)

  • Shape: (time, channel)

Video:

  • Pixel format: RGB24 (interleaved)

  • Layout: NHWC where N is time, C=3 (RGB)

  • Shape: (time, height, width, channel)

  • Data type: uint8

Image:

  • Pixel format: RGB24 (interleaved)

  • Layout: HWC where C=3 (RGB)

  • Shape: (height, width, channel)

  • Data type: uint8

Customizing Output Format

You can customize the output format by providing a filter_desc parameter. SPDL provides helper functions to construct filter descriptions:

Note

spdl.io.get_video_filter_desc() can be used for both image and video loading.

These helper functions allow you to specify common preprocessing operations such as:

  • Resampling/rescaling

  • Format conversion

  • Cropping

  • Frame rate adjustment

  • Trimming to specific number of frames/samples

For detailed information about custom filter creation, see Filter Graphs.

Custom Audio Format Example

import spdl.io

# Resample to 16kHz, convert to mono, change format to 16-bit signed integer
buffer = spdl.io.load_audio(
    "audio.wav",
    filter_desc=spdl.io.get_audio_filter_desc(
        sample_rate=16_000,
        num_channels=1,
        sample_fmt="s16p",
        num_frames=80_000,
    )
)
array = spdl.io.to_numpy(buffer)

Custom Video Format Example

import spdl.io

# Resize to 256x256, then crop to 224x224, adjust frame rate to 30fps
buffer = spdl.io.load_video(
    "video.mp4",
    filter_desc=spdl.io.get_video_filter_desc(
        frame_rate=(30, 1),  # 30 fps as (numerator, denominator)
        scale_width=256,
        scale_height=256,
        scale_algo='bicubic',
        crop_width=224,
        crop_height=224,
        num_frames=10,
    )
)
tensor = spdl.io.to_torch(buffer)

Custom Image Format Example

import spdl.io

# Note: get_video_filter_desc works for images too
buffer = spdl.io.load_image(
    "image.jpg",
    filter_desc=spdl.io.get_video_filter_desc(
        scale_width=256,
        scale_height=256,
        crop_width=224,
        crop_height=224,
    )
)
array = spdl.io.to_numpy(buffer)

Transferring to GPU

High-level loading functions accept a device_config parameter to automatically transfer decoded data to GPU:

import spdl.io

# Create GPU device configuration
cuda_config = spdl.io.cuda_config(device_index=0)

# Audio
audio_buffer = spdl.io.load_audio("audio.mp3", device_config=cuda_config)
audio_tensor = spdl.io.to_torch(audio_buffer)  # CUDA tensor

# Video
video_buffer = spdl.io.load_video("video.mp4", device_config=cuda_config)
video_tensor = spdl.io.to_torch(video_buffer)  # CUDA tensor

# Image
image_buffer = spdl.io.load_image("image.jpg", device_config=cuda_config)
image_tensor = spdl.io.to_torch(image_buffer)  # CUDA tensor

Note

The device_config parameter performs CPU decoding followed by GPU transfer. It does not use hardware-accelerated video decoding.

For hardware-accelerated video decoding using NVIDIA’s NVDEC hardware decoder, see Hardware-Accelerated Video Decoding.

Custom Memory Allocators

For better integration with PyTorch or other frameworks, you can specify custom memory allocators via the allocator parameter in spdl.io.cuda_config():

import spdl.io
import torch

# Use PyTorch's caching allocator
cuda_config = spdl.io.cuda_config(
    device_index=0,
    allocator=(
        torch.cuda.caching_allocator_alloc,
        torch.cuda.caching_allocator_delete
    )
)

# Load and transfer using PyTorch's allocator
buffer = spdl.io.load_video("video.mp4", device_config=cuda_config)
tensor = spdl.io.to_torch(buffer)
# Memory is managed by PyTorch's allocator

Benefits of custom allocators:

  • Unified memory management with your framework

  • Better memory pooling and reuse

  • Reduced memory fragmentation

Note

Custom allocators work with all GPU operations in SPDL, including hardware-accelerated video decoding. See Hardware-Accelerated Video Decoding for more details.