High-Level Loading Functions¶

Overview¶

To load audio/video/image data from a file or in-memory buffer, you can use the following high-level functions:

These functions provide a simple interface that handles the entire decoding pipeline internally. They return a spdl.io.CPUBuffer or spdl.io.CUDABuffer object containing the decoded data as a contiguous memory region.

Basic Usage¶

Loading Audio¶

import spdl.io

# Load from file with minimum input
buffer = spdl.io.load_audio("audio.mp3")

# The buffer can be converted to different array types

# NumPy array
array = spdl.io.to_numpy(buffer)  # shape: (time, channel), dtype: float32

# PyTorch tensor
tensor = spdl.io.to_torch(buffer)

# JAX array
jax_array = spdl.io.to_jax(buffer)

# Load from bytes
data: bytes = download_from_remote_storage("audio.mp3")
buffer = spdl.io.load_audio(data)

# Load specific time window
buffer = spdl.io.load_audio("audio.wav", timestamp=(5.0, 10.0))  # 5-10 seconds

Loading Video¶

import spdl.io

# Load from file with minimum input
buffer = spdl.io.load_video("video.mp4")

# The buffer can be converted to different array types

# NumPy array
array = spdl.io.to_numpy(buffer)  # shape: (time, height, width, channel), dtype: uint8

# PyTorch tensor
tensor = spdl.io.to_torch(buffer)

# JAX array
jax_array = spdl.io.to_jax(buffer)

# Load from URL
buffer = spdl.io.load_video("https://example.com/video.mp4")

# Load specific time window
buffer = spdl.io.load_video("video.mp4", timestamp=(0.0, 5.0))  # First 5 seconds

Loading Image¶

import spdl.io

# Load from file with minimum input
buffer = spdl.io.load_image("image.jpg")

# The buffer can be converted to different array types

# NumPy array
array = spdl.io.to_numpy(buffer)  # shape: (height, width, channel), dtype: uint8

# PyTorch tensor
tensor = spdl.io.to_torch(buffer)

# JAX array
jax_array = spdl.io.to_jax(buffer)

# Load from bytes
data: bytes = download_from_remote_storage("image.png")
buffer = spdl.io.load_image(data)

Buffer Objects¶

The buffer objects returned by the loading functions implement standard array interface protocols:

The array interface protocol for CPU buffers
The CUDA array interface for CUDA buffers

This allows zero-copy conversion to commonly used array classes.

SPDL provides the following conversion functions:

Default Output Formats¶

By default, the loading functions produce the following formats:

Audio:

Sample format: 32-bit floating point (float32)
Channel layout: Interleaved (channel-last)
Shape: (time, channel)

Video:

Pixel format: RGB24 (interleaved)
Layout: NHWC where N is time, C=3 (RGB)
Shape: (time, height, width, channel)
Data type: uint8

Image:

Pixel format: RGB24 (interleaved)
Layout: HWC where C=3 (RGB)
Shape: (height, width, channel)
Data type: uint8

Note

The spdl.io module supports decoding images and videos into various color formats beyond RGB, including YUV420p, NV12, and other pixel formats. You can specify the desired pixel format using the pix_fmt parameter in spdl.io.get_video_filter_desc(). See Filter Graphs for details.

Customizing Output Format¶

You can customize the output format by providing a filter_desc parameter. SPDL provides helper functions to construct filter descriptions:

spdl.io.get_audio_filter_desc() - For audio preprocessing
spdl.io.get_video_filter_desc() - For video and image preprocessing
spdl.io.get_filter_desc() - For advanced custom filters

Note

spdl.io.get_video_filter_desc() can be used for both image and video loading.

These helper functions allow you to specify common preprocessing operations such as:

Resampling/rescaling
Format conversion
Cropping
Frame rate adjustment
Trimming to specific number of frames/samples

For detailed information about custom filter creation, see Filter Graphs.

Custom Audio Format Example¶

import spdl.io

# Resample to 16kHz, convert to mono, change format to 16-bit signed integer
buffer = spdl.io.load_audio(
    "audio.wav",
    filter_desc=spdl.io.get_audio_filter_desc(
        sample_rate=16_000,
        num_channels=1,
        sample_fmt="s16p",
        num_frames=80_000,
    )
)
array = spdl.io.to_numpy(buffer)

Custom Video Format Example¶

import spdl.io

# Resize to 256x256, then crop to 224x224, adjust frame rate to 30fps
buffer = spdl.io.load_video(
    "video.mp4",
    filter_desc=spdl.io.get_video_filter_desc(
        frame_rate=(30, 1),  # 30 fps as (numerator, denominator)
        scale_width=256,
        scale_height=256,
        scale_algo='bicubic',
        crop_width=224,
        crop_height=224,
        num_frames=10,
    )
)
tensor = spdl.io.to_torch(buffer)

Custom Image Format Example¶

import spdl.io

# Note: get_video_filter_desc works for images too
buffer = spdl.io.load_image(
    "image.jpg",
    filter_desc=spdl.io.get_video_filter_desc(
        scale_width=256,
        scale_height=256,
        crop_width=224,
        crop_height=224,
    )
)
array = spdl.io.to_numpy(buffer)

Transferring to GPU¶

High-level loading functions accept a device_config parameter to automatically transfer decoded data to GPU:

import spdl.io

# Create GPU device configuration
cuda_config = spdl.io.cuda_config(device_index=0)

# Audio
audio_buffer = spdl.io.load_audio("audio.mp3", device_config=cuda_config)
audio_tensor = spdl.io.to_torch(audio_buffer)  # CUDA tensor

# Video
video_buffer = spdl.io.load_video("video.mp4", device_config=cuda_config)
video_tensor = spdl.io.to_torch(video_buffer)  # CUDA tensor

# Image
image_buffer = spdl.io.load_image("image.jpg", device_config=cuda_config)
image_tensor = spdl.io.to_torch(image_buffer)  # CUDA tensor

Note

The device_config parameter performs CPU decoding followed by GPU transfer. It does not use hardware-accelerated video decoding.

For hardware-accelerated video decoding using NVIDIA’s NVDEC hardware decoder, see Hardware-Accelerated Video Decoding.

Custom Memory Allocators¶

For better integration with PyTorch or other frameworks, you can specify custom memory allocators via the allocator parameter in spdl.io.cuda_config():

import spdl.io
import torch

# Use PyTorch's caching allocator
cuda_config = spdl.io.cuda_config(
    device_index=0,
    allocator=(
        torch.cuda.caching_allocator_alloc,
        torch.cuda.caching_allocator_delete
    )
)

# Load and transfer using PyTorch's allocator
buffer = spdl.io.load_video("video.mp4", device_config=cuda_config)
tensor = spdl.io.to_torch(buffer)
# Memory is managed by PyTorch's allocator

Benefits of custom allocators:

Unified memory management with your framework
Better memory pooling and reuse
Reduced memory fragmentation

Note

Custom allocators work with all GPU operations in SPDL, including hardware-accelerated video decoding. See Hardware-Accelerated Video Decoding for more details.