High-Level Loading Functions ============================= Overview -------- To load audio/video/image data from a file or in-memory buffer, you can use the following high-level functions: - :py:func:`spdl.io.load_audio` - :py:func:`spdl.io.load_video` - :py:func:`spdl.io.load_image` These functions provide a simple interface that handles the entire decoding pipeline internally. They return a :py:class:`spdl.io.CPUBuffer` or :py:class:`spdl.io.CUDABuffer` object containing the decoded data as a contiguous memory region. Basic Usage ----------- Loading Audio ~~~~~~~~~~~~~ .. code-block:: python import spdl.io # Load from file with minimum input buffer = spdl.io.load_audio("audio.mp3") # The buffer can be converted to different array types # NumPy array array = spdl.io.to_numpy(buffer) # shape: (time, channel), dtype: float32 # PyTorch tensor tensor = spdl.io.to_torch(buffer) # JAX array jax_array = spdl.io.to_jax(buffer) # Load from bytes data: bytes = download_from_remote_storage("audio.mp3") buffer = spdl.io.load_audio(data) # Load specific time window buffer = spdl.io.load_audio("audio.wav", timestamp=(5.0, 10.0)) # 5-10 seconds Loading Video ~~~~~~~~~~~~~ .. code-block:: python import spdl.io # Load from file with minimum input buffer = spdl.io.load_video("video.mp4") # The buffer can be converted to different array types # NumPy array array = spdl.io.to_numpy(buffer) # shape: (time, height, width, channel), dtype: uint8 # PyTorch tensor tensor = spdl.io.to_torch(buffer) # JAX array jax_array = spdl.io.to_jax(buffer) # Load from URL buffer = spdl.io.load_video("https://example.com/video.mp4") # Load specific time window buffer = spdl.io.load_video("video.mp4", timestamp=(0.0, 5.0)) # First 5 seconds Loading Image ~~~~~~~~~~~~~ .. code-block:: python import spdl.io # Load from file with minimum input buffer = spdl.io.load_image("image.jpg") # The buffer can be converted to different array types # NumPy array array = spdl.io.to_numpy(buffer) # shape: (height, width, channel), dtype: uint8 # PyTorch tensor tensor = spdl.io.to_torch(buffer) # JAX array jax_array = spdl.io.to_jax(buffer) # Load from bytes data: bytes = download_from_remote_storage("image.png") buffer = spdl.io.load_image(data) Buffer Objects -------------- The buffer objects returned by the loading functions implement standard array interface protocols: - `The array interface protocol `_ for CPU buffers - `The CUDA array interface `_ for CUDA buffers This allows zero-copy conversion to commonly used array classes. SPDL provides the following conversion functions: - :py:func:`spdl.io.to_numpy` (CPU only) - :py:func:`spdl.io.to_torch` - :py:func:`spdl.io.to_jax` (CPU only) - :py:func:`spdl.io.to_numba` Default Output Formats ----------------------- By default, the loading functions produce the following formats: **Audio:** - Sample format: 32-bit floating point (``float32``) - Channel layout: Interleaved (channel-last) - Shape: ``(time, channel)`` **Video:** - Pixel format: RGB24 (interleaved) - Layout: ``NHWC`` where ``N`` is time, ``C=3`` (RGB) - Shape: ``(time, height, width, channel)`` - Data type: ``uint8`` **Image:** - Pixel format: RGB24 (interleaved) - Layout: ``HWC`` where ``C=3`` (RGB) - Shape: ``(height, width, channel)`` - Data type: ``uint8`` Customizing Output Format -------------------------- You can customize the output format by providing a ``filter_desc`` parameter. SPDL provides helper functions to construct filter descriptions: - :py:func:`spdl.io.get_audio_filter_desc` - For audio preprocessing - :py:func:`spdl.io.get_video_filter_desc` - For video and image preprocessing - :py:func:`spdl.io.get_filter_desc` - For advanced custom filters .. note:: :py:func:`spdl.io.get_video_filter_desc` can be used for both image and video loading. These helper functions allow you to specify common preprocessing operations such as: - Resampling/rescaling - Format conversion - Cropping - Frame rate adjustment - Trimming to specific number of frames/samples For detailed information about custom filter creation, see :doc:`filtering`. Custom Audio Format Example ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python import spdl.io # Resample to 16kHz, convert to mono, change format to 16-bit signed integer buffer = spdl.io.load_audio( "audio.wav", filter_desc=spdl.io.get_audio_filter_desc( sample_rate=16_000, num_channels=1, sample_fmt="s16p", num_frames=80_000, ) ) array = spdl.io.to_numpy(buffer) Custom Video Format Example ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python import spdl.io # Resize to 256x256, then crop to 224x224, adjust frame rate to 30fps buffer = spdl.io.load_video( "video.mp4", filter_desc=spdl.io.get_video_filter_desc( frame_rate=(30, 1), # 30 fps as (numerator, denominator) scale_width=256, scale_height=256, scale_algo='bicubic', crop_width=224, crop_height=224, num_frames=10, ) ) tensor = spdl.io.to_torch(buffer) Custom Image Format Example ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python import spdl.io # Note: get_video_filter_desc works for images too buffer = spdl.io.load_image( "image.jpg", filter_desc=spdl.io.get_video_filter_desc( scale_width=256, scale_height=256, crop_width=224, crop_height=224, ) ) array = spdl.io.to_numpy(buffer) Transferring to GPU ------------------- High-level loading functions accept a ``device_config`` parameter to automatically transfer decoded data to GPU: .. code-block:: python import spdl.io # Create GPU device configuration cuda_config = spdl.io.cuda_config(device_index=0) # Audio audio_buffer = spdl.io.load_audio("audio.mp3", device_config=cuda_config) audio_tensor = spdl.io.to_torch(audio_buffer) # CUDA tensor # Video video_buffer = spdl.io.load_video("video.mp4", device_config=cuda_config) video_tensor = spdl.io.to_torch(video_buffer) # CUDA tensor # Image image_buffer = spdl.io.load_image("image.jpg", device_config=cuda_config) image_tensor = spdl.io.to_torch(image_buffer) # CUDA tensor .. note:: The ``device_config`` parameter performs CPU decoding followed by GPU transfer. It does **not** use hardware-accelerated video decoding. For hardware-accelerated video decoding using NVIDIA's NVDEC hardware decoder, see :doc:`gpu_decoding`. .. _custom-allocators: Custom Memory Allocators ~~~~~~~~~~~~~~~~~~~~~~~~~ For better integration with PyTorch or other frameworks, you can specify custom memory allocators via the ``allocator`` parameter in :py:func:`spdl.io.cuda_config`: .. code-block:: python import spdl.io import torch # Use PyTorch's caching allocator cuda_config = spdl.io.cuda_config( device_index=0, allocator=( torch.cuda.caching_allocator_alloc, torch.cuda.caching_allocator_delete ) ) # Load and transfer using PyTorch's allocator buffer = spdl.io.load_video("video.mp4", device_config=cuda_config) tensor = spdl.io.to_torch(buffer) # Memory is managed by PyTorch's allocator **Benefits of custom allocators:** - Unified memory management with your framework - Better memory pooling and reuse - Reduced memory fragmentation .. note:: Custom allocators work with all GPU operations in SPDL, including hardware-accelerated video decoding. See :doc:`gpu_decoding` for more details.