Filter Graphs
=============

This section explains how to use filters to customize the decoding process in SPDL. Filters allow you to transform media data during decoding, which can be more efficient than post-processing.

FFmpeg Filters Overview
-----------------------

FFmpeg filters are processing operations that transform audio or video frames. Filters can perform operations like:

- Format conversion (YUV to RGB, sample format changes)
- Resizing and scaling
- Cropping
- Rotation and flipping
- Color adjustments
- Audio resampling
- And many more...

For a complete list of available filters, see the `FFmpeg Filters Documentation `_.

Filter Descriptions
-------------------

In SPDL, you specify filtering with a string that follows FFmpeg's filter graph description syntax.

**Basic syntax:**

.. code-block:: text

   filter1=param1=value1:param2=value2,filter2,filter3=param=value

Filters are separated by commas (``,``), and parameters within a filter are separated by colons (``:``), given either as ``key=value`` pairs or positionally (as in ``scale=256:256``).

**Example filter descriptions:**

.. code-block:: python

   # Scale video to 256x256 and convert to RGB
   "scale=256:256,format=rgb24"

   # Resample audio to 16kHz mono
   "aresample=16000,pan=mono|c0=c0"

   # Crop, rotate, and flip
   "crop=224:224:0:0,rotate=30*PI/180,hflip"

Helper Functions
----------------

SPDL provides helper functions to construct filter descriptions for common use cases. These functions handle the complexity of filter syntax and ensure correct parameter formatting.

get_audio_filter_desc
~~~~~~~~~~~~~~~~~~~~~

:py:func:`spdl.io.get_audio_filter_desc` generates filter descriptions for audio processing.
**Parameters:**

- ``sample_rate``: Target sample rate (Hz)
- ``num_channels``: Target number of channels
- ``sample_fmt``: Target sample format (e.g., ``"fltp"``, ``"s16p"``)
- ``num_frames``: Exact number of frames to output (pads or trims as needed)
- ``timestamp``: Time window to extract (tuple of start and end times)
- ``filter_desc``: Additional custom filters to apply

**Example:**

.. code-block:: python

   import spdl.io

   # Resample to 16kHz, convert to mono, 16-bit integer
   filter_desc = spdl.io.get_audio_filter_desc(
       sample_rate=16_000,
       num_channels=1,
       sample_fmt="s16p",
   )
   # Result: "aformat=channel_layouts=1c,aresample=16000,aformat=sample_fmts=s16p"

   # Extract 5 seconds starting at 10 seconds
   filter_desc = spdl.io.get_audio_filter_desc(
       timestamp=(10.0, 15.0),
       sample_rate=16_000,
   )

   # Use in decoding
   packets = spdl.io.demux_audio("audio.mp3")
   frames = spdl.io.decode_packets(packets, filter_desc=filter_desc)

get_video_filter_desc
~~~~~~~~~~~~~~~~~~~~~

:py:func:`spdl.io.get_video_filter_desc` generates filter descriptions for video/image processing.

**Parameters:**

- ``frame_rate``: Target frame rate (frames per second or tuple of numerator/denominator)
- ``scale_width``, ``scale_height``: Target dimensions for scaling
- ``scale_algo``: Scaling algorithm (``"bilinear"``, ``"bicubic"``, ``"lanczos"``, etc.)
- ``scale_mode``: How to handle aspect ratio (``"stretch"``, ``"pad"``, ``"crop"``)
- ``crop_width``, ``crop_height``: Dimensions for center cropping
- ``pix_fmt``: Target pixel format (e.g., ``"rgb24"``, ``"yuv420p"``)
- ``num_frames``: Exact number of frames to output
- ``pad_mode``: How to pad if fewer frames than requested (``"black"``, ``"repeat_last"``)
- ``timestamp``: Time window to extract
- ``filter_desc``: Additional custom filters to apply

**Example:**

.. code-block:: python

   import spdl.io

   # Scale to 256x256, convert to RGB
   filter_desc = spdl.io.get_video_filter_desc(
       scale_width=256,
       scale_height=256,
       pix_fmt="rgb24",
   )

   # Scale with padding to preserve aspect ratio
   filter_desc = spdl.io.get_video_filter_desc(
       scale_width=256,
       scale_height=256,
       scale_mode="pad",
       pix_fmt="rgb24",
   )

   # Extract 30 frames at 30fps, scale and crop
   filter_desc = spdl.io.get_video_filter_desc(
       frame_rate=30,
       scale_width=256,
       scale_height=256,
       crop_width=224,
       crop_height=224,
       num_frames=30,
       pix_fmt="rgb24",
   )

   # Use in decoding
   packets = spdl.io.demux_video("video.mp4")
   frames = spdl.io.decode_packets(packets, filter_desc=filter_desc)

get_filter_desc
~~~~~~~~~~~~~~~

:py:func:`spdl.io.get_filter_desc` is a convenience function that automatically selects the appropriate helper based on the packet type.

**Example:**

.. code-block:: python

   import spdl.io

   # Automatically uses get_video_filter_desc
   packets = spdl.io.demux_video("video.mp4")
   filter_desc = spdl.io.get_filter_desc(
       packets,
       scale_width=256,
       scale_height=256,
   )

   # Automatically uses get_audio_filter_desc
   packets = spdl.io.demux_audio("audio.mp3")
   filter_desc = spdl.io.get_filter_desc(
       packets,
       sample_rate=16_000,
   )

Custom Filter Descriptions
---------------------------

For advanced use cases, you can write custom filter descriptions or combine them with the helper functions.

Writing Custom Filters
~~~~~~~~~~~~~~~~~~~~~~~

You can write filter descriptions manually as strings:

.. code-block:: python

   import spdl.io

   # Custom filter: horizontal flip, rotate 45 degrees, scale
   custom_filter = "hflip,rotate=45*PI/180,scale=256:256"

   packets = spdl.io.demux_video("video.mp4")
   frames = spdl.io.decode_packets(packets, filter_desc=custom_filter)

.. image:: ../_static/data/io_basic_filtergraph.png

Combining with Helper Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The helper functions accept a ``filter_desc`` parameter to insert custom filters:

.. code-block:: python

   import spdl.io

   # Add custom filters before format conversion
   filter_desc = spdl.io.get_video_filter_desc(
       filter_desc="hflip,vflip",  # Custom filters applied first
       scale_width=256,
       scale_height=256,
       pix_fmt="rgb24",  # Format conversion applied last
   )
   # Result: "hflip,vflip,scale=256:256,format=rgb24"

**Important:** When using the ``timestamp`` parameter with custom filters, ensure your custom filters don't interfere with the trimming filters that SPDL adds automatically.

Common Filter Examples
----------------------

Video Filters
~~~~~~~~~~~~~

**Horizontal and vertical flip:**

.. code-block:: python

   filter_desc = "hflip,vflip"

**Rotation:**

.. code-block:: python

   import math

   # Rotate 90 degrees clockwise
   filter_desc = "rotate=90*PI/180"

   # Rotate with variable angle (the rotate filter expects radians)
   angle_rad = math.radians(45)
   filter_desc = f"rotate={angle_rad}"

**Brightness and contrast:**

.. code-block:: python

   # Increase brightness by 0.1 and set contrast to 1.2 (default is 1.0)
   filter_desc = "eq=brightness=0.1:contrast=1.2"

**Gaussian blur:**

.. code-block:: python

   filter_desc = "gblur=sigma=2"

**Random crop:**

.. code-block:: python

   import random

   import spdl.io

   # Note: crop_x and crop_y are not supported by get_video_filter_desc.
   # Its cropping is always center-based. For custom crop positions,
   # write the crop filter yourself.
   x_pos = random.random()  # 0.0 to 1.0
   y_pos = random.random()

   # Random crop position, written manually with FFmpeg's crop filter
   filter_desc = (
       "scale=256:256,"
       f"crop=224:224:x={x_pos:.2f}*(iw-ow):y={y_pos:.2f}*(ih-oh),"
       "format=rgb24"
   )

   # Center crop only (no positional control)
   filter_desc = spdl.io.get_video_filter_desc(
       scale_width=256,
       scale_height=256,
       crop_width=224,
       crop_height=224,
       pix_fmt="rgb24",
   )

Audio Filters
~~~~~~~~~~~~~

**Volume adjustment:**

.. code-block:: python

   # Increase volume by 50%
   filter_desc = "volume=1.5"

**High-pass filter:**

.. code-block:: python

   # Attenuate frequencies below 200 Hz
   filter_desc = "highpass=f=200"

**Tempo change:**

.. code-block:: python

   # Speed up by 1.5x
   filter_desc = "atempo=1.5"

Data Augmentation with Filters
-------------------------------

Filters are particularly useful for data augmentation during training.
Applying filters to decoded frames before buffer conversion can be more memory-efficient than applying the same operations after creating a contiguous array.

Random Augmentation Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Filters can be generated with random parameters for data augmentation. The following example shows how to change augmentation values dynamically.

.. code-block:: python

   import math
   import random

   import spdl.io


   def get_random_augmentation_filter():
       """Generate a random augmentation filter for images."""
       filters = []

       # Random horizontal flip (50% chance)
       if random.random() < 0.5:
           filters.append("hflip")

       # Random vertical flip (50% chance)
       if random.random() < 0.5:
           filters.append("vflip")

       # Random rotation (-30 to +30 degrees, converted to radians)
       angle = math.radians(random.uniform(-30, 30))
       filters.append(f"rotate={angle:.3f}")

       # Combine custom filters with standard preprocessing
       custom_filters = ",".join(filters)

       # Note: For custom crop positions, include a crop filter in filter_desc.
       # get_video_filter_desc only supports center cropping
       # via crop_width/crop_height.
       return spdl.io.get_video_filter_desc(
           filter_desc=custom_filters,
           scale_width=256,
           scale_height=256,
           crop_width=224,
           crop_height=224,
           pix_fmt="rgb24",
       )


   def load_image_with_augmentation(image_path):
       """Load and augment an image."""
       packets = spdl.io.demux_image(image_path)
       filter_desc = get_random_augmentation_filter()
       frames = spdl.io.decode_packets(packets, filter_desc=filter_desc)
       buffer = spdl.io.convert_frames(frames)
       return spdl.io.to_torch(buffer)


   # Use in training loop (training_images is an iterable of image paths)
   for image_path in training_images:
       augmented_image = load_image_with_augmentation(image_path)
       # augmented_image.shape: (224, 224, 3)

The following are example filter descriptions of this kind (here with randomized crop positions as well) and the resulting images.

.. code-block::

   "hflip,rotate=angle=-0.05,scale=256:256,crop=224:224:x=0.18*(iw-ow):y=0.17*(ih-oh)"
   "hflip,vflip,rotate=angle=-0.37,scale=256:256,crop=224:224:x=0.09*(iw-ow):y=0.96*(ih-oh)"
   "rotate=angle=0.33,scale=256:256,crop=224:224:x=0.58*(iw-ow):y=0.57*(ih-oh)"
   "hflip,vflip,rotate=angle=0.30,scale=256:256,crop=224:224:x=0.80*(iw-ow):y=0.35*(ih-oh)"
   "hflip,vflip,rotate=angle=0.02,scale=256:256,crop=224:224:x=0.01*(iw-ow):y=0.25*(ih-oh)"
   "vflip,rotate=angle=0.35,scale=256:256,crop=224:224:x=0.42*(iw-ow):y=0.69*(ih-oh)"
   "hflip,rotate=angle=0.22,scale=256:256,crop=224:224:x=0.10*(iw-ow):y=0.03*(ih-oh)"
   "hflip,rotate=angle=-0.18,scale=256:256,crop=224:224:x=0.65*(iw-ow):y=0.31*(ih-oh)"
   "rotate=angle=-0.13,scale=256:256,crop=224:224:x=0.37*(iw-ow):y=0.75*(ih-oh)"
   "hflip,vflip,rotate=angle=0.01,scale=256:256,crop=224:224:x=0.27*(iw-ow):y=0.84*(ih-oh)"
   "hflip,rotate=angle=-0.31,scale=256:256,crop=224:224:x=0.43*(iw-ow):y=0.92*(ih-oh)"
   "hflip,rotate=angle=-0.27,scale=256:256,crop=224:224:x=0.96*(iw-ow):y=0.92*(ih-oh)"
   "vflip,rotate=angle=-0.28,scale=256:256,crop=224:224:x=0.61*(iw-ow):y=0.04*(ih-oh)"
   "hflip,vflip,rotate=angle=0.08,scale=256:256,crop=224:224:x=0.84*(iw-ow):y=0.57*(ih-oh)"
   "hflip,vflip,rotate=angle=0.41,scale=256:256,crop=224:224:x=0.24*(iw-ow):y=0.92*(ih-oh)"
   "hflip,rotate=angle=-0.02,scale=256:256,crop=224:224:x=0.47*(iw-ow):y=0.87*(ih-oh)"
   "hflip,rotate=angle=-0.15,scale=256:256,crop=224:224:x=0.73*(iw-ow):y=0.30*(ih-oh)"
   "vflip,rotate=angle=-0.13,scale=256:256,crop=224:224:x=0.91*(iw-ow):y=0.85*(ih-oh)"
   "vflip,rotate=angle=0.28,scale=256:256,crop=224:224:x=0.62*(iw-ow):y=0.02*(ih-oh)"
   "rotate=angle=0.24,scale=256:256,crop=224:224:x=0.85*(iw-ow):y=0.61*(ih-oh)"
   "vflip,rotate=angle=-0.52,scale=256:256,crop=224:224:x=0.61*(iw-ow):y=0.59*(ih-oh)"
   "vflip,rotate=angle=0.06,scale=256:256,crop=224:224:x=0.08*(iw-ow):y=0.04*(ih-oh)"
   "hflip,rotate=angle=0.50,scale=256:256,crop=224:224:x=0.23*(iw-ow):y=0.42*(ih-oh)"
   "vflip,rotate=angle=0.18,scale=256:256,crop=224:224:x=0.54*(iw-ow):y=0.34*(ih-oh)"

.. image:: ../_static/data/io_preprocessing_random_aug.png

Benefits of Filters for Augmentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Performance benefits:**

1. **Less data to process**: Many video/image files use YUV420 format, which stores half as many samples per frame as RGB24 (1.5 versus 3 per pixel). Applying augmentation before RGB conversion processes less data.
2. **Single-pass processing**: Filters are applied during decoding in a single pass, avoiding multiple array operations.
3. **Memory efficiency**: No need to allocate intermediate arrays for each augmentation step.
4. **Hardware acceleration**: Some filters can use hardware acceleration when available.

**Example comparison:**

.. code-block:: python

   import spdl.io

   # Efficient: Augmentation during decoding (on YUV420 data)
   filter_desc = spdl.io.get_video_filter_desc(
       filter_desc="hflip,rotate=0.5,scale=256:256",
       crop_width=224,
       crop_height=224,
       pix_fmt="rgb24",  # Convert to RGB at the end
   )
   buffer = spdl.io.load_image("image.jpg", filter_desc=filter_desc)

   # Less efficient: Post-processing (on RGB24 data)
   buffer = spdl.io.load_image("image.jpg")  # Decode to RGB first
   array = spdl.io.to_numpy(buffer)
   # Now apply transformations on the larger RGB array.
   # flip_horizontal, rotate, resize, and crop are placeholders for
   # array operations from an image library of your choice.
   array = flip_horizontal(array)
   array = rotate(array, 0.5)
   array = resize(array, (256, 256))
   array = crop(array, 224, 224)