Media Encoding ============== This section explains how to encode media data (audio and video) using SPDL. Encoding is the process of converting array data into compressed media formats like MP4, WAV, PNG, etc. .. note:: This section covers audio and video encoding. For image encoding, use the video encoding workflow with a single frame. Providing array data of one frame with proper configuration will save the data as an image file (e.g., PNG, JPEG). See :py:func:`spdl.io.save_image` for a convenient wrapper. The Encoding Process --------------------- Media encoding in SPDL follows a multi-stage process: 1. **Create Reference Frames**: Reinterpret array data as frame objects without copying 2. **Filter Frames** (Optional): Apply transformations like scaling or color correction 3. **Encode Frames**: Compress frames into packets 4. **Mux Packets**: Write packets to an output file The following diagram illustrates this process: .. mermaid:: flowchart LR a[Array Data] --> |create_reference_frame| f[Frame] f --> |Optional: FilterGraph| f2[Filtered Frame] f2 -.-> f f --> |Encoder| p[Packet] p --> |Muxer| file[Output File] Creating Reference Frames -------------------------- The first step in encoding is to create reference frames from your array data. This process reinterprets the contiguous array data into a format compatible with the encoding system **without copying the data**. The resulting frame objects hold metadata (format, dimensions, timestamps) and reference the original array's memory, making this operation very efficient. SPDL provides two functions for creating reference frames: Creating Audio Frames ~~~~~~~~~~~~~~~~~~~~~ Use :py:func:`spdl.io.create_reference_audio_frame` to create audio frames from array data: .. code-block:: python import numpy as np import spdl.io sample_rate = 44100 num_channels = 2 duration = 3 # Create audio data (3 seconds of stereo audio) shape = (sample_rate * duration, num_channels) audio_data = np.random.randint(-32768, 32767, size=shape, dtype=np.int16) # Create audio frame frames = spdl.io.create_reference_audio_frame( array=audio_data, sample_fmt="s16", # 16-bit signed integer sample_rate=sample_rate, pts=0, # Presentation timestamp ) For detailed parameter descriptions, see :py:func:`spdl.io.create_reference_audio_frame`. Creating Video Frames ~~~~~~~~~~~~~~~~~~~~~ Use :py:func:`spdl.io.create_reference_video_frame` to create video frames from array data: .. code-block:: python import numpy as np import spdl.io height, width = 240, 320 frame_rate = (30000, 1001) # ~29.97 fps num_frames = 90 # Create video data (90 frames of RGB video) shape = (num_frames, height, width, 3) video_data = np.random.randint(0, 255, size=shape, dtype=np.uint8) # Create video frames frames = spdl.io.create_reference_video_frame( array=video_data, pix_fmt="rgb24", frame_rate=frame_rate, pts=0, ) For detailed parameter descriptions, see :py:func:`spdl.io.create_reference_video_frame`. Using Encoders -------------- Encoders compress frame data into packets using codecs like H.264, AAC, or PCM. The encoding process applies compression algorithms to reduce file size while maintaining quality. Encoders are created through a :py:class:`spdl.io.Muxer` object and configured with parameters like bit rate, quality, and codec-specific settings. Audio Encoding ~~~~~~~~~~~~~~ Here's a complete example of encoding audio to a WAV file: .. code-block:: python import numpy as np import spdl.io sample_rate = 44100 duration = 3 num_channels = 2 # Create audio data shape = (sample_rate * duration, num_channels) audio_data = np.random.randint(-32768, 32767, size=shape, dtype=np.int16) # Create muxer and encoder muxer = spdl.io.Muxer("output.wav") encoder = muxer.add_encode_stream( config=spdl.io.audio_encode_config( num_channels=num_channels, sample_fmt="s16", sample_rate=sample_rate, ), encoder="pcm_s16le", # Optional: specify encoder ) # Encode and write with muxer.open(): # Create frames frames = spdl.io.create_reference_audio_frame( array=audio_data, sample_fmt="s16", sample_rate=sample_rate, pts=0, ) # Encode frames if (packets := encoder.encode(frames)) is not None: muxer.write(0, packets) # Flush encoder if (packets := encoder.flush()) is not None: muxer.write(0, packets) Video Encoding ~~~~~~~~~~~~~~ Here's a complete example of encoding video to an MP4 file: .. code-block:: python import numpy as np import spdl.io height, width = 240, 320 frame_rate = (30000, 1001) duration = 3 batch_size = 32 num_frames = int(frame_rate[0] / frame_rate[1] * duration) shape = (num_frames, height, width, 3) video_data = np.random.randint(0, 255, size=shape, dtype=np.uint8) # Create muxer and encoder muxer = spdl.io.Muxer("output.mp4") encoder = muxer.add_encode_stream( config=spdl.io.video_encode_config( height=height, width=width, pix_fmt="rgb24", frame_rate=frame_rate, ), ) # Encode and write in batches with muxer.open(): for start in range(0, num_frames, batch_size): # Create frames for this batch frames = spdl.io.create_reference_video_frame( array=video_data[start:start + batch_size, ...], pix_fmt="rgb24", frame_rate=frame_rate, pts=start, ) # Encode frames if (packets := encoder.encode(frames)) is not None: muxer.write(0, packets) # Flush encoder if (packets := encoder.flush()) is not None: muxer.write(0, packets) Using the Muxer --------------- The muxer is the final stage that writes encoded packets to an output file. It handles the container format (e.g., MP4, WAV, MKV) and ensures packets are properly interleaved and timestamped. The muxer can write multiple streams (audio, video, subtitles) into a single file. Basic Usage ~~~~~~~~~~~ .. code-block:: python import spdl.io # Create muxer for output file muxer = spdl.io.Muxer("output.mp4") # Add encoding stream(s) encoder = muxer.add_encode_stream( config=spdl.io.video_encode_config( height=240, width=320, pix_fmt="rgb24", frame_rate=(30, 1), ), ) # Open muxer and write data with muxer.open(): # ... encode and write packets muxer.write(0, packets) The muxer automatically flushes and closes when used as a context manager. Multiple Streams (Audio + Video) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You can write both audio and video streams to a single file: .. code-block:: python import numpy as np import spdl.io # Create audio and video data audio_data = np.random.randint(-32768, 32767, size=(44100 * 3, 2), dtype=np.int16) video_data = np.random.randint(0, 255, size=(90, 240, 320, 3), dtype=np.uint8) # Create muxer with both audio and video streams muxer = spdl.io.Muxer("output.mp4") audio_encoder = muxer.add_encode_stream( config=spdl.io.audio_encode_config( num_channels=2, sample_rate=44100, sample_fmt="s16" ), encoder="aac", ) video_encoder = muxer.add_encode_stream( config=spdl.io.video_encode_config( height=240, width=320, pix_fmt="rgb24", frame_rate=(30, 1) ), ) with muxer.open(): # Write audio to stream 0 audio_frames = spdl.io.create_reference_audio_frame( array=audio_data, sample_fmt="s16", sample_rate=44100, pts=0 ) if (packets := audio_encoder.encode(audio_frames)) is not None: muxer.write(0, packets) # Write video to stream 1 video_frames = spdl.io.create_reference_video_frame( array=video_data, pix_fmt="rgb24", frame_rate=(30, 1), pts=0 ) if (packets := video_encoder.encode(video_frames)) is not None: muxer.write(1, packets) # Flush both encoders if (packets := audio_encoder.flush()) is not None: muxer.write(0, packets) if (packets := video_encoder.flush()) is not None: muxer.write(1, packets) Remuxing (Copying Streams) ~~~~~~~~~~~~~~~~~~~~~~~~~~~ You can also remux (copy) streams without re-encoding: .. code-block:: python import spdl.io # Open source file demuxer = spdl.io.Demuxer("input.mp4") # Create output muxer muxer = spdl.io.Muxer("output.mp4") muxer.add_remux_stream(demuxer.video_codec) # Copy packets with muxer.open(): for packets in demuxer.streaming_demux(duration=1): muxer.write(0, packets) Customizing Encoders -------------------- SPDL provides configuration functions to customize encoding behavior. Video Encode Configuration ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Use :py:func:`spdl.io.video_encode_config` to customize video encoding: .. code-block:: python import spdl.io config = spdl.io.video_encode_config( height=1080, width=1920, pix_fmt="yuv420p", frame_rate=(30, 1), bit_rate=5000000, # 5 Mbps gop_size=30, # GOP size max_b_frames=2, # Max B-frames compression_level=5, # Compression level colorspace="bt709", # Color space color_primaries="bt709", # Color primaries color_trc="bt709", # Transfer characteristics ) muxer = spdl.io.Muxer("output.mp4") encoder = muxer.add_encode_stream(config=config) For detailed parameter descriptions, see :py:func:`spdl.io.video_encode_config`. Audio Encode Configuration ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Use :py:func:`spdl.io.audio_encode_config` to customize audio encoding: .. code-block:: python import spdl.io config = spdl.io.audio_encode_config( num_channels=2, sample_rate=48000, sample_fmt="fltp", bit_rate=192000, # 192 kbps compression_level=5, ) muxer = spdl.io.Muxer("output.aac") encoder = muxer.add_encode_stream(config=config) For detailed parameter descriptions, see :py:func:`spdl.io.audio_encode_config`. Applying Filters to Reference Frames ------------------------------------- You can apply filters to reference frames before encoding using :py:class:`spdl.io.FilterGraph`. This is useful for preprocessing (e.g., scaling, color correction, audio normalization). .. code-block:: python import numpy as np import spdl.io height, width = 240, 320 frame_rate = (30, 1) video_data = np.random.randint(0, 255, size=(90, height, width, 3), dtype=np.uint8) # Create reference frames frames = spdl.io.create_reference_video_frame( array=video_data, pix_fmt="rgb24", frame_rate=frame_rate, pts=0 ) # Apply scaling filter filter_desc = ( f"buffer=width={width}:height={height}:pix_fmt=rgb24:" f"time_base={frame_rate[1]}/{frame_rate[0]}:sar=1/1," "scale=640:480,buffersink" ) filter_graph = spdl.io.FilterGraph(filter_desc) filter_graph.add_frames(frames) filter_graph.flush() filtered_frames = filter_graph.get_frames() # Encode filtered frames muxer = spdl.io.Muxer("output.mp4") encoder = muxer.add_encode_stream( config=spdl.io.video_encode_config( height=480, width=640, pix_fmt="rgb24", frame_rate=frame_rate ) ) with muxer.open(): if (packets := encoder.encode(filtered_frames)) is not None: muxer.write(0, packets) if (packets := encoder.flush()) is not None: muxer.write(0, packets) For more details on filtering, including helper functions like :py:func:`spdl.io.get_buffer_desc` and :py:func:`spdl.io.get_abuffer_desc`, see :doc:`filtering` and :doc:`advanced_filtering`. See Also -------- - :doc:`filtering` - More details on using FilterGraph - :doc:`advanced_filtering` - Advanced filtering techniques - :doc:`decoding_overview` - Understanding the decoding process - :py:class:`spdl.io.Muxer` - Muxer API reference - :py:class:`spdl.io.AudioEncoder` - Audio encoder API reference - :py:class:`spdl.io.VideoEncoder` - Video encoder API reference - :py:func:`spdl.io.create_reference_audio_frame` - Create audio frames - :py:func:`spdl.io.create_reference_video_frame` - Create video frames - :py:func:`spdl.io.audio_encode_config` - Audio encoding configuration - :py:func:`spdl.io.video_encode_config` - Video encoding configuration