neuralset.extractors.audio.MelSpectrum

class neuralset.extractors.audio.MelSpectrum(*, event_types: str | tuple[str, ...] = 'Audio', aggregation: Literal['single', 'sum', 'mean', 'first', 'middle', 'last', 'cat', 'stack', 'trigger'] = 'single', allow_missing: bool = False, frequency: Literal['native'] | float = 'native', norm_audio: bool = True, infra: MapInfra = MapInfra(folder=None, cluster=None, logs='{folder}/logs/{user}/%j', job_name=None, timeout_min=25, nodes=1, tasks_per_node=1, cpus_per_task=8, gpus_per_node=1, mem_gb=None, max_pickle_size_gb=None, slurm_constraint=None, slurm_partition=None, slurm_account=None, slurm_qos=None, slurm_use_srun=False, slurm_additional_parameters=None, conda_env=None, workdir=None, permissions=511, version='v5', keep_in_ram=True, max_jobs=128, min_samples_per_job=4096, forbid_single_item_computation=False, mode='cached'), n_mels: int = 40, n_fft: int = 512, hop_length: int | None = None, normalized: bool = True, use_log_scale: bool = True, log_scale_eps: float = 1e-05)

Compute the Mel spectrogram representation of an audio waveform.

This extractor computes a Mel-scaled power spectrogram from raw waveform data, converting time-domain audio into a time-frequency representation that emphasizes perceptually relevant frequency bands. The resulting tensor can optionally be log-scaled, which compresses its dynamic range and makes it easier to interpret.
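To illustrate what "Mel-scaled" means, here is a minimal pure-Python sketch of how band edges can be spaced evenly on the Mel scale. It assumes the common HTK-style conversion formula and a hypothetical 0 to 8000 Hz range; the formula and range actually used by MelSpectrum are not stated on this page, and the helper names are illustrative only.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """HTK-style Hz -> Mel conversion (assumed variant, not confirmed for MelSpectrum)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels: int, f_min: float, f_max: float) -> list:
    """Return n_mels + 2 edge frequencies in Hz, equally spaced on the Mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


# n_mels=40 matches the documented default; the 0-8000 Hz range is hypothetical.
edges = mel_band_edges(n_mels=40, f_min=0.0, f_max=8000.0)

# Compare the width (in Hz) of the lowest and highest bands.
print(round(edges[1] - edges[0], 1), round(edges[-1] - edges[-2], 1))
```

Under this formula the bands are narrow at low frequencies and widen toward high frequencies, which is the sense in which the representation emphasizes perceptually relevant bands.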

Parameters:
  • n_mels (int, default=40) – Number of Mel filter banks to use when computing the Mel spectrogram.

  • n_fft (int, default=512) – Size of the FFT window used to compute the short-time Fourier transform (STFT).

  • hop_length (int or None, default=None) – Number of samples between successive frames. Defaults to n_fft // 4 if not set.

  • normalized (bool, default=True) – If True, normalize the spectrogram output.

  • use_log_scale (bool, default=True) – If True, apply a logarithmic transformation (base 10) to the Mel spectrum.

  • log_scale_eps (float, default=1e-5) – Small constant added to the Mel spectrum before taking the logarithm, to avoid numerical issues with log(0).
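
The interplay between hop_length, use_log_scale and log_scale_eps can be sketched in plain Python. This is a hedged illustration of the documented defaults, not the library's implementation: the helper names and the 16 kHz sample rate are assumptions introduced for the example.

```python
import math
from typing import List, Optional


def resolve_hop_length(n_fft: int = 512, hop_length: Optional[int] = None) -> int:
    """Apply the documented fallback: n_fft // 4 when hop_length is not set."""
    return n_fft // 4 if hop_length is None else hop_length


def to_log_scale(mel_power: List[float], eps: float = 1e-5) -> List[float]:
    """Base-10 log of each Mel power value, with eps added first to avoid log(0)."""
    return [math.log10(value + eps) for value in mel_power]


hop = resolve_hop_length()            # documented defaults: n_fft=512, hop_length=None
sample_rate = 16_000                  # hypothetical sample rate, in Hz
frame_step_ms = 1000.0 * hop / sample_rate  # time between successive frames

print(hop, frame_step_ms)
print(to_log_scale([0.0, 1.0]))       # a silent bin and a unit-power bin
```

With the default log_scale_eps of 1e-5, a bin with zero power maps to about -5 after the logarithm, so eps effectively sets the floor of the log-scaled output.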