neuralset.extractors.audio.MelSpectrum
- pydantic model neuralset.extractors.audio.MelSpectrum
Compute the Mel spectrogram representation of an audio waveform.
This feature extracts a Mel-scaled power spectrogram from raw waveform data, converting time-domain audio into a frequency-domain representation that emphasizes perceptually relevant frequency bands. The resulting tensor can optionally be log-scaled for improved numerical stability and interpretability.
- Parameters:
n_mels (int, default=40) – Number of Mel filter banks to use when computing the Mel spectrogram.
n_fft (int, default=512) – Size of the FFT window used to compute the short-time Fourier transform (STFT).
hop_length (int or None, default=None) – Number of samples between successive frames. Defaults to n_fft // 4 if not set.
normalized (bool, default=True) – If True, normalize the spectrogram output.
use_log_scale (bool, default=True) – If True, apply a logarithmic transformation (base 10) to the Mel spectrum.
log_scale_eps (float, default=1e-5) – Small constant added to the Mel spectrum before taking the logarithm, to avoid numerical issues with log(0).
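The way these parameters interact can be illustrated with a minimal NumPy sketch. It is not the neuralset implementation: the helper names (hz_to_mel, mel_to_hz, mel_filterbank, log_mel_spectrum), the HTK-style Mel formula, the Hann window, and the lack of edge padding are all assumptions made for illustration. The normalized option is left out because the description above does not say which normalization is applied.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Hz -> Mel conversion (an assumption; other conventions exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the Mel scale between 0 Hz and Nyquist
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrum(wave, sr, n_mels=40, n_fft=512, hop_length=None,
                     use_log_scale=True, log_scale_eps=1e-5):
    # hop_length falls back to n_fft // 4, as described for the parameter above
    hop = n_fft // 4 if hop_length is None else hop_length
    # Slice the waveform into overlapping frames (no padding in this sketch)
    n_frames = 1 + (len(wave) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hanning(n_fft)
    # Power spectrum of each frame: shape (n_frames, n_fft // 2 + 1)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Project onto the Mel filters: shape (n_mels, n_frames)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T
    if use_log_scale:
        # eps keeps log10 finite where a Mel band has zero energy
        mel = np.log10(mel + log_scale_eps)
    return mel


# One second of a 440 Hz tone sampled at 16 kHz
sr = 16000
wave = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
out = log_mel_spectrum(wave, sr)
```

With the defaults, the hop is 512 // 4 = 128 samples, and the output has one row per Mel band and one column per frame. Because the filter outputs are non-negative, the log-scaled values in this sketch cannot fall below log10(log_scale_eps), which is -5 for the default of 1e-5.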
- Fields: