neuralset.extractors.audio.MelSpectrum

pydantic model neuralset.extractors.audio.MelSpectrum

Compute the Mel spectrogram representation of an audio waveform.

This extractor computes a Mel-scaled power spectrogram from raw waveform data, converting time-domain audio into a time-frequency representation whose frequency bands are spaced on the perceptually motivated Mel scale. The resulting tensor can optionally be log-scaled (enabled by default), which compresses its dynamic range and improves numerical stability.
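
To illustrate what "Mel-scaled" means in practice, the following minimal sketch places n_mels band centres evenly on the Mel scale and converts them back to Hz. It assumes the HTK-style Mel formula and an upper frequency limit of 8000 Hz; neither is stated on this page, and the helper names are hypothetical rather than part of neuralset.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to Mel using the HTK-style formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(
    n_mels: int = 40, f_min: float = 0.0, f_max: float = 8000.0
) -> list:
    """Return n_mels band centres (in Hz), evenly spaced on the Mel scale.

    f_min and f_max are illustrative values, not documented defaults.
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels centres sit between the two band edges, hence n_mels + 1 steps.
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * i) for i in range(1, n_mels + 1)]


centers = mel_center_frequencies()
# Neighbouring centres are closer together at low frequencies than at high ones.
low_gap = centers[1] - centers[0]
high_gap = centers[-1] - centers[-2]
```

Because the spacing is uniform in Mel rather than in Hz, low frequencies get finer resolution than high frequencies, which is the sense in which the representation emphasizes perceptually relevant bands.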

Parameters:

n_mels (int, default=40) – Number of Mel filter banks to use when computing the Mel spectrogram.
n_fft (int, default=512) – Size of the FFT window used to compute the short-time Fourier transform (STFT).
hop_length (int or None, default=None) – Number of samples between successive frames. Defaults to n_fft // 4 if not set.
normalized (bool, default=True) – If True, normalize the spectrogram output.
use_log_scale (bool, default=True) – If True, apply a logarithmic transformation (base 10) to the Mel spectrum.
log_scale_eps (float, default=1e-5) – Small constant added to the Mel spectrum before taking the logarithm, to avoid numerical issues with log(0).
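
The hop_length default and the log scaling described above can be sketched in plain Python. This is an illustration of the documented behaviour, not the library's implementation; the helper names resolve_hop_length and log_scale are hypothetical.

```python
import math
from typing import List, Optional


def resolve_hop_length(n_fft: int = 512, hop_length: Optional[int] = None) -> int:
    """Fall back to n_fft // 4 when hop_length is not set."""
    return n_fft // 4 if hop_length is None else hop_length


def log_scale(mel_power: List[float], eps: float = 1e-5) -> List[float]:
    """Apply log10(x + eps); eps keeps zero-valued bins finite."""
    return [math.log10(value + eps) for value in mel_power]


default_hop = resolve_hop_length(512)        # 512 // 4 = 128 samples
custom_hop = resolve_hop_length(512, 160)    # an explicit value is used as-is
scaled = log_scale([0.0, 1.0, 100.0])        # a silent bin maps to about log10(eps)
```

With the default log_scale_eps of 1e-5, a bin with zero power maps to roughly -5 instead of negative infinity, so eps effectively sets the floor of the log-scaled output.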

Fields:

hop_length (int | None)
log_scale_eps (float)
n_fft (int)
n_mels (int)
normalized (bool)
use_log_scale (bool)

field n_mels: int = 40

field n_fft: int = 512

field hop_length: int | None = None

field normalized: bool = True

field use_log_scale: bool = True

field log_scale_eps: float = 1e-05

requirements: ClassVar[tuple[str, ...]] = ('julius>=0.2.7', 'pillow>=9.2.0', 'julius>=0.2.7', 'pillow>=9.2.0', 'torchaudio', 'soundfile')
