neuralset.extractors.audio.SonarAudio

class neuralset.extractors.audio.SonarAudio(*, event_types: str | tuple[str, ...] = 'Audio', aggregation: Literal['single', 'sum', 'mean', 'first', 'middle', 'last', 'cat', 'stack', 'trigger'] = 'single', allow_missing: bool = False, frequency: Literal['native'] | float = 'native', norm_audio: bool = True, infra: MapInfra = MapInfra(folder=None, cluster=None, logs='{folder}/logs/{user}/%j', job_name=None, timeout_min=25, nodes=1, tasks_per_node=1, cpus_per_task=8, gpus_per_node=1, mem_gb=None, max_pickle_size_gb=None, slurm_constraint=None, slurm_partition=None, slurm_account=None, slurm_qos=None, slurm_use_srun=False, slurm_additional_parameters=None, conda_env=None, workdir=None, permissions=511, version='v5', keep_in_ram=True, max_jobs=128, min_samples_per_job=4096, forbid_single_item_computation=False, mode='cached'), sampling_rate: int = 16000, layer: float = 0.5)[source]

Extract deep audio embeddings from waveforms using the Sonar speech encoder.

SONAR stands for Sentence-level multimOdal and laNguage-Agnostic Representations.

This extractor uses the sonar_speech_encoder_eng model to produce sentence-level speech embeddings.
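Waveforms must match the sampling rate the model expects (see the sampling_rate parameter below). As a purely illustrative, standard-library sketch, this is what bringing audio to the target rate with naive linear interpolation looks like; in practice a proper resampler (e.g. a polyphase filter) should be used, and this helper is not part of the neuralset API:

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample a waveform with linear interpolation (illustration only).

    Hypothetical helper, not part of neuralset; real pipelines should use a
    band-limited resampler to avoid aliasing.
    """
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        # Position of output sample i on the source time axis.
        pos = i * src_rate / dst_rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```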

Parameters:
  • sampling_rate (int, default=16_000) – The input sampling rate, in Hz, expected by the Sonar model.

  • layer (float, default=0.5) – The relative depth of the encoder layer from which to extract the embedding (0 = first layer, 1 = last layer).
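One natural reading of the layer parameter is that the fraction is mapped onto a concrete layer index of the encoder. The helper below is a hypothetical sketch of such a mapping (the function name, and the exact rounding SonarAudio uses internally, are assumptions, not the documented implementation):

```python
def layer_to_index(layer: float, num_layers: int) -> int:
    """Map a relative layer in [0, 1] to an encoder layer index.

    Hypothetical helper: 0.0 selects the first layer, 1.0 the last,
    intermediate fractions round to the nearest layer. The actual
    mapping inside SonarAudio may differ.
    """
    if not 0.0 <= layer <= 1.0:
        raise ValueError("layer must be in [0, 1]")
    return round(layer * (num_layers - 1))
```

For a 24-layer encoder, layer=0.5 would then select a middle layer rather than either extreme, which is why fractional depths are useful for probing intermediate representations.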