neuralset.extractors.audio.SonarAudio

pydantic model neuralset.extractors.audio.SonarAudio

Extract deep audio embeddings from waveforms using the Sonar speech encoder.

SONAR stands for Sentence-level multimOdal and laNguage-Agnostic Representations.

This extractor leverages the sonar_speech_encoder_eng model to produce speech sentence embeddings.

Parameters:
  • sampling_rate (int, default=16_000) – The input sampling rate expected by the Sonar model.

  • layer (float, default=0.5) – The relative depth of the layer from which to extract the embedding (0 = first layer, 1 = last layer).
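
To illustrate what a relative layer value means, the sketch below maps a value in [0, 1] to a concrete layer index. This is not taken from the neuralset source: the helper name relative_layer_to_index, the n_layers argument, and the rounding rule are all assumptions made for illustration, and the library's actual mapping may differ.

```python
def relative_layer_to_index(layer: float, n_layers: int) -> int:
    """Map a relative depth in [0, 1] to a 0-based layer index.

    Hypothetical helper: the rounding rule here is an assumption,
    not the documented behaviour of SonarAudio.
    """
    if not 0.0 <= layer <= 1.0:
        raise ValueError(f"layer must be in [0, 1], got {layer}")
    if n_layers < 1:
        raise ValueError(f"n_layers must be >= 1, got {n_layers}")
    # 0.0 selects the first layer (index 0), 1.0 the last (index n_layers - 1).
    return round(layer * (n_layers - 1))


# Example with a hypothetical 5-layer encoder.
first = relative_layer_to_index(0.0, 5)
middle = relative_layer_to_index(0.5, 5)
last = relative_layer_to_index(1.0, 5)
print(first, middle, last)
```

Under these assumptions, the default layer=0.5 would select a layer near the middle of the encoder rather than its final output.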

Fields:
requirements: ClassVar[tuple[str, ...]] = ('julius>=0.2.7', 'pillow>=9.2.0', 'julius>=0.2.7', 'pillow>=9.2.0', 'sonar-space', 'fairseq2', 'soundfile')
field sampling_rate: int = 16000
field layer: float = 0.5
property model: Module