neuralset.extractors.audio.Whisper

pydantic model neuralset.extractors.audio.Whisper

Extract speech embeddings using the pretrained Whisper model from Hugging Face.

Whisper is a multilingual speech recognition and translation model that includes a dedicated encoder for audio processing. This class provides an interface to convert raw audio waveforms into high-level embeddings suitable for automatic speech recognition (ASR), speech translation, and other downstream tasks.
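As a rough illustration of what "converting a waveform into encoder embeddings" means, the sketch below calls the Hugging Face transformers Whisper classes directly. It is not neuralset's implementation. The helper name encode_waveform is hypothetical, and it assumes torch, numpy, and transformers are installed. To stay runnable offline it builds a small, randomly initialised WhisperModel instead of downloading "openai/whisper-large-v3-turbo", so the values it returns are meaningless and only the shapes and the call sequence are informative.

```python
import numpy as np
import torch
from transformers import WhisperConfig, WhisperFeatureExtractor, WhisperModel


def encode_waveform(waveform, sampling_rate, model, feature_extractor):
    """Hypothetical helper: return encoder hidden states, shape (1, frames, d_model)."""
    # Convert the raw waveform into log-mel features (padded to a 30 s window).
    inputs = feature_extractor(
        waveform, sampling_rate=sampling_rate, return_tensors="pt"
    )
    model.eval()
    with torch.no_grad():
        # Only the audio encoder is run; the text decoder is not needed here.
        encoder_output = model.encoder(inputs.input_features)
    return encoder_output.last_hidden_state


# Assumption: a tiny random-weight stand-in so the sketch needs no download.
# With network access, WhisperModel.from_pretrained(name) and
# AutoFeatureExtractor.from_pretrained(name) would load real weights instead.
config = WhisperConfig(
    d_model=64,
    encoder_layers=1,
    decoder_layers=1,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=128,
    decoder_ffn_dim=128,
)
model = WhisperModel(config)
feature_extractor = WhisperFeatureExtractor()  # defaults: 80 mel bins, 16 kHz

waveform = np.zeros(16000, dtype=np.float32)  # one second of silence at 16 kHz
embeddings = encode_waveform(waveform, 16000, model, feature_extractor)
print(embeddings.shape, embeddings.dtype)
```

The result has one embedding vector per encoder frame. How neuralset resamples audio, selects layers, or aggregates frames is not covered by this sketch.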

model_name

The Hugging Face model identifier to load. Defaults to "openai/whisper-large-v3-turbo".

Type:

str

Fields:
field model_name: str = 'openai/whisper-large-v3-turbo'
field dtype: Literal['float32'] = 'float32'
field hf_config: HuggingFaceAudioConfig = HuggingFaceAudioConfig(**{'model_cls_name': 'WhisperModel', 'model_kwargs': None, 'processor_cls_name': 'AutoFeatureExtractor', 'processor_kwargs': None})
load_model() -> Module
requirements: tp.ClassVar[tuple[str, ...]] = ('transformers>=4.29.2', 'huggingface_hub>=0.27.0', 'julius>=0.2.7', 'pillow>=9.2.0', 'transformers>=4.29.2', 'huggingface_hub>=0.27.0', 'julius>=0.2.7', 'pillow>=9.2.0', 'transformers>=4.29.2', 'soundfile', 'transformers>=4.29.2', 'huggingface_hub>=0.27.0', 'julius>=0.2.7', 'pillow>=9.2.0', 'transformers>=4.29.2', 'huggingface_hub>=0.27.0', 'julius>=0.2.7', 'pillow>=9.2.0', 'transformers>=4.29.2', 'soundfile', 'soundfile')
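The requirements tuple above lists several entries more than once. The short sketch below collapses it to its distinct entries while keeping first-seen order. The tuple is copied from this page rather than read from the library, and the variable names are illustrative only.

```python
# Copied verbatim from the `requirements` value documented above.
requirements = (
    'transformers>=4.29.2', 'huggingface_hub>=0.27.0', 'julius>=0.2.7',
    'pillow>=9.2.0', 'transformers>=4.29.2', 'huggingface_hub>=0.27.0',
    'julius>=0.2.7', 'pillow>=9.2.0', 'transformers>=4.29.2', 'soundfile',
    'transformers>=4.29.2', 'huggingface_hub>=0.27.0', 'julius>=0.2.7',
    'pillow>=9.2.0', 'transformers>=4.29.2', 'huggingface_hub>=0.27.0',
    'julius>=0.2.7', 'pillow>=9.2.0', 'transformers>=4.29.2', 'soundfile',
    'soundfile',
)

# dict.fromkeys keeps the first occurrence of each key in insertion order.
unique_requirements = tuple(dict.fromkeys(requirements))
print(unique_requirements)
```

This leaves five distinct packages: transformers, huggingface_hub, julius, pillow, and soundfile.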