neuralset.extractors.audio.Wav2Vec

pydantic model neuralset.extractors.audio.Wav2Vec[source]

Extract speech embeddings using a pretrained Wav2Vec 2.0 model from Hugging Face.

The Wav2Vec 2.0 architecture learns contextualized speech representations from raw audio waveforms via self-supervised pretraining; the default XLSR-53 checkpoint was pretrained on multilingual speech spanning 53 languages. The resulting embeddings are widely used for tasks such as automatic speech recognition (ASR), speaker verification, and speech classification.
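A minimal sketch of what extracting such embeddings looks like with the Hugging Face transformers library directly (the internals of this extractor class are not shown here, so treat this as an illustration of the underlying model rather than of neuralset's own API; note that the checkpoint weights are downloaded on first use):

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# The default checkpoint used by this extractor.
name = "facebook/wav2vec2-large-xlsr-53"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
model = Wav2Vec2Model.from_pretrained(name)
model.eval()

# One second of silence at 16 kHz stands in for a real waveform;
# Wav2Vec 2.0 checkpoints expect 16 kHz mono input.
waveform = torch.zeros(16000)
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Contextual embeddings, one vector per ~20 ms frame:
# shape (batch, frames, hidden_size).
print(outputs.last_hidden_state.shape)
```

The frame-level embeddings in `last_hidden_state` are typically mean-pooled over time when a single fixed-size vector per utterance is needed.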

Parameters:

model_name (str) – The Hugging Face model identifier to load, defaulting to "facebook/wav2vec2-large-xlsr-53".
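Since the class is a pydantic model, configuration follows the usual pydantic pattern. A hypothetical sketch of the configuration surface (the real class may define additional fields and validation):

```python
from pydantic import BaseModel

# Simplified stand-in for neuralset.extractors.audio.Wav2Vec,
# showing only the documented model_name field.
class Wav2Vec(BaseModel):
    model_name: str = "facebook/wav2vec2-large-xlsr-53"

# Default checkpoint.
default = Wav2Vec()
print(default.model_name)  # facebook/wav2vec2-large-xlsr-53

# Any Hugging Face Wav2Vec 2.0 identifier can be substituted.
english = Wav2Vec(model_name="facebook/wav2vec2-base-960h")
print(english.model_name)  # facebook/wav2vec2-base-960h
```

Overriding `model_name` lets the same extractor code target, for example, a monolingual checkpoint instead of the multilingual XLSR-53 default.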

Fields:
field model_name: str = 'facebook/wav2vec2-large-xlsr-53'[source]
requirements: tp.ClassVar[tuple[str, ...]] = ('transformers>=4.29.2', 'huggingface_hub>=0.27.0', 'julius>=0.2.7', 'pillow>=9.2.0', 'soundfile')[source]