neuralset.extractors.audio.Wav2VecBert

pydantic model neuralset.extractors.audio.Wav2VecBert

Extract speech embeddings using the pretrained Wav2Vec2-BERT model from Hugging Face.

Wav2Vec2-BERT is a self-supervised speech representation model that integrates Wav2Vec 2.0’s contrastive pretraining with a BERT-style masked language modeling objective. The model produces deep, contextualized audio embeddings suitable for a wide range of downstream speech and audio understanding tasks.

Parameters:

model_name (str) – The Hugging Face model identifier to load. Defaults to "facebook/w2v-bert-2.0".
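
For orientation, here is a minimal sketch of what loading this checkpoint looks like with the Hugging Face transformers library directly, using the two class names that appear in hf_config (Wav2Vec2BertModel and AutoFeatureExtractor). It is not the neuralset code path, which is not shown on this page. The helper name embed_waveform and the 16 kHz sampling rate are assumptions made for illustration.

```python
MODEL_NAME = "facebook/w2v-bert-2.0"  # default value of the model_name field
SAMPLE_RATE = 16_000  # assumption: the checkpoint expects 16 kHz mono audio


def embed_waveform(waveform, model_name=MODEL_NAME, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: return the last-layer hidden states for one waveform.

    `waveform` is a 1-D float array (or list of floats) sampled at `sample_rate`.
    """
    # Imported lazily so that defining the helper does not need the heavy dependencies.
    import torch
    from transformers import AutoFeatureExtractor, Wav2Vec2BertModel

    # Downloads the checkpoint from the Hugging Face Hub on first use.
    processor = AutoFeatureExtractor.from_pretrained(model_name)
    model = Wav2Vec2BertModel.from_pretrained(model_name)
    model.eval()

    # Convert the raw waveform into the model's input features.
    inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")

    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        outputs = model(**inputs)

    # Tensor of shape (batch, frames, hidden_size).
    return outputs.last_hidden_state
```

Calling embed_waveform on one second of silence (for example, a NumPy array of 16,000 float32 zeros) should return a three-dimensional tensor with one batch entry. The first call needs network access to fetch the weights.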

Fields:
field model_name: str = 'facebook/w2v-bert-2.0'
field hf_config: HuggingFaceAudioConfig = HuggingFaceAudioConfig(**{'model_cls_name': 'Wav2Vec2BertModel', 'model_kwargs': None, 'processor_cls_name': 'AutoFeatureExtractor', 'processor_kwargs': None})
requirements: tp.ClassVar[tuple[str, ...]] = ('transformers>=4.29.2', 'huggingface_hub>=0.27.0', 'julius>=0.2.7', 'pillow>=9.2.0', 'transformers>=4.29.2', 'huggingface_hub>=0.27.0', 'julius>=0.2.7', 'pillow>=9.2.0', 'transformers>=4.29.2', 'soundfile', 'transformers>=4.29.2', 'huggingface_hub>=0.27.0', 'julius>=0.2.7', 'pillow>=9.2.0', 'transformers>=4.29.2', 'huggingface_hub>=0.27.0', 'julius>=0.2.7', 'pillow>=9.2.0', 'transformers>=4.29.2', 'soundfile', 'soundfile')
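
The requirements tuple above lists several specifiers more than once, presumably because it is concatenated from parent classes (an assumption; this page does not say). The sketch below collapses it to the unique entries in first-seen order, for example before passing them to an installer. The variable names are illustrative and not part of neuralset.

```python
# Value copied verbatim from the `requirements` class variable documented above.
requirements = (
    "transformers>=4.29.2", "huggingface_hub>=0.27.0", "julius>=0.2.7", "pillow>=9.2.0",
    "transformers>=4.29.2", "huggingface_hub>=0.27.0", "julius>=0.2.7", "pillow>=9.2.0",
    "transformers>=4.29.2", "soundfile",
    "transformers>=4.29.2", "huggingface_hub>=0.27.0", "julius>=0.2.7", "pillow>=9.2.0",
    "transformers>=4.29.2", "huggingface_hub>=0.27.0", "julius>=0.2.7", "pillow>=9.2.0",
    "transformers>=4.29.2", "soundfile", "soundfile",
)

# dict preserves insertion order, so this removes repeats while keeping
# each specifier at the position of its first occurrence.
unique_requirements = tuple(dict.fromkeys(requirements))

for spec in unique_requirements:
    print(spec)
```

Only five distinct specifiers remain: transformers, huggingface_hub, julius, pillow, and soundfile.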