neuralset.extractors.image.HuggingFaceImage

pydantic model neuralset.extractors.image.HuggingFaceImage

Compute image embeddings using transformer-based models obtained through the HuggingFace API.

Parameters:
  • model_name (str, default="facebook/dinov2-base") – HuggingFace model identifier.

  • pretrained (bool, default=True) – Whether to load pretrained weights for the model. If False, the model is initialized with random weights from the model configuration.
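
A minimal construction sketch (illustrative only; it assumes the class is importable from this module and uses the documented defaults):

    from neuralset.extractors.image import HuggingFaceImage

    # Instantiate with the documented defaults: the DINOv2 base checkpoint,
    # loading pretrained weights from the HuggingFace Hub.
    extractor = HuggingFaceImage(model_name="facebook/dinov2-base", pretrained=True)

    # The `model` property exposes the underlying torch.nn.Module.
    backbone = extractor.model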

Fields:
field model_name: str = 'facebook/dinov2-base'
field pretrained: bool = True
field infra: MapInfra = MapInfra(folder=None, cluster=None, logs='{folder}/logs/{user}/%j', job_name=None, timeout_min=25, nodes=1, tasks_per_node=1, cpus_per_task=8, gpus_per_node=1, mem_gb=None, max_pickle_size_gb=None, slurm_constraint=None, slurm_partition=None, slurm_account=None, slurm_qos=None, slurm_use_srun=False, slurm_additional_parameters=None, slurm_setup=None, conda_env=None, workdir=None, permissions=511, version='v5', keep_in_ram=True, max_jobs=128, min_samples_per_job=4096, forbid_single_item_computation=False, mode='cached')
property model: Module
get_static(event: Image) → Tensor

Return a single feature vector for the given event.

Override this method in subclasses to produce a static (non-temporal) embedding for one event. The returned tensor should have no time dimension — temporal wrapping is handled by BaseStatic automatically.

Parameters:
  event (Image) – The event to extract a feature from.

Returns:
  A tensor of shape (*feature_shape,) (no time axis).

Return type:
  torch.Tensor
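
A hedged override sketch (an assumption-laden illustration, not the library's actual implementation): a subclass of HuggingFaceImage that mean-pools the transformer's token embeddings into a single vector. The `event.data` attribute, assumed to hold a (C, H, W) float tensor of pixel values already preprocessed for the model, is hypothetical.

    import torch

    from neuralset.extractors.image import HuggingFaceImage

    class MeanPooledImage(HuggingFaceImage):
        def get_static(self, event) -> torch.Tensor:
            # Run the wrapped HuggingFace vision model on a single image and
            # average the token embeddings into one static feature vector
            # (no time axis; temporal wrapping is handled by BaseStatic).
            with torch.no_grad():
                outputs = self.model(pixel_values=event.data.unsqueeze(0))
            return outputs.last_hidden_state.mean(dim=1).squeeze(0)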

requirements: tp.ClassVar[tuple[str, ...]] = ('transformers>=4.29.2', 'huggingface_hub>=0.27.0', 'torchvision>=0.15.2', 'pillow>=9.2.0')