neuralset.extractors.image.HuggingFaceImage

pydantic model neuralset.extractors.image.HuggingFaceImage

Compute image embeddings using transformer-based models obtained through the HuggingFace API.

Parameters:

model_name (str, default="facebook/dinov2-base") – HuggingFace model identifier.

Fields:
field event_types: Literal['Image', 'Video'] = 'Image'
requirements: ClassVar[tuple[str, ...]] = ('transformers>=4.29.2', 'huggingface_hub>=0.27.0', 'transformers>=4.29.2', 'huggingface_hub>=0.27.0', 'torchvision>=0.15.2', 'transformers>=4.29.2', 'pillow>=9.2.0', 'pillow>=9.2.0')
field model_name: str = 'facebook/dinov2-base'
field hf_config: HuggingFaceImageConfig = HuggingFaceImageConfig(model_cls_name='AutoModel', model_kwargs=None, processor_cls_name='AutoProcessor', processor_kwargs={'do_rescale': False})
field infra: MapInfra = MapInfra(folder=None, cluster=None, logs='{folder}/logs/{user}/%j', job_name=None, timeout_min=25, nodes=1, tasks_per_node=1, cpus_per_task=8, gpus_per_node=1, mem_gb=None, max_pickle_size_gb=None, slurm_constraint=None, slurm_partition=None, slurm_account=None, slurm_qos=None, slurm_use_srun=False, slurm_additional_parameters=None, slurm_setup=None, conda_env=None, workdir=None, permissions=511, version='v6', keep_in_ram=True, max_jobs=128, min_samples_per_job=4096, forbid_single_item_computation=False, mode='cached')
field batch_size: int = 32
field imsize: int | None = None
field frequency: float | Literal['native'] = 0.0
get_static(event: Image) → Tensor

Return a single feature vector for the given event.

Override this method in subclasses to produce a static (non-temporal) embedding for one event. The returned tensor should have no time dimension — temporal wrapping is handled by BaseStatic automatically.

Parameters:

event (Event) – The event to extract a feature from.

Returns:

A tensor of shape (*feature_shape,) (no time axis).

Return type:

torch.Tensor
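
The get_static contract described above can be illustrated with a small, library-free sketch. Everything in it is hypothetical: ToyImageEvent, ToyStaticExtractor, get_timed, and MeanAndMax are stand-ins written for this example and are not part of neuralset, plain Python lists stand in for torch.Tensor, and the way the time axis is added (repeating the static vector at each step) is an assumption, since this page does not show how BaseStatic performs the wrapping.

```python
from dataclasses import dataclass


@dataclass
class ToyImageEvent:
    """Hypothetical stand-in for an Image event (not a neuralset class)."""
    pixels: list[float]


class ToyStaticExtractor:
    """Hypothetical base class mimicking the contract described above."""

    def get_static(self, event: ToyImageEvent) -> list[float]:
        # Subclasses return one feature vector per event, with no time axis.
        raise NotImplementedError

    def get_timed(self, event: ToyImageEvent, n_steps: int) -> list[list[float]]:
        # Assumed wrapping: repeat the static vector at every time step.
        feature = self.get_static(event)
        return [list(feature) for _ in range(n_steps)]


class MeanAndMax(ToyStaticExtractor):
    """Toy subclass producing a 2-dimensional feature (mean and max pixel)."""

    def get_static(self, event: ToyImageEvent) -> list[float]:
        mean = sum(event.pixels) / len(event.pixels)
        return [mean, max(event.pixels)]


event = ToyImageEvent(pixels=[0.0, 0.5, 1.0])
extractor = MeanAndMax()

static = extractor.get_static(event)           # one vector of length 2, no time axis
timed = extractor.get_timed(event, n_steps=3)  # 3 time steps, each holding that vector
```

The point of the split is that the subclass only has to describe what a single event looks like as a feature vector; anything involving time is left to the base class.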