neuralset.extractors.meta.HuggingFacePCA

pydantic model neuralset.extractors.meta.HuggingFacePCA

Applies a PCA to the output of an underlying HuggingFace extractor. The underlying extractor is first computed through the prepare method, and the current extractor then applies the PCA to its output. Unlike the ExtractorPCA extractor, HuggingFacePCA caches multiple layers at once. By default, the cache of the underlying HuggingFace extractor is deleted afterwards.
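The cache lifecycle described above (compute all layers, reduce them, then discard the intermediate cache) can be pictured with a minimal, standard-library-only sketch. It does not use the neuralset API: fake_layer_features and reduce_with_tmp_cache are hypothetical names, and a simple truncation stands in for the PCA projection.

```python
import json
import tempfile
from pathlib import Path


def fake_layer_features(text: str, layer: int) -> list[float]:
    """Hypothetical stand-in for an expensive per-layer model activation."""
    return [float(len(text) + layer + i) for i in range(4)]


def reduce_with_tmp_cache(texts: list[str], layers: list[int], n_keep: int):
    """Cache all layers in a temporary folder, reduce them, then drop the folder."""
    reduced = {}
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        # Step 1: compute and cache every requested layer (the costly part).
        for layer in layers:
            feats = [fake_layer_features(t, layer) for t in texts]
            (cache / f"layer{layer}.json").write_text(json.dumps(feats))
        # Step 2: read each cached layer back and reduce it.
        # Truncation is only a placeholder for the PCA projection.
        for layer in layers:
            feats = json.loads((cache / f"layer{layer}.json").read_text())
            reduced[layer] = [row[:n_keep] for row in feats]
        cache_path = cache
    # Step 3: the temporary cache is deleted once the with-block exits.
    return reduced, cache_path.exists()


reduced, cache_still_exists = reduce_with_tmp_cache(["a", "bcd"], [0, 1], n_keep=2)
```

Only the reduced outputs survive the call; the full-size per-layer features, which are typically much larger, live only for the duration of the temporary folder.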

Parameters:
  • extractor (HuggingFace Extractor) – the underlying extractor on which the PCA must be applied

  • n_components (int) – the number of components of the PCA

  • whiten (bool) – whether to whiten the components after the PCA

  • use_tmp_cache (bool) – whether to use a temporary cache folder for the underlying extractor that gets deleted afterwards
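To illustrate what n_components and whiten usually mean for a PCA, here is a minimal NumPy sketch. It is not the implementation used by HuggingFacePCA: pca_fit_transform is a hypothetical helper, the features are random data, and the whitening convention (unit variance per component) is an assumption borrowed from common PCA implementations.

```python
import numpy as np


def pca_fit_transform(x: np.ndarray, n_components: int, whiten: bool = False) -> np.ndarray:
    """Project the rows of x onto the top principal components (SVD-based)."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    if whiten:
        # Divide by each component's standard deviation to get unit variance.
        std = s[:n_components] / np.sqrt(x.shape[0] - 1)
        projected = projected / std
    return projected


rng = np.random.default_rng(0)
# 200 samples of a 16-dimensional feature with uneven variance per dimension.
features = rng.normal(size=(200, 16)) * np.linspace(1.0, 5.0, 16)

plain = pca_fit_transform(features, n_components=4)
white = pca_fit_transform(features, n_components=4, whiten=True)
```

Without whitening, the retained components keep their original scale, so the first component has the largest variance. With whitening, every retained component is rescaled to unit variance.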

Fields:
field extractor: BaseExtractor [Required]
field use_tmp_cache: bool = True
prepare(obj: Any) → None

Pre-compute and cache extractor data for a collection of events.

This method triggers _get_data on every matching event so that expensive computation (e.g. model inference) is done once and cached. It then calls the extractor on a single event to populate the output shape, which is needed when allow_missing=True.

Call prepare before using the extractor in a dataloader.

Parameters:

obj (DataFrame or sequence of Event or sequence of Segment) – The structure containing the events. When calling prepare on several objects, prefer passing a list of events or segments over a DataFrame to avoid redundant conversion overhead.
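The prepare-then-use pattern described above can be sketched with a toy class. This is an illustration only, not neuralset code: ToyExtractor, its cache dictionary, and the string "events" are hypothetical, and only the method names prepare and _get_data mirror the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyExtractor:
    """Hypothetical extractor that caches per-event results."""

    cache: dict = field(default_factory=dict)
    calls: int = 0
    output_shape: Optional[tuple] = None

    def _get_data(self, event: str) -> list:
        # Expensive step (e.g. model inference), executed once per event.
        if event not in self.cache:
            self.calls += 1
            self.cache[event] = [float(ord(c)) for c in event[:3].ljust(3)]
        return self.cache[event]

    def prepare(self, events: list) -> None:
        # Warm the cache for every event ...
        for event in events:
            self._get_data(event)
        # ... then run on a single event to record the output shape.
        if events:
            self.output_shape = (len(self(events[0])),)

    def __call__(self, event: str) -> list:
        return self._get_data(event)


extractor = ToyExtractor()
extractor.prepare(["cat", "dog", "cat"])
calls_after_prepare = extractor.calls
_ = extractor("dog")  # served from the cache, no new computation
```

After prepare, later calls (for instance from a dataloader) only read from the cache, and the output shape is already known even for events whose data might be missing.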

requirements: tp.ClassVar[tuple[str, ...]] = ()