neuralset.extractors.text.TfidfEmbedding

pydantic model neuralset.extractors.text.TfidfEmbedding

Get TF-IDF embeddings for Sentence events.

Fields:

event_types (str | tuple[str, ...])
max_features (int)

field event_types: str | tuple[str, ...] = 'Sentence'

field max_features: int = 5000
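As a rough illustration of what a TF-IDF embedding capped by max_features looks like, the following standard-library sketch keeps the most frequent terms of a small corpus and turns a sentence into an L2-normalised weight vector. It is not this class's implementation: the helper names (tokenize, fit_vocabulary, embed), the tokenisation rule, and the smoothed IDF formula are assumptions chosen for illustration, and the actual vectorizer may differ in all of them.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_vocabulary(sentences: list[str], max_features: int):
    """Keep the max_features most frequent terms and compute smoothed IDF weights."""
    term_counts: Counter = Counter()  # total occurrences of each term
    doc_counts: Counter = Counter()   # number of sentences containing each term
    for sentence in sentences:
        tokens = tokenize(sentence)
        term_counts.update(tokens)
        doc_counts.update(set(tokens))
    # Rank by descending corpus frequency, breaking ties alphabetically.
    ranked = sorted(term_counts, key=lambda t: (-term_counts[t], t))
    vocab = sorted(ranked[:max_features])
    n_docs = len(sentences)
    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + doc_counts[t])) + 1.0 for t in vocab}
    return vocab, idf


def embed(text: str, vocab: list[str], idf: dict[str, float]) -> list[float]:
    """Return the L2-normalised TF-IDF vector of text over the fitted vocabulary."""
    counts = Counter(tokenize(text))
    raw = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw] if norm > 0 else raw


sentences = ["the cat sat on the mat", "the dog sat", "a bird sang"]
vocab, idf = fit_vocabulary(sentences, max_features=4)
vector = embed("the cat sat", vocab, idf)
print(vocab)        # the four retained terms
print(len(vector))  # one weight per retained term
```

The vector length equals the number of retained terms, so max_features bounds the embedding dimension. Words outside the retained vocabulary ("cat" in this toy corpus) contribute nothing to the vector.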

property vectorizer: Any

prepare(obj: DataFrame | Sequence[Event] | Sequence[Segment]) → None

Pre-compute and cache extractor data for a collection of events.

This method triggers _get_data on every matching event so that expensive computation (e.g. model inference) is done once and cached. It then calls the extractor on a single event to populate the output shape, which is needed when allow_missing=True.

Call prepare before using the extractor in a dataloader.

Parameters: obj (DataFrame or sequence of Event or sequence of Segment) – The structure containing the events. When calling prepare on several objects, prefer passing a list of events or segments over a DataFrame to avoid redundant conversion overhead.
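The prepare-then-use pattern described above can be sketched with a toy extractor. Everything here is hypothetical and only mirrors the described behaviour: ToyEvent and ToyCachedExtractor are illustrative stand-ins, not neuralset classes, the cache is keyed on the event text purely for simplicity, and allow_missing is not modelled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyEvent:
    """Hypothetical stand-in for an event; not neuralset's Event class."""
    type: str
    text: str


@dataclass
class ToyCachedExtractor:
    """Hypothetical extractor illustrating pre-computation and caching."""
    event_types: tuple = ("Sentence",)
    cache: dict = field(default_factory=dict)
    output_shape: Optional[tuple] = None
    compute_calls: int = 0  # counts how often the expensive step actually runs

    def _compute(self, event: ToyEvent) -> list:
        # Placeholder for an expensive step such as model inference.
        self.compute_calls += 1
        return [float(len(event.text)), float(len(event.text.split()))]

    def _get_data(self, event: ToyEvent) -> list:
        # Compute once per distinct text, then serve from the cache.
        if event.text not in self.cache:
            self.cache[event.text] = self._compute(event)
        return self.cache[event.text]

    def prepare(self, events) -> None:
        # Warm the cache on every matching event.
        matching = [e for e in events if e.type in self.event_types]
        for event in matching:
            self._get_data(event)
        # Use a single event to record the output shape.
        if matching:
            self.output_shape = (len(self._get_data(matching[0])),)

    def __call__(self, event: ToyEvent) -> list:
        return self._get_data(event)


events = [
    ToyEvent("Sentence", "the cat sat"),
    ToyEvent("Word", "cat"),
    ToyEvent("Sentence", "the cat sat"),
]
extractor = ToyCachedExtractor()
extractor.prepare(events)       # expensive work happens here, once
result = extractor(events[0])   # served from the cache
print(extractor.compute_calls, extractor.output_shape)
```

In this sketch the two identical Sentence events trigger a single computation, the Word event is skipped because it does not match event_types, and later calls reuse the cached value.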

get_embedding(text: str, language: str = '') → ndarray

requirements: tp.ClassVar[tuple[str, ...]] = ('rapidfuzz', 'rapidfuzz')
