neuralset.extractors.text.TfidfEmbedding

pydantic model neuralset.extractors.text.TfidfEmbedding

Get TF-IDF embeddings for Sentence events.

Fields:
field event_types: str | tuple[str, ...] = 'Sentence'
field max_features: int = 5000
property vectorizer: Any
prepare(obj: DataFrame | Sequence[Event] | Sequence[Segment]) → None

Pre-compute and cache extractor data for a collection of events.

This method triggers _get_data on every matching event so that expensive computation (e.g. model inference) is done once and cached. It then calls the extractor on a single event to populate the output shape, which is needed when allow_missing=True.

Call prepare before using the extractor in a dataloader.

Parameters:

obj (DataFrame or sequence of Event or sequence of Segment) – The structure containing the events. When calling prepare on several objects, prefer passing a list of events or segments over a DataFrame to avoid redundant conversion overhead.
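The prepare-then-call pattern described above can be illustrated with a minimal, self-contained sketch. It does not use neuralset: ToyEvent and ToyExtractor are hypothetical stand-ins, and the two-number "feature" is a placeholder for whatever the real extractor computes. The sketch only shows the caching idea: _get_data runs once per matching event, and a single call afterwards records the output shape.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ToyEvent:
    """Hypothetical stand-in for an event (not the neuralset Event class)."""
    type: str
    text: str


@dataclass
class ToyExtractor:
    """Hypothetical extractor showing the prepare-then-call caching pattern."""
    event_types: tuple = ("Sentence",)
    cache: dict = field(default_factory=dict)
    output_shape: Optional[tuple] = None
    n_computations: int = 0

    def _get_data(self, event):
        # Compute at most once per event; later calls are served from the cache.
        if event not in self.cache:
            self.n_computations += 1
            n_chars = float(len(event.text))
            n_words = float(len(event.text.split()))
            self.cache[event] = [n_chars, n_words]  # placeholder feature
        return self.cache[event]

    def prepare(self, events):
        # Trigger _get_data on every matching event, then call the extractor
        # on a single event to record the output shape.
        matching = [e for e in events if e.type in self.event_types]
        for event in matching:
            self._get_data(event)
        if matching:
            self.output_shape = (len(self(matching[0])),)

    def __call__(self, event):
        return self._get_data(event)


events = [
    ToyEvent("Sentence", "the cat sat"),
    ToyEvent("Word", "cat"),  # ignored: type does not match event_types
    ToyEvent("Sentence", "a dog"),
]
extractor = ToyExtractor()
extractor.prepare(events)
computations_after_prepare = extractor.n_computations
features = extractor(events[0])  # cache hit, no new computation
```

After prepare, calling the extractor on an already-seen event reuses the cached value, which is why the real method is recommended before handing the extractor to a dataloader.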

get_embedding(text: str, language: str = '') → ndarray
requirements: tp.ClassVar[tuple[str, ...]] = ('rapidfuzz', 'rapidfuzz')
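
For readers unfamiliar with TF-IDF, the following pure-Python sketch shows how a capped vocabulary (the role max_features appears to play) and a per-text embedding vector relate. It is an illustration under stated assumptions, not this class's implementation: the backend behind vectorizer is not documented here, and fit_tfidf and embed are hypothetical helper names. Whitespace tokenisation, a smoothed IDF, and L2 normalisation are common choices assumed for the example, and the language argument of get_embedding is not modelled.

```python
import math
from collections import Counter


def fit_tfidf(corpus, max_features=5000):
    """Build a vocabulary capped at max_features, plus smoothed IDF weights."""
    tokenized = [doc.lower().split() for doc in corpus]
    term_counts = Counter(tok for doc in tokenized for tok in doc)
    # Keep the most frequent terms; ties are broken alphabetically.
    kept = sorted(term_counts, key=lambda t: (-term_counts[t], t))[:max_features]
    vocab = {term: i for i, term in enumerate(sorted(kept))}
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each kept term.
    doc_freq = Counter(
        tok for doc in tokenized for tok in set(doc) if tok in vocab
    )
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    return vocab, idf


def embed(text, vocab, idf):
    """Return an L2-normalised TF-IDF vector (list of floats) for one text."""
    counts = Counter(tok for tok in text.lower().split() if tok in vocab)
    vec = [0.0] * len(vocab)
    for term, count in counts.items():
        vec[vocab[term]] = count * idf[term]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


corpus = ["the cat sat", "the dog barked", "a cat and a dog"]
vocab, idf = fit_tfidf(corpus, max_features=4)
vector = embed("the cat sat", vocab, idf)  # one entry per vocabulary term
```

The vector length equals the vocabulary size, so lowering max_features shrinks the embedding dimension; terms outside the kept vocabulary (here "sat") contribute nothing.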