neuralset.events.transforms.text.AddConcatenationContext

pydantic model neuralset.events.transforms.text.AddConcatenationContext

Adds contextual information to events by concatenating previous events of the same type.

Warning

Unstable API — the context representation will be replaced with compact indices in a future release.

This transform iterates over events of a specified type (default “Word”) and creates a context column in the DataFrame. For each event, the context consists of the concatenated texts of all previous events in the same chunk, where chunks are determined by timeline changes, split changes, or sentence boundaries (if sentence_only=True). Optionally, the context length can be limited by max_context_len.
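The chunking and concatenation described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the function name `concat_context`, the dictionary keys `text`, `timeline` and `sentence`, the space separator, and the choice to exclude the current event from its own context are all assumptions made for the example.

```python
from typing import Any, Optional


def concat_context(
    events: list[dict[str, Any]],
    sentence_only: bool = False,
    max_context_len: Optional[int] = None,
    split_field: str = "split",
) -> list[str]:
    """Return one context string per event (illustrative sketch only)."""
    contexts: list[str] = []
    chunk_texts: list[str] = []  # texts seen so far in the current chunk
    previous_key = None
    for event in events:
        # A chunk ends when the timeline or the split value changes.
        key = (event["timeline"], event.get(split_field))
        if sentence_only:
            # Assumed: sentence identity is stored under a "sentence" key.
            key += (event.get("sentence"),)
        if key != previous_key:
            chunk_texts = []  # start a new chunk with an empty context
            previous_key = key
        if max_context_len is None:
            window = chunk_texts
        else:
            # Keep at most max_context_len of the preceding events.
            start = max(0, len(chunk_texts) - max_context_len)
            window = chunk_texts[start:]
        contexts.append(" ".join(window))
        chunk_texts.append(event["text"])
    return contexts


# Hypothetical word events: two sentences, with a split change on the last word.
words = [
    {"text": "the", "timeline": "tl1", "split": "train", "sentence": 0},
    {"text": "cat", "timeline": "tl1", "split": "train", "sentence": 0},
    {"text": "sat", "timeline": "tl1", "split": "train", "sentence": 0},
    {"text": "It", "timeline": "tl1", "split": "train", "sentence": 1},
    {"text": "slept", "timeline": "tl1", "split": "test", "sentence": 1},
]

default_ctx = concat_context(words)
sentence_ctx = concat_context(words, sentence_only=True)
limited_ctx = concat_context(words, max_context_len=2)
```

In this sketch the context restarts whenever the chunk key changes, so the last word gets an empty context in every configuration because its split differs from that of the preceding words.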

Note

If an event is missing (e.g. a previous word event), it will also be absent from the context. Use AddContextToWords for more careful handling of context, including the addition of punctuation.

Parameters:

event_type (str, default="Word") – Type of event to use for building context.
sentence_only (bool, default=False) – If True, chunks are defined by sentence boundaries; otherwise, by timeline and split changes.
max_context_len (int | None, default=None) – Maximum number of previous events to include in the context. If None, all previous events in the chunk are used.
split_field (str, default="split") – Column name used to detect split boundaries when creating chunks.

Fields:

event_type (str)
max_context_len (int | None)
sentence_only (bool)
split_field (str)

field event_type: str = 'Word'

field sentence_only: bool = False

field max_context_len: int | None = None

field split_field: str = 'split'

requirements: tp.ClassVar[tuple[str, ...]] = ()
