neuralset.events.transforms.text.AddSentenceToWords

class neuralset.events.transforms.text.AddSentenceToWords(*, infra: Backend | None = None, max_unmatched_ratio: float = 0.0, override_sentences: bool = False)

Adds sentence-level information to word events based on Text rows.

This transform processes a DataFrame containing word-level (Word) and text-level (Text) events. For each sentence found in the Text rows, it:

  1. Creates a new Sentence row.

  2. Assigns sentence and sentence_char annotations to the matching Word rows: sentence indicates which sentence the word belongs to, and sentence_char gives the character position at which the word starts within that sentence.
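The two steps above can be illustrated with a small standalone sketch. This is not the library's implementation: the function name annotate_words, the regex-based sentence splitter, and the dictionary layout are all assumptions made for illustration, and the real transform operates on DataFrame rows rather than plain lists.

```python
import re


def annotate_words(text, words):
    """Hypothetical helper: split text into sentences, then tag each word.

    Returns (sentences, annotations), where each annotation carries the
    sentence the word was found in and the character offset of the word
    within that sentence (None for both if the word is unmatched).
    """
    # Assumption: a naive split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    annotations = []
    sent_idx, cursor = 0, 0  # current sentence and position inside it
    for word in words:
        match = None
        # Search forward only, starting at the current sentence.
        for idx in range(sent_idx, len(sentences)):
            start = sentences[idx].find(word, cursor if idx == sent_idx else 0)
            if start != -1:
                match = (idx, start)
                break
        if match is None:
            # Unmatched word: keep the row but leave the annotations empty.
            annotations.append(
                {"word": word, "sentence": None, "sentence_char": None}
            )
            continue
        sent_idx, start = match
        cursor = start + len(word)
        annotations.append(
            {"word": word, "sentence": sentences[sent_idx], "sentence_char": start}
        )
    return sentences, annotations


sentences, annotations = annotate_words(
    "The cat sat. It purred loudly.",
    ["The", "cat", "sat", "It", "purred", "loudly"],
)
```

In this sketch, each entry of sentences plays the role of a new Sentence row, and each entry of annotations plays the role of an annotated Word row.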

Parameters:
  • max_unmatched_ratio (float, default=0.0) – Maximum allowed ratio of Word rows that do not match any sentence. An error is raised if this ratio is exceeded.

  • override_sentences (bool, default=False) – Whether to replace existing Sentence rows if they are already present.
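
As a rough illustration of how a max_unmatched_ratio threshold behaves, the sketch below computes the fraction of words without a sentence and raises when it exceeds the limit. The function name check_unmatched_ratio, the use of None to mark unmatched words, the strict comparison, and the ValueError exception type are assumptions; the documentation above only states that an error is raised.

```python
def check_unmatched_ratio(sentence_labels, max_unmatched_ratio=0.0):
    """Hypothetical check: fail if too many words have no sentence.

    sentence_labels holds one entry per word, with None marking a word
    that did not match any sentence (an assumption of this sketch).
    """
    if not sentence_labels:
        return 0.0  # nothing to check
    unmatched = sum(label is None for label in sentence_labels)
    ratio = unmatched / len(sentence_labels)
    # Assumption: "exceeded" means strictly greater than the threshold.
    if ratio > max_unmatched_ratio:
        raise ValueError(
            f"{ratio:.0%} of words are unmatched "
            f"(allowed: {max_unmatched_ratio:.0%})"
        )
    return ratio


# One unmatched word out of four is tolerated at a 0.25 threshold.
ratio = check_unmatched_ratio(["s1", None, "s1", "s2"], max_unmatched_ratio=0.25)
```

With the default threshold of 0.0, any unmatched word would trigger the error in this sketch, which matches the strictness implied by the default in the signature.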