neuralbench.transforms.TextPreprocessor

pydantic model neuralbench.transforms.TextPreprocessor

Clean and filter text-related events.

The following operations are applied to the events:

  • Keep only events with duration >= 0

  • Keep only neuro events, Audio events, and valid Word events (those whose text is a string)

  • Clean ‘text’ column by removing special characters and lowercasing

  • Drop empty or blank text entries

  • For Nieuwland2018, group similar sentences together to avoid leakage (each sentence has two very similar versions)
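The first four steps above can be illustrated with a minimal, standard-library sketch. It is not the library's implementation. It assumes events are plain dictionaries with hypothetical `type`, `duration`, and `text` keys, that text cleaning applies only to Word events, and that "special characters" means anything other than word characters and whitespace. The `preprocess_events` and `default_clean` names are invented for illustration, and the Nieuwland2018 grouping step is left out because the source does not describe how similar sentences are matched.

```python
import re
from typing import Callable


def default_clean(text: str) -> str:
    # Assumed rule: strip everything except word characters and whitespace, then lowercase.
    return re.sub(r"[^\w\s]", "", text).lower()


def preprocess_events(
    events: list[dict],
    neuro_event_type: str = "Eeg",
    clean: Callable[[str], str] = default_clean,
) -> list[dict]:
    """Hypothetical stand-in for the filtering steps listed above."""
    kept = []
    for event in events:
        # Step 1: keep only events with duration >= 0.
        if event["duration"] < 0:
            continue
        if event["type"] == "Word":
            text = event.get("text")
            # Step 2: a Word event is valid only if its text is a string.
            if not isinstance(text, str):
                continue
            # Step 3: clean the text.
            text = clean(text)
            # Step 4: drop empty or blank entries.
            if not text.strip():
                continue
            event = {**event, "text": text}
        elif event["type"] not in (neuro_event_type, "Audio"):
            # Step 2: anything that is not a neuro, Audio, or Word event is dropped.
            continue
        kept.append(event)
    return kept


events = [
    {"type": "Eeg", "duration": 10.0},
    {"type": "Word", "duration": 0.3, "text": "Hello,"},
    {"type": "Word", "duration": 0.2, "text": "!!!"},    # blank after cleaning
    {"type": "Word", "duration": -1.0, "text": "bad"},   # negative duration
    {"type": "Word", "duration": 0.2, "text": None},     # text is not a string
    {"type": "Image", "duration": 1.0},                  # unrelated event type
    {"type": "Audio", "duration": 5.0},
]
filtered = preprocess_events(events)
```

In this sketch only the Eeg event, the cleaned Word event (`"hello"`), and the Audio event survive. The real transform presumably operates on an events table, not a list of dictionaries, so the column handling will differ.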

Fields:
field neuro_event_type: str = 'Eeg'
static clean_text(x: str) → str

Remove special characters and lowercase the text.
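The exact character set that `clean_text` removes is not stated here. The sketch below shows one plausible reading, assuming "special characters" means everything except word characters and whitespace. The function name `clean_text_sketch` is hypothetical and the regular expression is an assumption, not the library's actual rule.

```python
import re


def clean_text_sketch(x: str) -> str:
    # Assumption: keep only word characters (letters, digits, underscore) and whitespace.
    without_special = re.sub(r"[^\w\s]", "", x)
    # Lowercase so that "Word" and "word" map to the same token.
    return without_special.lower()


greeting = clean_text_sketch("Hello, World!")  # "hello world"
contraction = clean_text_sketch("It's")        # "its"
```

Under this reading, apostrophes and hyphens are removed without inserting a space, so contractions and hyphenated words collapse into single tokens. A different rule (for example, replacing punctuation with a space) would give different word boundaries.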

requirements: tp.ClassVar[tuple[str, ...]] = ()