neuralbench.transforms.SimilaritySplit

pydantic model neuralbench.transforms.SimilaritySplit

Perform a train/val/test split based on the similarity of sentence events.

The behavior depends on the type of stimulus event that is expected (stim_event_type):

  • For Audio events, propagate sentence mapping to Word events, then chunk Audio events based on Word events.

  • For Keystroke events, propagate sentence mapping to Keystroke events.

  • For Sentence or Word events, directly apply the similarity-based split.
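
As a rough illustration of what "propagating the sentence mapping" could look like, the sketch below copies each sentence's split label onto the words that belong to it. The `Event` dataclass, its `sentence_id` link, and the `propagate_split` helper are hypothetical names made up for this example; they are not part of neuralbench, whose event representation may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    # Hypothetical event record, for illustration only.
    kind: str                    # "Sentence" or "Word"
    text: str
    sentence_id: int             # assumed link from a word to its parent sentence
    split: Optional[str] = None  # "train", "val" or "test" once assigned


def propagate_split(events):
    """Copy each sentence's split label onto the words that share its sentence_id."""
    sentence_split = {
        e.sentence_id: e.split for e in events if e.kind == "Sentence"
    }
    for e in events:
        if e.kind == "Word":
            e.split = sentence_split[e.sentence_id]
    return events


events = [
    Event("Sentence", "the cat sat", 0, split="train"),
    Event("Word", "the", 0),
    Event("Word", "cat", 0),
    Event("Sentence", "dogs bark", 1, split="test"),
    Event("Word", "dogs", 1),
]
propagate_split(events)
word_splits = [(e.text, e.split) for e in events if e.kind == "Word"]
```

Under these assumptions every word lands in the same split as its sentence, so words from one sentence never end up in both the training and the test set.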

Parameters:

use_sklearn_split – If True, use sklearn’s train_test_split after computing clusters, rather than using SimilaritySplitter’s deterministic cluster assignment. NOTE: valid_random_state and test_random_state are ignored unless use_sklearn_split is True.
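
To make the role of the two random states more concrete, here is a minimal sketch of a seeded split applied at the cluster level, so that similar sentences stay in the same split. It does not call sklearn: `holdout` is a small standard-library stand-in for a seeded shuffle-and-split, and `split_clusters` is a hypothetical helper. The order of the two splits (test first, then validation) and the rescaling of the validation ratio are assumptions of this example, not documented behavior of SimilaritySplit, and the cluster ids are assumed to be computed beforehand.

```python
import random


def holdout(items, ratio, seed):
    """Shuffle items with a fixed seed and hold out round(ratio * n) of them."""
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)
    n_held = round(ratio * len(shuffled))
    return shuffled[n_held:], shuffled[:n_held]


def split_clusters(cluster_of, valid_ratio=0.2, test_ratio=0.2,
                   valid_seed=33, test_seed=33):
    """Assign whole clusters to train/val/test, then label each sentence."""
    clusters = set(cluster_of.values())
    # Assumption: the test clusters are held out first.
    rest, test = holdout(clusters, test_ratio, test_seed)
    # Assumption: valid_ratio is relative to all clusters, so rescale it
    # for the remainder left after removing the test clusters.
    train, valid = holdout(rest, valid_ratio / (1 - test_ratio), valid_seed)
    label = {c: "train" for c in train}
    label.update({c: "val" for c in valid})
    label.update({c: "test" for c in test})
    return {sentence: label[c] for sentence, c in cluster_of.items()}


# Toy input: 20 sentences in 10 precomputed clusters of 2 sentences each.
cluster_of = {f"s{i}": i // 2 for i in range(20)}
assignment = split_clusters(cluster_of)
```

Because the seeds are fixed, calling `split_clusters` again on the same input reproduces the same assignment; changing either seed reshuffles only the corresponding stage.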

Fields:
field stim_event_type: Literal['Sentence', 'Word', 'Audio', 'Keystroke'] [Required]
field valid_split_ratio: float = 0.2
field test_split_ratio: float = 0.2
field valid_random_state: int = 33
field test_random_state: int = 33
field threshold: float = 0.2
field use_sklearn_split: bool = False
requirements: tp.ClassVar[tuple[str, ...]] = ()