neuralbench.transforms.SimilaritySplit

pydantic model neuralbench.transforms.SimilaritySplit

Perform a train/val/test split based on the similarity of sentence events.

Depending on the type of stimulus event that is expected, the behavior is as follows:

For Audio events, propagate sentence mapping to Word events, then chunk Audio events based on Word events.
For Keystroke events, propagate sentence mapping to Keystroke events.
For Sentence or Word events, directly apply the similarity-based split.
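The Audio case above can be pictured with a minimal sketch. It assumes each word records the sentence it belongs to, and that an audio chunk is a run of consecutive words sharing a split. The `Word` dataclass and the `propagate_split` and `chunk_audio` helpers are hypothetical names used only for illustration; they are not part of neuralbench, whose actual event objects are not described on this page.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical stand-in for a word event (not the neuralbench class)."""
    text: str
    start: float     # onset in seconds
    duration: float  # length in seconds
    sentence: str    # text of the parent sentence


def propagate_split(words, sentence_split):
    """Give every word the split label of its parent sentence."""
    return [(w, sentence_split[w.sentence]) for w in words]


def chunk_audio(labelled_words):
    """Merge consecutive words with the same split into (start, stop, split) chunks."""
    chunks = []
    for word, split in labelled_words:
        stop = word.start + word.duration
        if chunks and chunks[-1][2] == split:
            # Same split as the previous word: extend the current chunk.
            chunks[-1] = (chunks[-1][0], stop, split)
        else:
            chunks.append((word.start, stop, split))
    return chunks


# Sentence-level assignment, e.g. the outcome of a similarity-based split.
sentence_split = {"the cat sat": "train", "a dog ran": "test"}
words = [
    Word("the", 0.0, 0.25, "the cat sat"),
    Word("cat", 0.25, 0.25, "the cat sat"),
    Word("sat", 0.5, 0.5, "the cat sat"),
    Word("a", 1.5, 0.25, "a dog ran"),
    Word("dog", 1.75, 0.25, "a dog ran"),
    Word("ran", 2.0, 0.5, "a dog ran"),
]
chunks = chunk_audio(propagate_split(words, sentence_split))
print(chunks)
```

In this toy input the six words collapse into two audio chunks, one per sentence, each carrying the split of its sentence.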

Parameters:

use_sklearn_split – If True, use sklearn’s train_test_split after computing clusters, rather than using SimilaritySplitter’s deterministic cluster assignment. NOTE: valid_random_state and test_random_state are ignored unless use_sklearn_split is True.
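The idea of "compute clusters, then assign whole clusters to splits" can be sketched with the standard library alone. This is an illustration under stated assumptions, not the SimilaritySplitter implementation: the similarity measure (`difflib.SequenceMatcher`), the reading of threshold as a maximum distance, the union-find clustering, and the helper names `cluster_sentences` and `split_clusters` are all hypothetical. A seeded shuffle stands in for the sklearn-based variant, and a single `seed` replaces the separate valid_random_state and test_random_state.

```python
import random
from difflib import SequenceMatcher


def cluster_sentences(sentences, threshold):
    """Link sentences whose distance (1 - similarity ratio) is below threshold."""
    parent = list(range(len(sentences)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            ratio = SequenceMatcher(None, sentences[i], sentences[j]).ratio()
            if 1.0 - ratio < threshold:  # assumption: threshold is a max distance
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(sentences))]


def split_clusters(cluster_ids, valid_ratio, test_ratio, seed=None):
    """Assign whole clusters to splits; shuffle cluster order only if seeded."""
    clusters = sorted(set(cluster_ids))
    if seed is not None:
        random.Random(seed).shuffle(clusters)
    n_test = round(len(clusters) * test_ratio)
    n_valid = round(len(clusters) * valid_ratio)
    test = set(clusters[:n_test])
    valid = set(clusters[n_test:n_test + n_valid])
    return ["test" if c in test else "valid" if c in valid else "train"
            for c in cluster_ids]


sentences = [
    "the cat sat on the mat",
    "the cat sat on the mat.",
    "stock markets fell sharply today",
    "stock markets fell sharply today!",
    "she plays the violin",
]
ids = cluster_sentences(sentences, threshold=0.2)
labels = split_clusters(ids, valid_ratio=0.2, test_ratio=0.2)          # fixed order
seeded = split_clusters(ids, valid_ratio=0.2, test_ratio=0.2, seed=33)  # seeded shuffle
print(ids, labels, seeded)
```

Because assignment happens per cluster, near-duplicate sentences always land in the same split, which is the point of a similarity-based split: it keeps almost identical stimuli from leaking between train and test.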

Fields:

stim_event_type (Literal['Sentence', 'Word', 'Audio', 'Keystroke'])
test_random_state (int)
test_split_ratio (float)
threshold (float)
use_sklearn_split (bool)
valid_random_state (int)
valid_split_ratio (float)

field stim_event_type: Literal['Sentence', 'Word', 'Audio', 'Keystroke'] [Required]

field valid_split_ratio: float = 0.2

field test_split_ratio: float = 0.2

field valid_random_state: int = 33

field test_random_state: int = 33

field threshold: float = 0.2

field use_sklearn_split: bool = False

requirements: tp.ClassVar[tuple[str, ...]] = ()
