neuralset.events.transforms.splitting.SklearnSplit
class neuralset.events.transforms.splitting.SklearnSplit(*, infra: Backend | None = None, split_by: str = 'timeline', valid_split_ratio: Annotated[float, Strict(strict=True), Ge(ge=0.0), Le(le=1.0), _PydanticGeneralMetadata(allow_inf_nan=False)] = 0.2, test_split_ratio: Annotated[float, Strict(strict=True), Ge(ge=0.0), Le(le=1.0), _PydanticGeneralMetadata(allow_inf_nan=False)] = 0.2, valid_random_state: int = 33, test_random_state: int = 33, stratify_by: str | None = None)
Perform a train/validation/test split using sklearn’s train_test_split.

Parameters:
split_by (str) – Column name to split by (e.g., “timeline” or “subject”). If set to “_index” and the events dataframe has no “_index” column, the dataframe index is reset and the row indices are stored in a new column named “_index”.
valid_split_ratio (float) – Fraction of the full dataset to use for validation, between 0.0 and 1.0.
test_split_ratio (float) – Fraction of the full dataset to use for testing, between 0.0 and 1.0.
valid_random_state (int) – Random state for validation split.
test_random_state (int) – Random state for test split.
stratify_by (str | None) – Column name to use for stratified splitting. If None, no stratification is applied.
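
The parameters above suggest a two-stage use of sklearn’s train_test_split. The following is a minimal sketch of that idea, not the SklearnSplit implementation: the function name sketch_group_split is hypothetical, it assumes the split is made over the unique values of the split_by column, that the test set is drawn first, and that the validation ratio is rescaled so it stays relative to the full dataset. Stratification (stratify_by) is left out.

```python
from sklearn.model_selection import train_test_split


def sketch_group_split(
    groups,
    valid_split_ratio=0.2,
    test_split_ratio=0.2,
    valid_random_state=33,
    test_random_state=33,
):
    """Label every row 'train', 'val' or 'test' based on its group value.

    Hypothetical helper for illustration only; `groups` stands in for the
    values of the `split_by` column (e.g. one timeline name per event).
    """
    # Split over unique group values so a group never straddles two splits.
    unique = sorted(set(groups))

    # Stage 1 (assumed order): hold out the test groups.
    rest, test = train_test_split(
        unique, test_size=test_split_ratio, random_state=test_random_state
    )

    # Stage 2: the validation ratio refers to the full dataset, so rescale
    # it relative to what is left after removing the test groups.
    relative_valid = valid_split_ratio / (1.0 - test_split_ratio)
    train, valid = train_test_split(
        rest, test_size=relative_valid, random_state=valid_random_state
    )

    assignment = {g: "train" for g in train}
    assignment.update({g: "val" for g in valid})
    assignment.update({g: "test" for g in test})
    return [assignment[g] for g in groups]


# Ten hypothetical timelines with three events each.
example_groups = [f"timeline_{i}" for i in range(10) for _ in range(3)]
example_labels = sketch_group_split(example_groups)
```

With the default ratios and ten groups, this sketch puts two groups in the test set, two in validation, and six in training. Fixing both random states makes the assignment reproducible across calls.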