neuralset.events.transforms.splitting.SklearnSplit

class neuralset.events.transforms.splitting.SklearnSplit(*, infra: Backend | None = None, split_by: str = 'timeline', valid_split_ratio: Annotated[float, Strict(strict=True), Ge(ge=0.0), Le(le=1.0), _PydanticGeneralMetadata(allow_inf_nan=False)] = 0.2, test_split_ratio: Annotated[float, Strict(strict=True), Ge(ge=0.0), Le(le=1.0), _PydanticGeneralMetadata(allow_inf_nan=False)] = 0.2, valid_random_state: int = 33, test_random_state: int = 33, stratify_by: str | None = None)

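Split events into train, validation and test sets using scikit-learn’s train_test_split. The sets are formed from the values of the split_by column, so all events that share a value (e.g., one subject) land in the same set.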
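The snippet below is a minimal sketch of how a three-way split can be built from two calls to scikit-learn’s train_test_split over the unique values of a split column. It is not the class’s actual implementation. It assumes that the test set is carved out first and that the validation fraction is then rescaled by 1 / (1 - test_split_ratio), so that both ratios refer to the full dataset. The helpers assign_splits and _take_fraction are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def _take_fraction(items, ratio, seed):
    """Hypothetical helper: return (rest, taken) with `ratio` of items taken."""
    # train_test_split rejects test_size=0.0, so handle an empty split directly.
    if ratio == 0.0:
        return items, items[:0]
    rest, taken = train_test_split(items, test_size=ratio, random_state=seed)
    return rest, taken


def assign_splits(
    events: pd.DataFrame,
    split_by: str = "timeline",
    valid_split_ratio: float = 0.2,
    test_split_ratio: float = 0.2,
    valid_random_state: int = 33,
    test_random_state: int = 33,
) -> pd.Series:
    """Hypothetical helper (assumed behavior): label each event train/val/test."""
    if valid_split_ratio + test_split_ratio >= 1.0:
        raise ValueError("valid_split_ratio + test_split_ratio must be < 1")
    # Split the unique values of the column, so a group never spans two sets.
    groups = np.sort(events[split_by].unique())
    # Stage 1: take the test groups from the full set.
    trainval, test = _take_fraction(groups, test_split_ratio, test_random_state)
    # Stage 2: rescale so validation is valid_split_ratio of the FULL set.
    val_ratio = valid_split_ratio / (1.0 - test_split_ratio)
    train, val = _take_fraction(trainval, val_ratio, valid_random_state)
    mapping = {g: "train" for g in train}
    mapping.update({g: "val" for g in val})
    mapping.update({g: "test" for g in test})
    return events[split_by].map(mapping)


# Usage: 10 subjects with 3 events each, split by subject.
subjects = [f"sub-{i:02d}" for i in range(10)]
events = pd.DataFrame({"subject": np.repeat(subjects, 3), "value": np.arange(30)})
events["split"] = assign_splits(events, split_by="subject")
print(events.groupby("split")["subject"].nunique())
```

With 10 subjects and the default ratios, this sketch assigns 6 subjects to train, 2 to val and 2 to test. Every subject’s events stay within a single set.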

Parameters:
  • split_by (str) – Name of the events column whose values define the units that are split (e.g., ‘timeline’ or ‘subject’). If set to “_index” and the events DataFrame has no “_index” column, the DataFrame index is reset and a new column named “_index”, holding the row indices, is added.

  • valid_split_ratio (float) – Fraction of the full dataset to assign to the validation set (between 0 and 1).

  • test_split_ratio (float) – Fraction of the full dataset to assign to the test set (between 0 and 1).

  • valid_random_state (int) – Random state used for the validation split, for reproducibility.

  • test_random_state (int) – Random state used for the test split, for reproducibility.

  • stratify_by (str | None) – Name of the column whose values are used to stratify the splits. If None, no stratification is applied.
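The second sketch is also based on assumed behavior rather than taken from the library. It mimics split_by=“_index” combined with stratify_by. When the events DataFrame lacks an “_index” column, one is added from a reset row index. Both split stages then pass the row labels to the stratify argument of train_test_split. The helpers split_rows_stratified and _take_stratified, and the output column “split”, are hypothetical. This sketch does not show how the real class stratifies when several events share one split_by value.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def _take_stratified(items, labels, ratio, seed):
    """Hypothetical helper: stratified (rest, taken) split of items."""
    # train_test_split rejects test_size=0.0, so handle an empty split directly.
    if ratio == 0.0:
        return items, items[:0]
    rest, taken = train_test_split(
        items, test_size=ratio, random_state=seed, stratify=labels
    )
    return rest, taken


def split_rows_stratified(
    events: pd.DataFrame,
    stratify_by: str,
    valid_split_ratio: float = 0.2,
    test_split_ratio: float = 0.2,
    valid_random_state: int = 33,
    test_random_state: int = 33,
) -> pd.DataFrame:
    """Hypothetical helper (assumed behavior): split_by="_index" + stratify_by."""
    events = events.copy()
    # Add an "_index" column of row positions if it is missing.
    if "_index" not in events.columns:
        events = events.reset_index(drop=True)
        events["_index"] = events.index
    idx = events["_index"].to_numpy()
    label_of = dict(zip(idx, events[stratify_by]))
    # Stage 1: stratified test split over all rows.
    trainval, test = _take_stratified(
        idx, [label_of[i] for i in idx], test_split_ratio, test_random_state
    )
    # Stage 2: stratified validation split of the remainder, rescaled so that
    # it is valid_split_ratio of the full dataset.
    val_ratio = valid_split_ratio / (1.0 - test_split_ratio)
    _, val = _take_stratified(
        trainval, [label_of[i] for i in trainval], val_ratio, valid_random_state
    )
    events["split"] = "train"
    events.loc[events["_index"].isin(val), "split"] = "val"
    events.loc[events["_index"].isin(test), "split"] = "test"
    return events


# Usage: 20 rows with balanced labels and a non-default index.
events = pd.DataFrame({"label": ["A", "B"] * 10}, index=np.arange(100, 120))
out = split_rows_stratified(events, stratify_by="label")
print(out.groupby(["split", "label"]).size())
```

On these 20 balanced rows, each set keeps the 50/50 label ratio: 6/6 in train, 2/2 in val and 2/2 in test.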