Project Example

neuraltrain-repo/project_example/ is the reference template for a small training project built on top of neuralset, neuraltrain, and PyTorch Lightning.

It uses the MNE sample dataset to train a 4-class MEG stimulus classifier, and it shows the expected structure of a real experiment package.

What It Includes

File

Role

main.py

Data and Experiment pydantic models

pl_module.py

BrainModule Lightning wrapper

grids/defaults.py

default config dict for a single local run

grids/run_grid.py

hyperparameter sweep via run_grid

plot_figure.py

sketch for loading and plotting saved outputs

Data

Data is a small pydantic model that holds a study chain and a Segmenter, then builds DataLoaders for each split:

class Data(pydantic.BaseModel):
    study: ns.Step
    segmenter: ns.dataloader.Segmenter
    batch_size: int = 64
    num_workers: int = 0

    def build(self) -> dict[str, DataLoader]:
        events = self.study.run()
        dataset = self.segmenter.apply(events)
        dataset.prepare()

        loaders = {}
        for split, shuffle in [("train", True), ("val", False), ("test", False)]:
            ds = dataset.select(dataset.triggers["split"] == split)
            loaders[split] = DataLoader(
                ds, collate_fn=ds.collate_fn,
                batch_size=self.batch_size, shuffle=shuffle,
                num_workers=self.num_workers,
            )
        return loaders

The study chain in the default config is Mne2013Sample piped into SklearnSplit, which adds a split column to the events table. The Segmenter holds the extractors (MegExtractor for input, EventField for target) and the segment window.

Experiment

Experiment is a plain pydantic.BaseModel that collects all training pieces and orchestrates the run:

class Experiment(pydantic.BaseModel):
    data: Data
    brain_model_config: BaseModelConfig
    loss: BaseLoss
    optim: LightningOptimizer
    metrics: list[BaseMetric]
    csv_config: CsvLoggerConfig | None = None
    wandb_config: WandbLoggerConfig | None = None
    infra: TaskInfra = TaskInfra(version="1")

    @infra.apply
    def run(self) -> dict[str, float | None]:
        loaders = self.data.build()
        brain_module = self._build_brain_module(loaders["train"])
        trainer = self._setup_trainer()
        if not self.test_only:
            self.fit(brain_module, trainer, loaders["train"], loaders["val"])
        return self.test(brain_module, trainer, loaders["test"])

Key design choices:

  • Stateless: brain_module and trainer are local variables passed between methods, not stored as private attributes.

  • TaskInfra (from exca): controls folder, cluster, GPU count, and caching. @infra.apply makes run() cacheable and Slurm-submittable.

  • Configurable logging: csv_config and wandb_config are optional; when both are None, a DummyLogger is used.

  • Returns results: run() returns a dict[str, float | None] of test metrics instead of the trainer object.

Run a Local Example

From the repository root:

cd neuraltrain-repo
python -m project_example.grids.defaults

This runs a local experiment with the default configuration.

Run a Grid

cd neuraltrain-repo
python -m project_example.grids.run_grid

This launches the example sweep and shows how neuraltrain.utils.run_grid() is used to expand experiment configurations over depth, epoch count, and seed.

Why This Example Matters

The example is the clearest starting point if you want to:

  • wire a neuralset study chain into training via Segmenter

  • keep models, metrics, and optimizers as typed config objects

  • add CSV or W&B logging through config fields

  • launch Slurm jobs or grid sweeps through exca.TaskInfra

  • organize experiments as reusable, stateless pydantic classes