neuralset

neuralset turns raw neural recordings and stimuli into PyTorch-ready datasets. Define a study, load events into a flat DataFrame, apply transforms, extract features with configurable extractors, and segment everything into batches — all lazy, typed with pydantic, and cacheable.

pip install neuralset

The base package includes the full pipeline — events, transforms, extractors, segmenters, and dataloaders. Some extractors require optional dependencies (e.g., transformers, torchaudio) which can be installed individually or all at once:

pip install 'neuralset[all]'

To access the curated catalog of public brain datasets, also install neuralfetch:

pip install neuralfetch

Quickstart

From study to PyTorch batch in a few lines: load study data, configure extractors, and segment.

Tutorials

Each tutorial walks through one building block of the pipeline.

Events
The universal data format — every recording, stimulus, and annotation is an event.
from neuralset.events import Event
evt = Event(type="Word", start=1.0,
            duration=0.3, timeline="sub-01")
print(evt)              # pydantic model
print(evt.model_dump()) # dict
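As a rough mental model (a hypothetical stdlib sketch, not neuralset's actual pydantic schema), an event is just a typed record placed on a named timeline, and a whole recording becomes a flat list of such records:

```python
from dataclasses import dataclass, asdict

@dataclass
class SimpleEvent:
    # Hypothetical stand-in for neuralset's pydantic Event model
    type: str       # e.g. "Word", "Sentence", "Sound"
    start: float    # onset in seconds on the timeline
    duration: float # length in seconds
    timeline: str   # e.g. a subject/recording identifier

evt = SimpleEvent(type="Word", start=1.0, duration=0.3, timeline="sub-01")
print(asdict(evt))  # plain dict, analogous to evt.model_dump()
```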
Studies
Download datasets and load them as events DataFrames.
import neuralset as ns
study = ns.Study(name="Fake2025Meg",
                 path=ns.CACHE_FOLDER)
events = study.run()
print(f"{len(events)} events, "
      f"{events['subject'].nunique()} subjects")
Transforms
Filter, split, and enrich events before extraction.
import neuralset as ns
transform = ns.events.transforms.AddSentenceToWords()
events = transform(events)
print(events[events.type == "Sentence"].head())
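Conceptually, a transform is just a callable from events to events. A hypothetical pure-Python sketch (the real library operates on DataFrames, and `add_word_index` is an invented example, not a neuralset transform) of a transform that enriches events in place of the pipeline:

```python
def add_word_index(events):
    """Hypothetical transform: annotate each Word event with its
    position among the Word events on the same timeline."""
    counters = {}
    out = []
    for evt in events:
        evt = dict(evt)  # copy so the input list is not mutated
        if evt["type"] == "Word":
            i = counters.get(evt["timeline"], 0)
            evt["word_index"] = i
            counters[evt["timeline"]] = i + 1
        out.append(evt)
    return out

events = [
    {"type": "Word", "start": 0.0, "timeline": "sub-01"},
    {"type": "Word", "start": 0.4, "timeline": "sub-01"},
    {"type": "Sound", "start": 0.0, "timeline": "sub-01"},
]
events = add_word_index(events)
print([e.get("word_index") for e in events])  # [0, 1, None]
```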
Extractors
Turn events into tensors — brain signals, text embeddings, images, labels.
meg = ns.extractors.MegExtractor(frequency=100.0)
sample = meg(events, start=0.0, duration=1.0)
print(f"MEG shape: {sample.shape}")
Segmenter & Dataset
Create time-locked segments and iterate with a PyTorch DataLoader.
from torch.utils.data import DataLoader

segmenter = ns.dataloader.Segmenter(
    start=-0.1, duration=0.5,
    trigger_query='type=="Word"',
    extractors=dict(meg=meg),
    drop_incomplete=True)
dataset = segmenter.apply(events)
loader = DataLoader(dataset, batch_size=8,
                    collate_fn=dataset.collate_fn)
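The core idea is time-locking: the segmenter emits one window per trigger event, positioned relative to that event's onset, and can drop windows that fall outside the recording. A hypothetical pure-Python sketch of that logic (not neuralset's implementation):

```python
def segment(events, signal_len_s, start=-0.1, duration=0.5,
            trigger_type="Word", drop_incomplete=True):
    """Hypothetical sketch: one (t0, t1) window per trigger event,
    locked to the event onset; optionally drop windows that would
    extend past the edges of the recording."""
    windows = []
    for evt in events:
        if evt["type"] != trigger_type:
            continue
        t0 = evt["start"] + start
        t1 = t0 + duration
        if drop_incomplete and (t0 < 0 or t1 > signal_len_s):
            continue  # incomplete window, skip
        windows.append((t0, t1))
    return windows

events = [{"type": "Word", "start": 0.05},   # window would start before 0
          {"type": "Word", "start": 1.0},
          {"type": "Sound", "start": 2.0}]   # not a trigger
print(segment(events, signal_len_s=3.0))
```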
Chains
Compose Study + Transforms into reproducible, cacheable pipelines.
chain = ns.Chain(steps=[
    ns.Study(name="Fake2025Meg",
             path=ns.CACHE_FOLDER),
    ns.events.transforms.AddSentenceToWords(),
])
events = chain.run()
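A chain simply runs its steps in order, feeding each step's output into the next, which is what makes the pipeline reproducible and easy to cache. A hypothetical minimal sketch of that composition (invented names, not neuralset's `Chain`):

```python
class SimpleChain:
    """Hypothetical sketch of a chain: run steps in order, feed each
    step's output into the next, and cache the final result."""
    def __init__(self, steps):
        self.steps = steps
        self._cache = None

    def run(self):
        if self._cache is None:
            result = None
            for step in self.steps:
                # First step takes no input (like a Study); later
                # steps receive the previous output (like transforms).
                result = step() if result is None else step(result)
            self._cache = result
        return self._cache

chain = SimpleChain(steps=[
    lambda: [{"type": "Word", "start": 0.0}],                  # stands in for a Study
    lambda evts: evts + [{"type": "Sentence", "start": 0.0}],  # stands in for a transform
])
events = chain.run()
print(len(events))  # 2
```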