# neuralset

neuralset turns raw neural recordings and stimuli into PyTorch-ready datasets. Define a study, load events into a flat DataFrame, apply transforms, extract features with configurable extractors, and segment everything into batches — all lazy, typed with pydantic, and cacheable.

```bash
pip install neuralset
```

Add the tutorials extra for feature-extraction dependencies, as well as neuralfetch to experiment with curated public datasets:

```bash
pip install 'neuralset[tutorials]' neuralfetch
```

Heavier dependencies (e.g. `transformers` for text and image feature extraction) live in the `[all]` extra; see Installation for the full breakdown.


## Quickstart

From study to PyTorch batch — pick an example to see the code.

- Load study data, configure extractors & segment

## Tutorials

Each tutorial walks through one building block of the pipeline.

### Events

The universal data format — every recording, stimulus, and annotation is an event.

```python
from neuralset.events import Event

evt = Event(type="Word", start=1.0,
            duration=0.3, timeline="sub-01")
print(evt)              # pydantic model
print(evt.model_dump()) # dict
```
### Studies

Download datasets and load them as events DataFrames.

```python
import neuralset as ns

study = ns.Study(name="Fake2025Meg",
                 path=ns.CACHE_FOLDER)
events = study.run()
print(f"{len(events)} events, "
      f"{events['subject'].nunique()} subjects")
```
### Transforms

Filter, split, and enrich events before extraction.

```python
import neuralset as ns

# `events` is the DataFrame loaded in the Studies step.
transform = ns.events.transforms.AddSentenceToWords()
events = transform(events)
print(events[events.type == "Sentence"].head())
```
### Extractors

Turn events into tensors — brain signals, text embeddings, images, labels.

```python
import neuralset as ns

# `events` is the DataFrame loaded in the Studies step.
meg = ns.extractors.MegExtractor(frequency=100.0)
sample = meg(events, start=0.0, duration=1.0)
print(f"MEG shape: {sample.shape}")
```
### Segmenter & Dataset

Create time-locked segments and iterate with a PyTorch DataLoader.

```python
import neuralset as ns
from torch.utils.data import DataLoader

# `events` and `meg` come from the Studies and Extractors steps.
segmenter = ns.dataloader.Segmenter(
    start=-0.1, duration=0.5,
    trigger_query='type=="Word"',
    extractors=dict(meg=meg),
    drop_incomplete=True)
dataset = segmenter.apply(events)
loader = DataLoader(dataset, batch_size=8,
                    collate_fn=dataset.collate_fn)
```
### Chains

Compose Study + Transforms into reproducible, cacheable pipelines.

```python
import neuralset as ns

chain = ns.Chain(steps=[
    ns.Study(name="Fake2025Meg",
             path=ns.CACHE_FOLDER),
    ns.events.transforms.AddSentenceToWords(),
])
events = chain.run()
```

## Citation

```bibtex
@misc{king2026neuralset,
  title  = {NeuralSet: A High-Performing Python Package for Neuro-AI},
  author = {King, Jean-R{\'e}mi and Bel, Corentin and Evanson, Linnea
            and Gadonneix, Julien and Houhamdi, Sophia and L{\'e}vy, Jarod
            and Raugel, Josephine and Santos Revilla, Andrea
            and Zhang, Mingfang and Bonnaire, Julie and Caucheteux, Charlotte
            and D{\'e}fossez, Alexandre and Desbordes, Th{\'e}o
            and Diego-Sim{\'o}n, Pablo and Khanna, Shubh and Millet, Juliette
            and Orhan, Pierre and Panchavati, Saarang and Ratouchniak, Antoine
            and Thual, Alexis and Brooks, Teon L. and Begany, Katelyn
            and Benchetrit, Yohann and Careil, Marl{\`e}ne and Banville, Hubert
            and d'Ascoli, St{\'e}phane and Dahan, Simon and Rapin, J{\'e}r{\'e}my},
  year   = {2026},
  url    = {https://kingjr.github.io/files/neuralset.pdf},
  note   = {Preprint; URL will be updated when the paper lands on arXiv}
}
```