Running the full EEG benchmark¶
This page describes how to run the entire benchmark of EEG tasks end-to-end, including prerequisites, step-by-step instructions, model and dataset options, and computational resource requirements.
Prerequisites¶
- SLURM cluster with GPUs (or a single machine with a GPU for --debug mode).
- Disk space: ~3.3 TB for the raw datasets of the 35 base EEG tasks (~4.4 TB when including all multi-dataset variants via --dataset all). An additional ~35 GB is needed for the preprocessing cache.
- Data access: a handful of datasets cannot be fetched automatically and must be obtained by registering, signing a license agreement, or filling out an application form. See Datasets requiring manual download below for the full list and step-by-step instructions.
Datasets requiring manual download¶
Most NeuralBench datasets are downloaded automatically by neuralbench
<device> <task> --download. The following datasets are exceptions: they
require a one-time manual step (creating an account, accepting a license,
or submitting an application form) before they can be obtained. Tasks that
depend on these datasets will be skipped with a warning during
neuralbench eeg all --download.
| Task(s) | Dataset / Study | Manual step required |
|---|---|---|
| … | TUH EEG Corpus (…) | Submit an access request to Temple University NEDC at https://isip.piconepress.com/projects/nedc/html/tuh_eeg/. Once granted, place the … |
| … | THINGS-images database (shared by …) | Read and accept the THINGS license at https://osf.io/jum2f/files/52wrx, retrieve the password from … |
| … | FACED (…) | Create a Synapse account at https://www.synapse.org/, accept the Synapse data-use terms, generate a Personal Access Token with View and Download scopes, and export it as … |
| … | SEED-DV (…) | Sign the BCMI Lab license agreement (https://cloud.bcmi.sjtu.edu.cn/sharing/o64PBIsIc), submit the application form at https://bcmi.sjtu.edu.cn/ApplicationForm/apply_form/ (select SEED-DV, use an institutional email), and download from the link sent by email after approval. |
| … | Natural Scenes Dataset (…) | Run … |
| eeg/mental_arithmetic | Shin2017OpenA, Shin2017OpenB (via MOABB) | Read and accept the GNU General Public License v3 terms at http://doc.ml.tu-berlin.de/hBCI, then export the environment variable MOABB_ACCEPT_LICENCE=1 (see the note below). |
| … | … | Not yet available for public download. The release timeline is tracked alongside the upcoming dataset paper; until then these tasks remain a documented known limitation. |
| … | Brennan2019 (…) | Auto-download via … |
In addition, a few MOABB-backed datasets (e.g. Shin2017OpenA /
Shin2017OpenB used by eeg/mental_arithmetic) require accepting a
GPL-style click-through license. Set MOABB_ACCEPT_LICENCE=1 in your
environment to acknowledge it before running --download.
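For example, in a POSIX shell the acknowledgement can be exported once before the download step (a minimal sketch; the per-task invocation mirrors the examples further down this page):
export MOABB_ACCEPT_LICENCE=1                  # Acknowledge the MOABB click-through license
neuralbench eeg mental_arithmetic --download   # Shin2017OpenA / Shin2017OpenB via MOABB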
Step 1: Download¶
Download all datasets for the 36 EEG tasks:
neuralbench eeg all --download
This triggers the download of all required studies to DATA_DIR. Tasks whose
datasets require manual access (see Datasets requiring manual download above) will
be skipped with a warning until the corresponding credentials, password, or
data-use agreement are in place.
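To verify credentials and disk space before launching the full download, the per-task form mentioned above can be used first (illustrative; motor_imagery is one of the tasks shown later on this page):
neuralbench eeg motor_imagery --download    # Download the datasets of a single task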
Step 2: Prepare the cache¶
Preprocess and cache the data for all tasks:
neuralbench eeg all --prepare
Each task is submitted as a SLURM job (or run locally with --debug) that
loads the raw data, applies the preprocessing pipeline (resampling, filtering,
scaling), and writes the result to CACHE_DIR. This step can be parallelized
across tasks.
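To sanity-check the preprocessing pipeline before submitting jobs for every task, the same flag can be restricted to a single task or run locally (a sketch using the flags described above):
neuralbench eeg motor_imagery --prepare     # Cache a single task
neuralbench eeg all --prepare --debug       # Run locally on subsampled data, no SLURM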
Step 3: Run the benchmark¶
Launch the full benchmark with the default model (EEGNet):
neuralbench eeg all
This submits 3 SLURM jobs per task (108 jobs total for all 36 tasks, one per seed). Each job trains, validates with early stopping, and evaluates the model.
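For a quick local check before submitting the full job array, the run can be limited to a single task or executed in debug mode (illustrative; --debug behaviour is described under Computational considerations below):
neuralbench eeg motor_imagery               # Submit one task only (3 seeds)
neuralbench eeg all --debug                 # Local debug run without SLURM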
Running with different models¶
Use -m to specify alternative models or model groups:
neuralbench eeg all -m eegconformer # Single model
neuralbench eeg all -m all_classic # All 8 task-specific models
neuralbench eeg all -m all_fm # All 6 foundation models
neuralbench eeg all -m all # All models
Task-specific models (8): shallow_fbcsp_net, simpleconv_time_agg,
eegnet, deep4net, eegconformer, atcnet, bdtcn, ctnet
Foundation models (6): bendr, biot, cbramod, labram,
luna, reve
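The -m flag also combines with single-task runs, which is convenient for spot-checking one model on one task (an illustrative pairing of options already shown on this page):
neuralbench eeg motor_imagery -m labram     # One task, one foundation model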
Running with dataset variants¶
Nine tasks support multiple datasets. Use --dataset all to evaluate across
all dataset variants for a task:
neuralbench eeg motor_imagery --dataset all # Run on all 18 motor imagery datasets
neuralbench eeg p3 --dataset all # Run on all 24 P300 datasets
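--dataset all composes with -m in the same way (illustrative):
neuralbench eeg p3 -m all_classic --dataset all    # All 24 P300 datasets, all 8 task-specific models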
Visualizing results¶
Results can be visualized on Weights & Biases, or aggregated locally using
--plot-cached:
neuralbench eeg all -m all_classic all_fm --plot-cached
--plot-cached does not re-train any model. It collects the stored test
metrics from the cache and writes the following outputs to the
outputs/ directory, split into three subfolders: core/ for the
NeuralBench-EEG-Core v1.0 plots and tables (one dataset per task), full/
for the NeuralBench-EEG-Full v1.0 per-dataset breakdowns and variability
analyses, and other/ for everything else. The core/ outputs are:
- Bar chart (outputs/core/core_bar_chart.png): faceted bar chart with one panel per task, one bar per model, including error bars and individual data points.
- Results table (outputs/core/core_results_table.csv): wide-format table with mean +/- std per task and model.
- Rank table (outputs/core/core_rank_table.csv): models ranked within each task (1 = best), with an average rank row at the bottom.
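Based on the file names listed above, a quick way to confirm the core outputs were written is to list the directory (expected contents shown as a comment; the exact file set may differ):
ls outputs/core/
# core_bar_chart.png  core_rank_table.csv  core_results_table.csv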
See the Visualizing Results tutorial for details on BenchmarkAggregator configuration and the available visualization methods.
Evaluation protocol¶
Each model is evaluated 3 times per task with different model seeds (33, 34, 35). The train/val/test data splits are fixed across all runs, so variance reflects model training stochasticity only. Reported metrics are mean +/- std over the 3 runs. See individual task pages for split details.
Computational considerations¶
NeuralBench is designed to run on a SLURM cluster with GPUs. For local
development without SLURM, use --debug mode, which runs on a single GPU
with a subsampled dataset (2 epochs, 5 batches per epoch).
| Resource | Default / Requirement |
|---|---|
| GPU | 1 x volta32gb (32 GB VRAM) |
| CPU RAM per job | 64 GB |
| Raw datasets (35 base EEG tasks) | ~3.3 TB |
| Preprocessing cache | ~35 GB |