neuralbench.cli.run_benchmark

neuralbench.cli.run_benchmark(device: str, task: str | list[str], *, model: str | list[str] | None = None, dataset: str | list[str] | None = None, checkpoint: str | None = None, downstream_wrapper: str | list[str] | None = None, grid: bool = False, debug: bool = False, force: bool = False, retry: bool = False, prepare: bool = False, download: bool = False, plot_cached: bool = False) → list[dict[str, Any]][source]

Run one or more NeuralBench experiments from Python.

This is the programmatic equivalent of the neuralbench CLI. It assembles experiment configurations from the same YAML files and returns test-metric dictionaries when experiments run synchronously (e.g. in debug mode).

Parameters:
  • device (str) – Brain recording device ("eeg", "meg", "fmri", …).

  • task (str or list of str) – Task name(s), "all", or "all_multi_dataset".

  • model (str or list of str or None) – Predefined model name(s), "all", "all_classic", "all_fm", "all_baseline" (chance / dummy / classical sklearn pipelines), or None (uses default model from config.yaml).

  • dataset (str or list of str or None) – Dataset variant(s) or "all". None uses the base config.

  • checkpoint (str or None) – Path to a model checkpoint to reload.

  • downstream_wrapper (str or list of str or None) – Downstream wrapper name(s) or "all".

  • grid (bool) – Expand the task-specific hyperparameter grid.

  • debug (bool) – Run locally with a reduced config (2 epochs, 5 batches).

  • force (bool) – Force re-running experiments.

  • retry (bool) – Retry failed experiments while keeping completed results.

  • prepare (bool) – Run a single experiment to warm the preprocessing cache.

  • download (bool) – Only download the dataset; do not run experiments.

  • plot_cached (bool) – Generate plots and tables from cached results only, without running any new experiments.

Returns:

One result dict per experiment, or an empty list when experiments are submitted asynchronously via Slurm.

Return type:

list of dict
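A minimal usage sketch, assuming NeuralBench is installed. The debug-mode call mirrors the signature above; the metric key "test_acc" and the helper summarize are illustrative assumptions, not part of the documented API:

```python
# Sketch: run debug-mode EEG baselines and average the returned metrics.
# The metric keys in the result dicts (e.g. "test_acc") are assumptions
# for illustration; inspect your own results to see the actual keys.

def summarize(results: list[dict]) -> dict[str, float]:
    """Average each numeric metric across the returned experiment dicts."""
    sums: dict[str, float] = {}
    counts: dict[str, int] = {}
    for res in results:
        for key, val in res.items():
            if isinstance(val, (int, float)) and not isinstance(val, bool):
                sums[key] = sums.get(key, 0.0) + val
                counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

if __name__ == "__main__":
    from neuralbench.cli import run_benchmark

    # Runs locally with a reduced config (2 epochs, 5 batches) and
    # returns one result dict per experiment.
    results = run_benchmark("eeg", "all", model="all_baseline", debug=True)
    print(summarize(results))
```

When experiments are dispatched asynchronously via Slurm instead, the returned list is empty, so post-processing like the above only applies to synchronous (debug) runs.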