neuralbench.cli.run_benchmark

neuralbench.cli.run_benchmark(device: str, task: str | list[str], *, model: str | list[str] | None = None, dataset: str | list[str] | None = None, checkpoint: str | None = None, downstream_wrapper: str | list[str] | None = None, grid: bool = False, debug: bool = False, force: bool = False, retry: bool = False, prepare: bool = False, download: bool = False, plot_cached: bool = False) → list[dict[str, Any]][source]

Run one or more NeuralBench experiments from Python.

This is the programmatic equivalent of the neuralbench CLI. It assembles experiment configurations from the same YAML files and returns test-metric dictionaries when experiments run synchronously (e.g. in debug mode).

Parameters:
  • device (str) – Brain recording device ("eeg", "meg", "fmri", …).

  • task (str or list of str) – Task name(s), "all", or "all_multi_dataset".

  • model (str or list of str or None) – Predefined model name(s), "all", "all_classic", "all_fm", "all_baseline" (chance / dummy / classical sklearn pipelines), or None (uses default model from config.yaml).

  • dataset (str or list of str or None) – Dataset variant(s) or "all". None uses the base config.

  • checkpoint (str or None) – Path to a model checkpoint to reload.

  • downstream_wrapper (str or list of str or None) – Downstream wrapper name(s) or "all".

  • grid (bool) – Expand the task-specific hyperparameter grid.

  • debug (bool) – Run locally with a reduced config (2 epochs, 5 batches).

  • force (bool) – Force re-running experiments.

  • retry (bool) – Retry failed experiments while keeping completed results.

  • prepare (bool) – Run a single experiment to warm the preprocessing cache.

  • download (bool) – Only download the dataset; do not run experiments.

  • plot_cached (bool) – Generate plots and tables from cached results only, without running any new experiments.

Returns:

One result dict per experiment, or an empty list when experiments are submitted asynchronously via Slurm.

Return type:

list of dict
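A minimal usage sketch, assuming NeuralBench is installed. The debug-mode call mirrors the signature above; the metric key "test_acc" and the helper summarize are illustrative assumptions, not part of the documented API:

```python
# Sketch: run debug-mode EEG baselines and average the returned metrics.
# The metric keys in the result dicts (e.g. "test_acc") are assumptions
# for illustration; inspect your own results to see the actual keys.

def summarize(results: list[dict]) -> dict[str, float]:
    """Average each numeric metric across the returned experiment dicts."""
    sums: dict[str, float] = {}
    counts: dict[str, int] = {}
    for res in results:
        for key, val in res.items():
            if isinstance(val, (int, float)) and not isinstance(val, bool):
                sums[key] = sums.get(key, 0.0) + val
                counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

if __name__ == "__main__":
    from neuralbench.cli import run_benchmark

    # Runs locally with a reduced config (2 epochs, 5 batches) and
    # returns one result dict per experiment.
    results = run_benchmark("eeg", "all", model="all_baseline", debug=True)
    print(summarize(results))
```

When experiments are dispatched asynchronously via Slurm instead, the returned list is empty, so post-processing like the above only applies to synchronous (debug) runs.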