Word decoding
Nieuwland2018
Usage
neuralbench eeg word
config.yaml:
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
data:
study:
source:
name: Nieuwland2018Large
query: "site not in ['GLAS', 'LOND']"
preprocess_text:
name: TextPreprocessor
neuro_event_type: Eeg
split:
name: SklearnSplit
split_by: sequence_id
valid_split_ratio: 0.1
test_split_ratio: 0.1
valid_random_state: 33
test_random_state: 33
target:
name: SpacyEmbedding
aggregation: trigger
infra:
cluster: auto
keep_in_ram: true
timeout_min: 180
gpus_per_node: 1
cpus_per_task: 10
min_samples_per_job: 16
trigger_event_type: Word
start: -0.5
duration: 3.0
summary_columns: [text, sequence_id]
brain_model_output_size: &brain_model_output_size 1024
trainer_config.monitor: val/batch_top5_acc
trainer_config.mode: max
loss:
name: ClipLoss
norm_kind: y
temperature: false
symmetric: false
metrics: !!python/name:neuralbench.defaults.metrics.retrieval_metrics
test_full_retrieval_metrics: !!python/name:neuralbench.defaults.metrics.test_full_retrieval_metrics
Description
The word decoding task involves decoding word stimuli from EEG recordings [dAscoli2025]. In this task, we use the Nieuwland2018 dataset [Nieuwland2018], which contains EEG data recorded across 8 UK laboratories while subjects read 80 sentences on a screen in a rapid serial visual presentation paradigm. Word embeddings are extracted using contextualized GPT-2 representations.
We exclude the GLAS (Glasgow, 128-channel BioSemi) and LOND (London, 34-channel) sites because their EEG montages are incompatible with the standard ~64-channel 10-20 systems used by the other 6 sites. Including them would inflate the channel dimension to 194 (the size of the union of all unique channel names) and require heavy zero-padding, which slows training considerably without improving evaluation quality.
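The effect of an incompatible montage on the channel dimension can be illustrated with a minimal sketch. It assumes that recordings are aligned by taking the union of channel names and zero-filling channels a site does not have, as the paragraph above describes. The site names, channel names, and helper functions below are hypothetical and are not part of neuralbench.

```python
from typing import Dict, List


def channel_union(montages: Dict[str, List[str]]) -> List[str]:
    """Return the sorted union of channel names across all sites."""
    names = set()
    for channels in montages.values():
        names.update(channels)
    return sorted(names)


def pad_to_union(
    recording: Dict[str, List[float]], union: List[str], n_times: int
) -> List[List[float]]:
    """Order channels by the union; channels absent from the recording become zero rows."""
    zero_row = [0.0] * n_times
    return [list(recording.get(name, zero_row)) for name in union]


# Hypothetical toy montages: two compatible sites and one with a different naming scheme.
montages = {
    "site_a": ["Fz", "Cz", "Pz"],
    "site_b": ["Fz", "Cz", "Oz"],
    "site_c": ["A1", "A2", "B1", "B2"],
}

union_all = channel_union(montages)
union_compatible = channel_union(
    {site: chans for site, chans in montages.items() if site != "site_c"}
)

# A two-sample recording from site_a, padded to the full union.
recording = {"Fz": [1.0, 2.0], "Cz": [3.0, 4.0], "Pz": [5.0, 6.0]}
padded = pad_to_union(recording, union_all, n_times=2)
zero_rows = sum(1 for row in padded if not any(row))

print(len(union_compatible), len(union_all), zero_rows)
```

In this toy case, dropping the incompatible site halves the channel dimension, and most rows of the padded recording carry no signal. The same mechanism, at a larger scale, is what the site exclusion avoids.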
As in [dAscoli2025], the retrieval set is built from the 250 most frequent words in the test split.
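A rough sketch of how such a retrieval set could be built and scored is shown below. It assumes that word frequency is counted over the test-split words, that candidates are ranked by cosine similarity, and that top-k accuracy is the fraction of predictions whose true word appears among the k best candidates. The function names and toy embeddings are hypothetical. The actual neuralbench metrics may differ.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def build_retrieval_set(test_words: Sequence[str], top_n: int = 250) -> List[str]:
    """Return the top_n most frequent words in the test split."""
    return [word for word, _ in Counter(test_words).most_common(top_n)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_accuracy(
    predictions: Sequence[Sequence[float]],
    true_words: Sequence[str],
    candidates: Dict[str, Sequence[float]],
    k: int = 5,
) -> float:
    """Fraction of predictions whose true word is among the k most similar candidates."""
    hits = 0
    for pred, truth in zip(predictions, true_words):
        ranked = sorted(
            candidates, key=lambda w: cosine(pred, candidates[w]), reverse=True
        )
        hits += truth in ranked[:k]
    return hits / len(true_words)


# Toy test split: keep the 2 most frequent words instead of 250.
test_words = ["the", "cat", "the", "dog", "the", "cat"]
retrieval_set = build_retrieval_set(test_words, top_n=2)

# Hypothetical 2-dimensional embeddings for the retrieval candidates.
embeddings = {"the": [1.0, 0.0], "cat": [0.0, 1.0]}
candidates = {word: embeddings[word] for word in retrieval_set}

predictions = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
true_words = ["the", "cat", "cat"]
top1 = top_k_accuracy(predictions, true_words, candidates, k=1)
top2 = top_k_accuracy(predictions, true_words, candidates, k=2)
```

Restricting the candidates to frequent words keeps the retrieval problem at a fixed size, which makes top-k scores comparable across runs.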
References
d’Ascoli, Stéphane, et al. “Towards decoding individual words from non-invasive brain recordings.” Nature Communications 16.1 (2025): 10521.
Nieuwland, Mante S., et al. “Large-scale replication study reveals a limit on probabilistic prediction in language comprehension.” eLife 7 (2018): e33468.