Speech decoding

Name: speech
Category: cognitive decoding
Dataset: Brennan2019
Objective: Retrieval
Split: Leave-last-runs-out

Usage

neuralbench eeg speech
config.yaml
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

data:
  study:
    source:
      name: Brennan2019Hierarchical
    filter_event_types:
      name: QueryEvents
      query: "type in ['Eeg', 'Audio', 'Word']"
    split:
      name: PredefinedSplit
      event_type: Word
      test_split_query: "sequence_id >= 68"  # Last 4 runs out of 20
      col_name: split
      valid_split_by: sequence_id
      valid_split_ratio: 0.2
      valid_random_state: 33
  neuro:
    baseline:
      - 0.0
      - 0.2
  channel_positions:
    layout_or_montage_name: null
  target:
    =replace=: true
    name: TimeAggregatedExtractor
    time_aggregation: mean
    n_groups_concat: 4
    extractor:
      name: Wav2Vec
      model_name: facebook/wav2vec2-large-xlsr-53
      frequency: 120.0
      layers:
        - 0.875  # 21/24
        - 0.917  # 22/24
        - 0.958  # 23/24
        - 1.0  # 24/24
      cache_n_layers: null
      layer_aggregation: mean
      token_aggregation: mean
      aggregation: first
      infra:
        cluster: auto
        keep_in_ram: false
        slurm_partition: !!python/name:neuralbench.config_manager.SLURM_PARTITION
        folder: !!python/name:neuralbench.config_manager.CACHE_DIR
        cpus_per_task: !!python/name:neuralbench.config_manager.N_CPUS
  trigger_event_type: Word
  start: -0.5
  duration: 3.0
  summary_columns: [text, sentence]
brain_model_output_size: &brain_model_output_size 4096  # 1024 * 4
target_scaler:
  dim: 1
trainer_config.monitor: val/batch_top5_acc
trainer_config.mode: max
loss:
  name: ClipLoss
  norm_kind: y
  temperature: false
  symmetric: false
metrics: !!python/name:neuralbench.defaults.metrics.retrieval_metrics
test_full_retrieval_metrics: !!python/name:neuralbench.defaults.metrics.test_full_retrieval_metrics
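The loss section configures a CLIP-style contrastive objective over each batch: the 4096-dimensional target (1024-dim Wav2Vec features, layer-averaged, concatenated over 4 time groups) is matched against the brain model's prediction. Below is a minimal NumPy sketch of one plausible reading of these settings — it is illustrative, not the benchmark's actual `ClipLoss` implementation: `norm_kind: y` is taken to mean only the targets are L2-normalized, `symmetric: false` to mean only the brain-to-stimulus direction is scored, and `temperature: false` to mean a fixed logit scale of 1.

```python
import numpy as np

def clip_loss(x, y):
    """Sketch of a one-directional CLIP-style loss (assumed semantics:
    normalize targets only, no learned temperature, no symmetric term).
    x: predictions (batch, dim); y: targets (batch, dim)."""
    y = y / np.linalg.norm(y, axis=1, keepdims=True)       # norm_kind: y
    logits = x @ y.T                                       # (batch, batch) similarities
    # cross-entropy against the diagonal: each prediction should
    # retrieve its own target among the batch candidates
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

The loss is minimized when each prediction is most similar to its own (normalized) target and dissimilar to the other targets in the batch.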

Description

The speech decoding task involves decoding speech stimuli from EEG recordings [Defossez2023b]. In this task, we use the Brennan2019 dataset [Brennan2019], which contains EEG data recorded while participants listened to an audiobook for ~12 minutes.
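Decoding is evaluated as retrieval: given a predicted embedding, the model must rank the true speech segment highly among the candidate segments in a batch (hence the monitored `val/batch_top5_acc`). A minimal sketch of such a top-k batch retrieval accuracy, assuming cosine similarity between predicted and candidate embeddings (the function name is illustrative, not the benchmark's API):

```python
import numpy as np

def batch_topk_acc(pred, targets, k=5):
    """Fraction of samples whose true target is among the k most
    similar candidates in the batch (cosine similarity)."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    targets = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    sims = pred @ targets.T                        # (batch, batch)
    # rank of the true target = number of candidates scoring strictly higher
    ranks = (sims > np.diag(sims)[:, None]).sum(axis=1)
    return float((ranks < k).mean())
```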

Dataset Notes

  • In [Brennan2019], participants listened to 12 minutes of an audiobook. We use the last 20% of sentences played during the recording for testing, and the remaining sentences for training and validation.
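The split above can be sketched as follows — an illustrative re-implementation of the `PredefinedSplit` settings, not the benchmark's actual code (in particular, Python's `random` will not reproduce the exact validation sentences chosen by `valid_random_state: 33`): sentences with `sequence_id >= 68` go to test, a random 20% of the remaining `sequence_id`s form validation, and the rest are train.

```python
import random

def assign_splits(sequence_ids, test_threshold=68, valid_ratio=0.2, seed=33):
    """Assign each sequence_id to test, valid, or train (illustrative)."""
    test_ids = {s for s in sequence_ids if s >= test_threshold}
    rest = sorted({s for s in sequence_ids if s < test_threshold})
    rng = random.Random(seed)
    n_valid = max(1, round(valid_ratio * len(rest)))
    valid_ids = set(rng.sample(rest, n_valid))     # 20% of remaining sequences
    return {
        s: "test" if s in test_ids else "valid" if s in valid_ids else "train"
        for s in set(sequence_ids)
    }
```

Splitting by `sequence_id` (rather than by individual word events) keeps all words of a sentence in the same split, avoiding leakage between train and validation.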

References

[Defossez2023b]

Défossez, Alexandre, et al. “Decoding speech perception from non-invasive brain recordings.” Nature Machine Intelligence 5.10 (2023): 1097-1107.

[Brennan2019]

Brennan, Jonathan R., and John T. Hale. “Hierarchical structure guides rapid linguistic predictions during naturalistic listening.” PloS one 14.1 (2019): e0207741.