neuralset.events.transforms.chunking.ChunkEvents
- pydantic model neuralset.events.transforms.chunking.ChunkEvents
Chunk long events into shorter events.
Typical use: keep long recordings under a deep-learning model’s memory budget (e.g. Wav2Vec).
- Parameters:
  - event_type_to_chunk (str) – Splittable event type to chunk. Any BaseSplittableEvent subclass (Audio, Video, Meg, Eeg, Fmri, …).
  - max_duration (float, default=np.inf) – Upper bound on chunk duration in seconds.
  - min_duration (float, default=0.0) – Lower bound on chunk duration. Behavior when impossible depends on tiling (see below).
  - tiling ({"max", "equal"}, default "max") – How each section is sub-divided:
    - "max": emit chunks of exactly max_duration until the section is exhausted; the trailing partial chunk is dropped iff its duration is < min_duration.
    - "equal": equal-sized chunks, each in [min_duration, max_duration]. Requires 2 * min_duration <= max_duration. Raises if a section is shorter than min_duration.
  - event_type_to_split_by (str, optional) – Align chunk boundaries with train/val/test labels carried by another event type's split column, to avoid label leakage at split transitions. When set, chunk boundaries follow same-split runs and each run is sub-tiled per tiling.
  - allow_sample_leakage (bool, default=False) – Only relevant when event_type_to_split_by is set. If True, accept up to 1 sample of mislabeling at split transitions with sub-sample silence gaps (e.g. coarse-TR Fmri); otherwise raise.

Invariants
----------
- "max" may silently drop sections/trailing pieces shorter than min_duration.
- Every emitted chunk is label-homogeneous when event_type_to_split_by is set.
- "equal" is lossless: it raises instead of dropping a section shorter than min_duration.
- Raises:
  - tiling="equal" and a same-split run is shorter than min_duration (cannot tile without losing labeled data; switch to tiling="max" to drop short pieces instead).
  - Two consecutive differently-labeled runs are less than one sample apart (cannot separate without label leakage).
Examples
Simple chunking (each x = one sample; sound sampled at 1 Hz):

    input:
      max_duration: 4
      events:
        sound: [x x x x x x x x x x x x x]  # 13 s

    out (tiling="max"):  # tile with max duration + trail
      events:
        sound1: [x x x x]
        sound2: [x x x x]
        sound3: [x x x x]
        sound4: [x]  # short trailing chunk

    out (tiling="equal"):  # tile with ~ same length
      events:
        sound1: [x x x]
        sound2: [x x x]
        sound3: [x x x x]  # 3.25 s ideal, rounded to whole sample
        sound4: [x x x]
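The two tiling modes can be mimicked in a few lines of plain Python. This is a minimal sketch and not neuralset's implementation: `tile_lengths` is a hypothetical helper, it counts whole samples rather than seconds, and placing the "equal" boundaries at rounded multiples of the ideal length is an assumption chosen because it reproduces the 13-sample example above.

```python
import math


def tile_lengths(n_samples, max_len, min_len=0, tiling="max"):
    """Return chunk lengths (in samples) for one section.

    Hypothetical helper for illustration only; not part of neuralset.
    """
    if n_samples <= 0:
        return []
    if tiling == "max":
        n_full, rest = divmod(n_samples, max_len)
        lengths = [max_len] * n_full
        # The trailing partial chunk is kept unless it is shorter than min_len.
        if rest > 0 and rest >= min_len:
            lengths.append(rest)
        return lengths
    if tiling == "equal":
        if 2 * min_len > max_len:
            raise ValueError("'equal' requires 2 * min_len <= max_len")
        if n_samples < min_len:
            raise ValueError("section shorter than min_len; cannot tile losslessly")
        # Fewest chunks that keep every chunk at or below max_len.
        n_chunks = math.ceil(n_samples / max_len)
        # Assumption: boundaries sit at rounded multiples of the ideal length.
        edges = [round(i * n_samples / n_chunks) for i in range(n_chunks + 1)]
        return [stop - start for start, stop in zip(edges, edges[1:])]
    raise ValueError(f"unknown tiling: {tiling!r}")


print(tile_lengths(13, 4, tiling="max"))    # [4, 4, 4, 1]
print(tile_lengths(13, 4, tiling="equal"))  # [3, 3, 4, 3]
```

With `min_len=2`, the "max" call would drop the 1-sample trailing chunk, while the "equal" call still covers all 13 samples.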
With train/test split labels:

    input:
      max_duration: 4
      event_type_to_split_by: Word
      events:
        sound: [x x x x x x x x x x x x x]  # 13 s
        word:   1 1 1 - - 2 2 2 2 2 2 2 2   # 1=test, 2=train, -=silence

    out (tiling="equal"):  # split-aligned, then sub-tiled
      events:
        sound1: [x x x x]  # test run
        sound2: [x x x]    # train run, 3 equal chunks
        sound3: [x x x]
        sound4: [x x x]
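The split-alignment step can be pictured as finding same-label runs and then deciding where each boundary falls inside the silence between them. The sketch below is illustrative only: `split_aligned_sections` is a hypothetical helper, and cutting each silence gap at its midpoint is an assumption. The source does not state the rule; the midpoint is merely consistent with the 4-sample test run and 9-sample train run in the example above.

```python
from itertools import groupby


def split_aligned_sections(labels):
    """Group per-sample split labels into (start, stop, label) sections.

    `labels` holds one entry per sample, with None marking silence.
    Hypothetical helper for illustration only; not part of neuralset.
    """
    runs = []
    pos = 0
    for label, group in groupby(labels):
        n = len(list(group))
        if label is not None:
            runs.append([pos, pos + n, label])
        pos += n
    if not runs:
        return []
    # Leading and trailing silence is attached to the first and last run.
    runs[0][0] = 0
    runs[-1][1] = len(labels)
    # Assumption: each silence gap between two runs is cut at its midpoint.
    for left, right in zip(runs, runs[1:]):
        cut = (left[1] + right[0] + 1) // 2
        left[1] = right[0] = cut
    return [tuple(run) for run in runs]


labels = [1, 1, 1, None, None, 2, 2, 2, 2, 2, 2, 2, 2]  # 1=test, 2=train
print(split_aligned_sections(labels))  # [(0, 4, 1), (4, 13, 2)]
```

Each section contains samples from a single split only, so sub-tiling it afterwards cannot produce a chunk that mixes labels.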
- Fields: