neuralset.events.transforms.chunking.ChunkEvents

class neuralset.events.transforms.chunking.ChunkEvents(*, infra: Backend | None = None, event_type_to_chunk: Literal['Audio', 'Video'], event_type_to_use: str | None = None, min_duration: float | None = None, max_duration: float = inf)[source]

This transform chunks long events (audio or video) into shorter ones. It supports two modes:

  1. Duration-based chunking (if event_type_to_use is None):

    • Each event of type event_type_to_chunk is split into consecutive chunks with duration between min_duration and max_duration.

    • Ensures that no chunk exceeds max_duration.

This avoids OOM errors when feeding long events into a deep learning model (e.g. Wav2Vec).

Example:

input:
    min_duration: 2
    max_duration: 3
    events:
        sound:    [.......]
out:
    events:
        sound1:   [...]
        sound2:      [....]
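The duration-based mode can be illustrated with a small standalone sketch (this is not the library's implementation; `chunk_spans` is a hypothetical helper). One simple strategy that keeps every chunk under max_duration is to cut the span into the smallest number of equal-length chunks that fit:

```python
import math

def chunk_spans(start, end, min_duration=0.0, max_duration=math.inf):
    """Illustrative sketch of duration-based chunking (not the library code).

    Splits [start, end) into the smallest number of consecutive,
    equal-length chunks whose duration does not exceed max_duration.
    """
    total = end - start
    if total <= max_duration:
        return [(start, end)]
    n = math.ceil(total / max_duration)  # fewest chunks that fit under max_duration
    duration = total / n
    if duration < min_duration:
        raise ValueError("cannot satisfy both min_duration and max_duration")
    return [(start + i * duration, start + (i + 1) * duration) for i in range(n)]
```

For a 7-second event with min_duration=2 and max_duration=3, this yields three chunks of about 2.33 seconds each, all within the allowed bounds.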
  2. Split-based chunking (if event_type_to_use is not None):

    • Events of type event_type_to_chunk are split according to the train/val/test splits of event_type_to_use.

    • Ensures each chunk respects min_duration and max_duration.

    • Prevents data leakage when processing long events with a deep learning model (e.g. Wav2Vec).

    • Requires that the splits for event_type_to_use are already assigned.

    Example:

    input:
        max_duration: 2
        event_type_to_use: Word
        events:
            sound:    [.......]
            word :    [1112233]
    out:
        events:
            sound1:   [..]
            sound2:     [.]
            sound3:      [..]
            sound4:        [..]
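The split-based mode can be sketched in the same way (again an illustration, not the library's implementation; `chunk_by_splits` and its per-unit label input are assumptions). The audio is first cut wherever the aligned event's split assignment changes, so no chunk spans two splits, and any run that still exceeds max_duration is then cut greedily:

```python
import math
from itertools import groupby

def chunk_by_splits(labels, max_duration=math.inf):
    """Illustrative sketch of split-based chunking (not the library code).

    labels: one split label per time unit of the event to chunk, taken from
    the aligned event type (e.g. the Word events' train/val/test splits).
    Returns (start, end, label) chunks that never cross a split boundary
    and never exceed max_duration.
    """
    chunks = []
    start = 0
    for label, group in groupby(labels):
        run_len = len(list(group))  # contiguous stretch with one split label
        pos = start
        while pos < start + run_len:
            # cut the run greedily so each piece fits under max_duration
            step = min(max_duration, start + run_len - pos)
            chunks.append((pos, pos + step, label))
            pos += step
        start += run_len
    return chunks
```

Applied to the example above (labels 1112233, max_duration=2), this produces four chunks: the three-unit run for word 1 is cut into pieces of length 2 and 1, and the two-unit runs for words 2 and 3 each become one chunk.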