neuralset.events.transforms.chunking.ChunkEvents

pydantic model neuralset.events.transforms.chunking.ChunkEvents

Chunk long events into shorter events.

Typical use: keep long recordings under a deep-learning model’s memory budget (e.g. Wav2Vec).

Parameters:
  • event_type_to_chunk (str) – Splittable event type to chunk. Any BaseSplittableEvent subclass (Audio, Video, Meg, Eeg, Fmri, …).

  • max_duration (float, default=np.inf) – Upper bound on chunk duration in seconds.

  • min_duration (float, default=0.0) – Lower bound on chunk duration in seconds. What happens when a section cannot satisfy this bound depends on tiling (see below).

  • tiling ({"max", "equal"}, default="max") –

    How each section is sub-divided:

    • "max": emit chunks of exactly max_duration until the section is exhausted; the trailing partial chunk is dropped iff its duration is < min_duration.

    • "equal": equal-sized chunks, each in [min_duration, max_duration]. Requires 2 * min_duration <= max_duration. Raises if a section is shorter than min_duration.

  • event_type_to_split_by (str, optional) – Align chunk boundaries with train/val/test labels carried by another event type’s split column, to avoid label leakage at split transitions. When set, chunk boundaries follow same-split runs and each run is sub-tiled per tiling.

  • allow_sample_leakage (bool, default=False) – Only relevant when event_type_to_split_by is set. If True, tolerate up to one sample of mislabeling at split transitions whose silence gap is shorter than one sample (e.g. coarse-TR Fmri); otherwise raise.

Invariants:

  • "max" may silently drop sections/trailing pieces shorter than min_duration.

  • Every emitted chunk is label-homogeneous when event_type_to_split_by is set.

  • "equal" is lossless: it raises rather than dropping a section shorter than min_duration.
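
The two tiling modes can be illustrated with a small standalone sketch. It does not use neuralset: tile_max and tile_equal are hypothetical helpers that work on a plain duration in seconds, and the rule for picking the number of equal chunks (the fewest that keep every chunk within max_duration) is an assumption, not necessarily what ChunkEvents does internally.

```python
import math


def tile_max(duration, max_duration=math.inf, min_duration=0.0):
    """Hypothetical helper: chunks of exactly max_duration, then a trailing piece."""
    n_full = 0 if math.isinf(max_duration) else int(duration // max_duration)
    chunks = [max_duration] * n_full
    remainder = duration - sum(chunks)
    # The trailing partial chunk is dropped iff it is shorter than min_duration.
    if remainder > 0 and remainder >= min_duration:
        chunks.append(remainder)
    return chunks


def tile_equal(duration, max_duration=math.inf, min_duration=0.0):
    """Hypothetical helper: equal-sized chunks, each in [min_duration, max_duration]."""
    if 2 * min_duration > max_duration:
        raise ValueError("tiling='equal' requires 2 * min_duration <= max_duration")
    if duration < min_duration:
        raise ValueError("section shorter than min_duration")
    # Assumption: use the fewest chunks that keep each one <= max_duration.
    n_chunks = max(1, math.ceil(duration / max_duration))
    return [duration / n_chunks] * n_chunks


print(tile_max(13, 4))                  # [4, 4, 4, 1]
print(tile_max(13, 4, min_duration=2))  # [4, 4, 4]  (1 s trailing piece dropped)
print(tile_equal(13, 4))                # [3.25, 3.25, 3.25, 3.25]
```

The 3.25 s values are ideal continuous lengths; on sampled data each boundary falls on a whole sample, so actual chunk lengths may differ by one sample, as in the Examples section.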

Raises:

ValueError

  • tiling="equal" and a same-split run is shorter than min_duration (cannot tile without losing labeled data; switch to tiling="max" to drop short pieces instead).

  • Two consecutive differently-labeled runs are less than one sample apart and allow_sample_leakage is False (cannot separate without label leakage).
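
The second condition can be pictured as a check on the silence gap between two differently-labeled runs. The sketch below is illustrative only: check_transition, its arguments, and the 2 s sampling period are hypothetical and not part of the neuralset API.

```python
def check_transition(gap, sample_period, allow_sample_leakage=False):
    """Hypothetical check on the gap (in seconds) between two differently-labeled runs.

    Returns True when the runs can be separated cleanly, False when the
    boundary is accepted with up to one possibly mislabeled sample.
    """
    if gap >= sample_period:
        # At least one full sample of silence: the boundary can sit inside it.
        return True
    if allow_sample_leakage:
        # Sub-sample gap, explicitly tolerated by the caller.
        return False
    raise ValueError(
        "runs are less than one sample apart; cannot separate without label leakage"
    )


# Assumed values: a signal sampled every 2 s, with 0.5 s of silence between splits.
print(check_transition(gap=3.0, sample_period=2.0))
print(check_transition(gap=0.5, sample_period=2.0, allow_sample_leakage=True))
```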

Examples

Simple chunking (each x = one sample; sound sampled at 1 Hz):

input:
    max_duration: 4
    events:
        sound:   [x x x x x x x x x x x x x]     # 13 s
out (tiling="max"):   # tile with max duration + trail
    events:
        sound1:  [x x x x]
        sound2:          [x x x x]
        sound3:                  [x x x x]
        sound4:                          [x]     # short trailing chunk
out (tiling="equal"):  # tile with ~ same length
    events:
        sound1:  [x x x]
        sound2:        [x x x]
        sound3:              [x x x x]           # 3.25 s ideal, rounded to whole sample
        sound4:                      [x x x]

With train/test split labels:

input:
    max_duration: 4
    event_type_to_split_by: Word
    events:
        sound:   [x x x x x x x x x x x x x]     # 13 s
        word:     1 1 1 - - 2 2 2 2 2 2 2 2      # 1=test, 2=train, -=silence
out (tiling="equal"):                            # split-aligned, then sub-tiled
    events:
        sound1:  [x x x x]                       # test run
        sound2:          [x x x]                 # train run, 3 equal chunks
        sound3:                [x x x]
        sound4:                      [x x x]
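
The split-aligned example can be reproduced with a standalone sketch. It does not call neuralset: split_aligned_chunks is a hypothetical helper, and placing each boundary at the midpoint of the silence gap is an assumption chosen because it yields the section lengths shown in the diagram (4 s for the test run, 9 s for the train run).

```python
import math


def split_aligned_chunks(labels, max_duration, sample_period=1.0):
    """Hypothetical sketch: cut at split transitions, then tile each run equally.

    labels holds one split label per sample, with None for silence.
    Returns (start, duration, label) tuples in seconds.
    """
    # 1. Collect same-split runs as [label, first_sample, last_sample + 1].
    runs = []
    for i, label in enumerate(labels):
        if label is None:
            continue
        if runs and runs[-1][0] == label:
            runs[-1][2] = i + 1
        else:
            runs.append([label, i, i + 1])
    # 2. Assumption: each boundary sits at the midpoint of the silence gap.
    cuts = [0.0]
    for (_, _, end), (_, start, _) in zip(runs, runs[1:]):
        cuts.append((end + start) / 2 * sample_period)
    cuts.append(len(labels) * sample_period)
    # 3. Sub-tile each section into equal chunks no longer than max_duration.
    chunks = []
    for (label, _, _), start, stop in zip(runs, cuts, cuts[1:]):
        n_chunks = max(1, math.ceil((stop - start) / max_duration))
        size = (stop - start) / n_chunks
        chunks.extend((start + k * size, size, label) for k in range(n_chunks))
    return chunks


# 1 = test, 2 = train, None = silence, sampled at 1 Hz as in the diagram.
labels = [1, 1, 1, None, None] + [2] * 8
chunks = split_aligned_chunks(labels, max_duration=4)
for chunk in chunks:
    print(chunk)
# (0.0, 4.0, 1)
# (4.0, 3.0, 2)
# (7.0, 3.0, 2)
# (10.0, 3.0, 2)
```

Every chunk carries a single label, which is the label-homogeneity property that event_type_to_split_by is meant to provide.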
Fields:
field event_type_to_chunk: str [Required]
field event_type_to_split_by: str | None = None
field min_duration: float = 0.0
field max_duration: float = inf
field tiling: Literal['equal', 'max'] = 'max'
field allow_sample_leakage: bool = False
requirements: tp.ClassVar[tuple[str, ...]] = ()