neuralset.dataloader.SegmentDataset

class neuralset.dataloader.SegmentDataset(extractors: Mapping[str, BaseExtractor], segments: Sequence[Segment], *, remove_incomplete_segments: bool = False, pad_duration: float | Literal['auto'] | None = None, transforms: dict[str, Callable] | None = None)[source]

Dataset defined through Segment instances and BaseExtractor instances.

Parameters:
  • extractors (dict of BaseExtractor) – extractors to compute; each result is returned under its name in the Batch.data dictionary

  • segments (list of Segment) – the list of segment instances defining the dataset

  • pad_duration (float | Literal["auto"] | None) –

    pad the segments to a common duration, or not at all.

    None: no padding; raises an error if segment durations vary. "auto": pads every segment to the maximum segment duration. A float: pads every segment to that duration.

  • remove_incomplete_segments (bool) – remove segments that lack events for any of the extractors

  • transforms (dict, optional) – Map of extractor names to transforms (callables transforming the extractor tensor). If an extractor name is not present, no transform is applied. Keys must be a subset of the extractor names.

Usage

    extractors = {"whatever": ns.extractors.Pulse()}
    ds = ns.SegmentDataset(extractors, segments)
    # one data item
    item = ds[0]
    assert item.data["whatever"].shape[0] == 1  # batch dimension is always added
    # through a dataloader:
    dataloader = torch.utils.data.DataLoader(ds, collate_fn=ds.collate_fn, batch_size=2)
    batch = next(iter(dataloader))
    print(batch.data["whatever"])
    # batch.segments holds the corresponding segments
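The pad_duration options can be summarized with a small sketch. This is not library code: `resolve_pad_duration` is a hypothetical helper that only mirrors the documented semantics (None, "auto", or an explicit duration) on plain float durations.

```python
def resolve_pad_duration(durations, pad_duration):
    """Sketch of the documented pad_duration semantics (not library code)."""
    if pad_duration is None:
        # No padding: all segments must already share one duration.
        if len(set(durations)) > 1:
            raise ValueError("segment durations vary; set pad_duration")
        return durations[0]
    if pad_duration == "auto":
        # Pad everything to the longest segment.
        return max(durations)
    # An explicit target duration.
    return float(pad_duration)
```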

as_one_batch(num_workers: int = 0) → Batch[source]

Deprecated: use load_all() instead.

build_dataloader(**kwargs: Any) → DataLoader[source]

Returns a dataloader for this dataset.

collate_fn(batches: list[Batch]) → Batch[source]

Creates a single Batch from several by stacking all attributes along a new first (batch) dimension.
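The stacking behaviour can be illustrated with plain Python containers. A minimal sketch, assuming each item is a dict mapping an extractor name to per-item data; outer lists stand in for torch.stack along a new leading dimension:

```python
def collate_sketch(items):
    # Stack each field of the per-item dicts along a new first (batch)
    # dimension; the real collate_fn does the analogous stacking on tensors.
    keys = items[0].keys()
    return {k: [item[k] for item in items] for k in keys}
```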

load_all(num_workers: int = 0) → Batch[source]

Returns a single batch containing all the dataset data, un-shuffled.
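A minimal sketch of the un-shuffled semantics, assuming any indexable dataset: items are gathered in index order, after which the library version would collate them into one Batch.

```python
def load_all_sketch(dataset):
    # Gather every item in index order (no shuffling); the real load_all
    # additionally collates the items into a single Batch.
    return [dataset[i] for i in range(len(dataset))]
```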