spdl.dataloader.PyTorchDataLoader

class PyTorchDataLoader

A PyTorch-style data loader that works on map-style datasets. Use get_pytorch_dataloader() to instantiate this class. You can use this class as an almost drop-in replacement for PyTorch’s DataLoader class.

The architecture of this data loader differs from PyTorch’s DataLoader in the following ways:

  • Only the dataset and the collate function are copied to the worker processes. (The sampler and the generator are not copied.)

  • The dataset is copied to the worker processes via shared memory.

  • The sampler is executed in the main process, and the resulting indices are passed to the worker processes.

  • Worker processes share the same input/output queues. (PyTorch creates a set of i/o queues for each worker process.)
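
The differences listed above can be illustrated with a small, self-contained sketch. It is not SPDL's implementation: the names `iterate_batches` and `_worker` are hypothetical, threads stand in for worker processes so that the example runs anywhere, and yielding batches in submission order is a choice made for this sketch. It shows only the shape of the design: the caller produces the indices, workers hold just the dataset and the collate function, and all workers share one input queue and one output queue.

```python
import queue
import threading

_STOP = None  # sentinel telling a worker to exit


def _worker(dataset, collate_fn, index_queue, result_queue):
    # A worker only needs the dataset and the collate function.
    while True:
        item = index_queue.get()
        if item is _STOP:
            break
        batch_id, indices = item
        samples = [dataset[i] for i in indices]
        result_queue.put((batch_id, collate_fn(samples)))


def iterate_batches(dataset, batch_indices, collate_fn=list, num_workers=2):
    """Yield collated batches; `batch_indices` is produced by the caller's sampler."""
    index_queue = queue.Queue()   # one input queue shared by all workers
    result_queue = queue.Queue()  # one output queue shared by all workers
    workers = [
        threading.Thread(
            target=_worker,
            args=(dataset, collate_fn, index_queue, result_queue),
            daemon=True,
        )
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()

    # Sampling happens on the caller's side; only indices are sent to workers.
    num_batches = 0
    for batch_id, indices in enumerate(batch_indices):
        index_queue.put((batch_id, list(indices)))
        num_batches += 1
    for _ in workers:
        index_queue.put(_STOP)

    # Results can arrive out of order, so buffer them and yield in order.
    pending = {}
    next_id = 0
    while next_id < num_batches:
        batch_id, batch = result_queue.get()
        pending[batch_id] = batch
        while next_id in pending:
            yield pending.pop(next_id)
            next_id += 1
    for w in workers:
        w.join()


squares = [x * x for x in range(10)]
sampled = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(list(iterate_batches(squares, sampled)))
```

Because every worker pulls from the same input queue, an idle worker picks up the next batch of indices as soon as it is free, and no per-worker queue has to be created.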

Due to the way Dataset is defined, this class still has to copy the dataset to each worker process, so memory consumption is not reduced. However, fast initialization and reduced inter-process communication make this implementation faster than PyTorch's DataLoader.

Instance variables:

dataset: The source dataset.

__iter__() -> Iterator[V]

Iterates over the dataset and yields samples/batches.

__len__() -> int

Returns the number of samples/batches this data loader yields.
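
To make the relationship between the two methods concrete, here is a minimal sketch of the iteration protocol. It uses a hypothetical stand-in class, `TinyLoader`, and not SPDL's class. It assumes the convention used by PyTorch's DataLoader when batching is enabled and the last partial batch is kept: the length counts batches, not samples.

```python
import math


class TinyLoader:
    """Hypothetical stand-in: sequential sampling, no worker processes."""

    def __init__(self, dataset, batch_size, collate_fn=list):
        self.dataset = dataset        # map-style: supports len() and indexing
        self.batch_size = batch_size
        self.collate_fn = collate_fn

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return math.ceil(len(self.dataset) / self.batch_size)

    def __iter__(self):
        n = len(self.dataset)
        for start in range(0, n, self.batch_size):
            stop = min(start + self.batch_size, n)
            yield self.collate_fn([self.dataset[i] for i in range(start, stop)])


loader = TinyLoader(list(range(10)), batch_size=4)
print(len(loader))   # 3
print(list(loader))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With this convention, `len(loader)` matches the number of items produced by one full pass of `for batch in loader`.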