spdl.pipeline.cache_iterator

cache_iterator(src: Iterable[T], num_caches: int, *, return_caches_after: int | None = None, stop_after: int | None = None, delete_src: bool = True) → Iterator[T]

Caches values from the source iterator and, after the given number of iterations, returns the cached values instead.

The function is intended for estimating the maximum performance gain achievable by optimizing the data loader.

Wrap your data loader with this function, run it in the training pipeline, and compare the performance against a run with the unwrapped data loader. If the wrapped run is noticeably faster, the training pipeline is bottlenecked by data loading.
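A minimal sketch of this comparison, using only the standard library. The helpers slow_loader, replay_cached, and time_training are hypothetical names introduced for illustration. replay_cached is a simplified stand-in for cache_iterator, not its implementation, and the sleep durations are made-up numbers.

```python
import itertools
import time
from collections.abc import Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def slow_loader(num_batches: int, delay: float = 0.01) -> Iterator[int]:
    """Hypothetical data loader that spends `delay` seconds producing each batch."""
    for i in range(num_batches):
        time.sleep(delay)
        yield i


def replay_cached(src: Iterable[T], num_caches: int, total: int) -> Iterator[T]:
    """Stand-in for cache_iterator: yield `num_caches` real batches, then replay them.

    Assumption: the first `num_caches` batches are the ones kept.
    """
    cache: list[T] = []
    for item in itertools.islice(src, num_caches):
        cache.append(item)
        yield item
    # Replay the cached batches until `total` batches have been yielded overall.
    remaining = max(0, total - len(cache))
    yield from itertools.islice(itertools.cycle(cache), remaining)


def time_training(batches: Iterable[int], step_time: float = 0.002) -> float:
    """Run a fake training loop and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    for _ in batches:
        time.sleep(step_time)  # stands in for the forward/backward pass
    return time.perf_counter() - start


baseline = time_training(slow_loader(50))
cached = time_training(replay_cached(slow_loader(50), num_caches=5, total=50))
print(f"baseline: {baseline:.3f}s, cached: {cached:.3f}s")
```

In a real pipeline you would wrap the actual data loader with cache_iterator rather than the stand-in. The gap between the two timings then approximates the most that faster data loading could save.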

Parameters:
  • src – Source iterator. Expected to be a data loader object.

  • num_caches – The number of items (batches) to cache.

  • return_caches_after – The number of iterations for which the original iterator is used before the cached values are returned. Defaults to the value of num_caches.

  • stop_after – If provided, iteration stops after the given number of iterations has completed, counting iterations both before and after the cached values start being returned. If not provided, the iterator keeps yielding the cached values forever.

  • delete_src – If true, call del on the original data loader once this iterator starts returning the cached values, so that its resources are released.

Returns:

The wrapper iterator.