spdl.io.transfer_tensor

transfer_tensor(batch: T, /) → T

Transfers PyTorch CPU Tensors to CUDA in a dedicated stream.

This function wraps calls to torch.Tensor.pin_memory() and torch.Tensor.to(), and executes them in a dedicated CUDA stream.

When called in a background thread, the data transfer overlaps with the GPU computation happening in the foreground thread (such as training and inference).
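The overlap described above can be sketched without any CUDA at all: a background worker runs the transfer for the next batch while the foreground thread computes on the current one. The `transfer` and `compute` callables below are placeholders for illustration (in practice, `transfer` would be this function and `compute` would be a training or inference step).

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(batches, transfer, compute):
    """Overlap `transfer` of batch N+1 with `compute` on batch N.

    `transfer` and `compute` are hypothetical stand-ins: `transfer`
    plays the role of transfer_tensor running in a background thread,
    and `compute` the GPU work in the foreground thread.
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        it = iter(batches)
        try:
            # Kick off the transfer of the first batch in the background.
            pending = pool.submit(transfer, next(it))
        except StopIteration:
            return results
        for batch in it:
            current = pending.result()
            # The next transfer runs while compute(current) executes below.
            pending = pool.submit(transfer, batch)
            results.append(compute(current))
        results.append(compute(pending.result()))
    return results
```

With `transfer_tensor` as the `transfer` callable, this is essentially the prefetching pattern the Pipeline (see the link below) implements for you.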

See also

Multi-threading (custom) - The intended way to use this function in a Pipeline.


Concretely, it performs the following operations:

  1. If a dedicated CUDA stream local to the calling thread is not found in thread-local storage, creates and stashes one. (The target device is determined by the "LOCAL_RANK" environment variable.)

  2. Activates the CUDA stream.

  3. Traverses the given object recursively and transfers tensors to the GPU. Data is first copied to page-locked memory by calling the pin_memory method, then transferred to the GPU asynchronously (i.e. .to(non_blocking=True)).

  4. Synchronizes the stream to ensure that all data transfers are completed.

Parameters:

batch – A torch.Tensor or a composition of tensors in container types such as list, tuple, dict, and dataclass.

Returns:

An object of the same type as the input, with the PyTorch tensors transferred to the CUDA device.