spdl.io.transfer_tensor
- transfer_tensor(batch: T, /) → T [source]
Transfers PyTorch CPU Tensors to CUDA in a dedicated stream.
This function wraps calls to torch.Tensor.pin_memory() and torch.Tensor.to(), and executes them in a dedicated CUDA stream. When called in a background thread, the data transfer overlaps with the GPU computation happening in the foreground thread (such as training and inference).
See also
Multi-threading (custom) - An intended way to use this function in Pipeline.

Concretely, it performs the following operations.
1. If a dedicated CUDA stream local to the calling thread is not found in thread-local storage, creates and stashes one. (The target device is determined by the "LOCAL_RANK" environment variable.)
2. Activates the CUDA stream.
3. Traverses the given object recursively, and transfers tensors to the GPU. Data are first copied to page-locked memory by calling the pin_memory method, then transferred to the GPU in an asynchronous manner (i.e. .to(non_blocking=True)). See the sketch after this list.
4. Synchronizes the stream to ensure that all the data transfers are completed.
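The following is a minimal sketch of the steps above, not the actual implementation. The helper names (transfer_sketch, _copy) are hypothetical, and only list containers are handled; the real function also traverses tuple, dict and dataclass objects.

```python
import os
import threading

import torch

_LOCAL = threading.local()  # per-thread storage for the dedicated stream


def _copy(obj, device):
    # Recursively pin tensors and issue asynchronous copies.
    # (Illustrative: only handles Tensor and list.)
    if isinstance(obj, torch.Tensor):
        return obj.pin_memory().to(device, non_blocking=True)
    if isinstance(obj, list):
        return [_copy(o, device) for o in obj]
    return obj


def transfer_sketch(batch):
    # 1. Create (once per thread) a dedicated stream on the device
    #    indicated by the "LOCAL_RANK" environment variable.
    if not hasattr(_LOCAL, "stream"):
        index = int(os.environ.get("LOCAL_RANK", 0))
        _LOCAL.stream = torch.cuda.Stream(device=torch.device("cuda", index))
    stream = _LOCAL.stream

    # 2. Activate the stream and 3. issue the copies on it.
    with torch.cuda.stream(stream):
        out = _copy(batch, stream.device)

    # 4. Block until every copy issued on the stream has completed.
    stream.synchronize()
    return out
```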
- Parameters:
  batch – A torch.Tensor or a composition of tensors with container types such as list, tuple, dict and dataclass.
- Returns:
  An object of the same type as the input, but with the PyTorch tensors transferred to the CUDA device.
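The intended usage is inside a Pipeline (see the "See also" link above). As a minimal standalone sketch, assuming a CUDA-capable environment, the transfer can be submitted to a background thread so the host-to-device copy overlaps with GPU work in the main thread; the batch shapes here are only illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

import torch
from spdl.io import transfer_tensor

# A CPU batch composed of tensors in a dict container.
batch = {
    "image": torch.randn(32, 3, 224, 224),
    "label": torch.randint(0, 10, (32,)),
}

with ThreadPoolExecutor(max_workers=1) as executor:
    # Transfer runs in the background thread, on its own CUDA stream.
    future = executor.submit(transfer_tensor, batch)
    # ... GPU computation for the previous batch can run here ...
    cuda_batch = future.result()  # same structure, tensors now on CUDA

print(cuda_batch["image"].device)
```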