Custom parts reference¶
Sparse CUDA kernels¶
1. Building the kernels¶
xFormers transparently supports CUDA kernels to implement sparse attention computations, some of which are based on Sputnik. These kernels require xFormers to be installed from source, and the target machine to be able to compile CUDA source code.
git clone git@github.com:fairinternal/xformers.git
conda create --name xformer_env python=3.8
conda activate xformer_env
cd xformers
pip install -r requirements.txt
pip install -e .
Common issues are related to:
- the NVCC version and the current CUDA runtime not matching. You can often change the CUDA runtime with module unload cuda; module load cuda/xx.x, and you may need to change nvcc as well
- the version of GCC that you're using not matching the current NVCC capabilities
- the TORCH_CUDA_ARCH_LIST env variable not being set to the architectures that you want to support. A suggested setup (slow to build but comprehensive) is export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;8.0;8.6"
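As a rough checklist, the points above can be verified from the shell before building. This is a sketch, not part of the xFormers docs: the module commands assume an Environment Modules setup, and cuda/xx.x stands for whichever toolkit version your cluster provides.

```shell
# Check that the nvcc toolkit version matches the CUDA runtime PyTorch was built for
nvcc --version
python -c "import torch; print(torch.version.cuda)"

# Check the GCC version against what this nvcc release supports
gcc --version

# Build for a broad set of GPU architectures (slow to build but comprehensive)
export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;8.0;8.6"
```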
The sparse attention computation is automatically triggered when using the scaled dot-product attention with a sparse enough mask (currently, fewer than 30% of the values are true). There is nothing specific to do, and a couple of examples are provided in the tutorials.
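The density check described above can be sketched in plain Python. The 30% threshold is taken from the text; `mask_density` is an illustrative helper, not an xFormers function.

```python
def mask_density(mask):
    """Fraction of True entries in a boolean attention mask (list of lists)."""
    total = sum(len(row) for row in mask)
    true_count = sum(sum(row) for row in mask)
    return true_count / total

# A causal (lower-triangular) mask over 4 positions: 10 of the 16 entries are True.
causal = [[j <= i for j in range(4)] for i in range(4)]

# Per the text above, the sparse kernels kick in when fewer than 30% of the
# mask values are true; this small causal mask is too dense to qualify.
use_sparse_kernels = mask_density(causal) < 0.30
```

For longer sequences a causal mask does drop below the threshold: the density of a lower-triangular mask is roughly 1/2, per-band masks much less, which is why long-sequence patterns tend to trigger the sparse path.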
Triton parts¶
We use Triton to implement the following parts. They are only visible on a CUDA-enabled machine, and Triton needs to be installed (pip install triton); if either of these conditions is not met, a warning is issued.
2. Possible usage¶
The following parts are independent and can be used as-is in any model, provided the above requirements are fulfilled (Triton is installed and a CUDA GPU is present). They are used by default, when possible, in some of the xFormers building blocks.
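The availability gating described above can be sketched as follows. This is a minimal illustration, not the actual xFormers mechanism; `triton_parts_available` is a hypothetical helper name.

```python
import warnings


def triton_parts_available():
    """Return True when both Triton and a CUDA device are available."""
    try:
        import triton  # noqa: F401
        import torch
    except ImportError:
        # Triton (or PyTorch) is not installed: fall back to the default path.
        return False
    return torch.cuda.is_available()


# Mirror the behaviour described in the text: warn when the Triton-backed
# parts cannot be used, instead of failing hard.
if not triton_parts_available():
    warnings.warn("Triton or a CUDA GPU is unavailable; Triton parts are disabled.")
```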