
Attention mechanisms

class xformers.components.attention.ScaledDotProduct(dropout: float = 0.0, causal: bool = False, seq_len: Optional[int] = None, to_seq_len: Optional[int] = None, *args, **kwargs)[source]

Bases: Attention

Implements the Scaled Dot-Product attention proposed in "Attention Is All You Need", Vaswani et al.

mask: Optional[AttentionMask]
forward(q: Tensor, k: Tensor, v: Tensor, att_mask: Optional[Union[AttentionMask, Tensor]] = None, *args, **kwargs) → Tensor[source]

att_mask: A 2D or 3D mask which ignores attention at certain positions (a usage sketch follows this list).

  • If the mask is boolean, a value of True keeps the value, while a value of False masks the value.

    Key padding masks (dimension: batch x sequence length) and attention masks (dimension: sequence length x sequence length OR batch x sequence length x sequence length) can be combined and passed in here. The maybe_merge_masks helper provided in the utils can be used for that merging.

  • If the mask has a float type, an additive mask is expected (masked values are -inf).
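
A minimal usage sketch of the masking behavior described above (the tensor shapes and the dropout value are illustrative, not prescribed by the API):

    import torch
    from xformers.components.attention import ScaledDotProduct

    attention = ScaledDotProduct(dropout=0.1, causal=False)

    B, S, D = 2, 16, 64                       # batch, sequence length, head dimension
    q = torch.randn(B, S, D)
    k = torch.randn(B, S, D)
    v = torch.randn(B, S, D)

    # Boolean mask: True keeps a position, False masks it out.
    att_mask = torch.ones(S, S).tril().bool()

    out = attention(q, k, v, att_mask=att_mask)
    print(out.shape)                          # torch.Size([2, 16, 64])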

class xformers.components.attention.Attention(dropout: Optional[float] = None, *args, **kwargs)[source]

Bases: Module

The base Attention mechanism, which is typically a sub-part of the multi-head attention

classmethod from_config(config: AttentionConfig) → Self[source]
abstract forward(q: Tensor, k: Tensor, v: Tensor, *args, **kwargs) → Tensor[source]
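
As a rough sketch of how a custom mechanism plugs into this base class (the subclass below and its pooling logic are hypothetical; only Attention and the abstract forward signature come from this page):

    import torch
    from torch import Tensor
    from xformers.components.attention import Attention

    class MeanPoolAttention(Attention):
        """Toy attention that ignores q/k and mean-pools v over the sequence axis."""

        def __init__(self, dropout: float = 0.0, *args, **kwargs):
            super().__init__(dropout=dropout)

        def forward(self, q: Tensor, k: Tensor, v: Tensor, *args, **kwargs) -> Tensor:
            pooled = v.mean(dim=-2, keepdim=True)             # (..., 1, D)
            return pooled.expand(*q.shape[:-1], v.shape[-1])  # broadcast to the query length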
class xformers.components.attention.AttentionMask(additive_mask: Tensor, is_causal: bool = False)[source]

Bases: object

Holds an attention mask, along with a couple of helpers and attributes.

to_bool() → Tensor[source]
classmethod from_bool(x: Tensor) → Self[source]

Create an AttentionMask given a boolean pattern.

Warning: we assume here that True implies that the value should be computed.
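
A short sketch of the True-means-keep convention (only from_bool, to_bool and the documented properties are used; the example values are illustrative):

    import torch
    from xformers.components.attention import AttentionMask

    keep = torch.tensor([[True, True, False]])   # keep the first two key positions
    mask = AttentionMask.from_bool(keep)

    print(mask.dtype, mask.shape)                # underlying additive (floating point) mask
    print(mask.to_bool())                        # back to the boolean view, True = keep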

classmethod from_multiplicative(x: Tensor) → Self[source]

Create an AttentionMask given a multiplicative attention mask.

classmethod make_causal(seq_len: int, to_seq_len: Optional[int] = None, device: Optional[device] = None, dtype: Optional[dtype] = None) → Self[source]
make_crop(seq_len: int, to_seq_len: Optional[int] = None) → AttentionMask[source]

Return a cropped attention mask, whose underlying tensor is a view of this one
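
A sketch combining make_causal and make_crop (the exact mask layout is not spelled out on this page, so the prints are for inspection only):

    import torch
    from xformers.components.attention import AttentionMask

    causal = AttentionMask.make_causal(seq_len=16, dtype=torch.float32)
    print(causal.is_sparse, causal.shape)

    # Crop to the first 8 positions; the result is a view over the same tensor.
    cropped = causal.make_crop(seq_len=8)
    print(cropped.shape)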

property device
property is_sparse
property ndim
property dtype
property shape
to(device: Optional[device] = None, dtype: Optional[dtype] = None) → AttentionMask[source]
xformers.components.attention.build_attention(config: Union[Dict[str, Any], AttentionConfig])[source]

Builds an attention from a config.

This assumes a 'name' key in the config which is used to determine which attention class to instantiate. For instance, a config {"name": "my_attention", "foo": "bar"} will find a class that was registered as "my_attention" (see register_attention()) and call .from_config on it.
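A hedged sketch of building from a dict; the registered name "scaled_dot_product" is assumed to be the key under which the ScaledDotProduct class above is registered and may differ across versions:

    import torch
    from xformers.components.attention import build_attention

    attention = build_attention({
        "name": "scaled_dot_product",   # assumed registered name, see register_attention()
        "dropout": 0.1,
    })

    q = k = v = torch.randn(2, 16, 64)
    print(attention(q, k, v).shape)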

xformers.components.attention.register_attention(name: str, config: Any = xformers.components.attention.base.AttentionConfig)

Registers a subclass.

This decorator allows xFormers to instantiate a given subclass from a configuration file, even if the class itself is not part of the xFormers library.
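
A sketch of registering a custom attention so build_attention can later find it by name; MyAttention, MyAttentionConfig, the some_flag field and the registered name "my_attention" are all hypothetical:

    from dataclasses import dataclass

    import torch
    from torch import Tensor
    from xformers.components.attention import Attention, register_attention
    from xformers.components.attention.base import AttentionConfig

    @dataclass
    class MyAttentionConfig(AttentionConfig):
        some_flag: bool = False

    @register_attention("my_attention", MyAttentionConfig)
    class MyAttention(Attention):
        def __init__(self, dropout: float = 0.0, some_flag: bool = False, *args, **kwargs):
            super().__init__(dropout=dropout)
            self.some_flag = some_flag

        def forward(self, q: Tensor, k: Tensor, v: Tensor, *args, **kwargs) -> Tensor:
            # Placeholder logic: plain, unmasked scaled dot-product.
            att = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
            return att @ v

    # Once registered, the class can be instantiated from a config dict:
    # build_attention({"name": "my_attention", "dropout": 0.0, "some_flag": True})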