Attention mechanisms¶
- class xformers.components.attention.ScaledDotProduct(dropout: float = 0.0, causal: bool = False, seq_len: Optional[int] = None, to_seq_len: Optional[int] = None, *args, **kwargs)[source]¶
Bases: Attention
Implements the Scaled Dot-Product attention proposed in "Attention Is All You Need", Vaswani et al.
- mask: Optional[AttentionMask]¶
- forward(q: Tensor, k: Tensor, v: Tensor, att_mask: Optional[Union[AttentionMask, Tensor]] = None, *args, **kwargs) -> Tensor [source]¶
att_mask: A 2D or 3D mask which suppresses attention at certain positions.
- If the mask is boolean, a value of True keeps the value, while a value of False masks the value. Key padding masks (dimension: batch x sequence length) and attention masks (dimension: sequence length x sequence length OR batch x sequence length x sequence length) can be combined and passed in here. The maybe_merge_masks helper provided in the utils can be used for that merging.
- If the mask is of float type, an additive mask is expected (masked values are -inf).
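A minimal usage sketch, assuming batch-first tensors of shape (batch, sequence, head dimension) with illustrative sizes; the boolean mask follows the convention above (True keeps a position):

```python
import torch
from xformers.components.attention import ScaledDotProduct

B, S, D = 2, 16, 64  # illustrative batch size, sequence length, head dimension

attention = ScaledDotProduct(dropout=0.1, causal=False)

q = torch.randn(B, S, D)
k = torch.randn(B, S, D)
v = torch.randn(B, S, D)

# Boolean attention mask (sequence length x sequence length):
# True keeps a position, False masks it out
att_mask = torch.tril(torch.ones(S, S, dtype=torch.bool))

out = attention(q, k, v, att_mask=att_mask)  # shape: (B, S, D)
```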
- class xformers.components.attention.Attention(dropout: Optional[float] = None, *args, **kwargs)[source]¶
Bases: Module
The base Attention mechanism, which is typically a sub-part of the multi-head attention.
- class xformers.components.attention.AttentionMask(additive_mask: Tensor, is_causal: bool = False)[source]¶
Bases: object
Holds an attention mask, along with a couple of helpers and attributes.
- classmethod from_bool(x: Tensor) -> Self [source]¶
Create an AttentionMask given a boolean pattern.
Warning: we assume here that True implies that the value should be computed.
- classmethod from_multiplicative(x: Tensor) -> Self [source]¶
Create an AttentionMask given a multiplicative attention mask.
- classmethod make_causal(seq_len: int, to_seq_len: Optional[int] = None, device: Optional[device] = None, dtype: Optional[dtype] = None) -> Self [source]¶
- make_crop(seq_len: int, to_seq_len: Optional[int] = None) -> AttentionMask [source]¶
Return a cropped attention mask, whose underlying tensor is a view of this one.
- property device¶
- property is_sparse¶
- property ndim¶
- property dtype¶
- property shape¶
- to(device: Optional[device] = None, dtype: Optional[dtype] = None) -> AttentionMask [source]¶
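A short sketch exercising the AttentionMask helpers listed above; the dimensions and threshold are illustrative assumptions:

```python
import torch
from xformers.components.attention import AttentionMask

# From a boolean pattern: True means the position should be computed (kept)
bool_pattern = torch.rand(16, 16) > 0.1
mask = AttentionMask.from_bool(bool_pattern)

# Built-in causal (additive) mask, optionally moved to another dtype/device
causal = AttentionMask.make_causal(seq_len=16).to(dtype=torch.float16)

# Crop to a shorter sequence; the result is a view over the same tensor
cropped = causal.make_crop(seq_len=8)
print(cropped.shape, cropped.dtype, cropped.is_sparse)
```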
- xformers.components.attention.build_attention(config: Union[Dict[str, Any], AttentionConfig])[source]¶
Builds an attention from a config.
This assumes a 'name' key in the config which is used to determine what attention class to instantiate. For instance, a config {"name": "my_attention", "foo": "bar"} will find a class that was registered as "my_attention" (see register_attention()) and call .from_config on it.
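For example, assuming the library's scaled dot-product attention is registered under the name "scaled_dot_product"; the remaining keys are forwarded to that class's configuration:

```python
from xformers.components.attention import build_attention

my_config = {
    "name": "scaled_dot_product",  # registry key selecting the attention class
    "dropout": 0.1,
    "causal": True,
}

attention = build_attention(my_config)
```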
- xformers.components.attention.register_attention(name: str, config: Any = AttentionConfig)¶
Registers a subclass.
This decorator allows xFormers to instantiate a given subclass from a configuration file, even if the class itself is not part of the xFormers library.
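A minimal sketch of registering a custom attention so it can be built from a config; MyAttention, MyAttentionConfig, and some_knob are made-up names for illustration, and the forward pass is a toy stand-in rather than a real attention variant:

```python
from dataclasses import dataclass

import torch

from xformers.components.attention import (
    Attention,
    AttentionConfig,
    build_attention,
    register_attention,
)


@dataclass
class MyAttentionConfig(AttentionConfig):
    # Hypothetical extra hyperparameter, exposed through the config
    some_knob: float = 1.0


@register_attention("my_attention", MyAttentionConfig)
class MyAttention(Attention):
    def __init__(self, dropout: float = 0.0, some_knob: float = 1.0, *args, **kwargs):
        super().__init__(dropout=dropout)
        self.some_knob = some_knob

    def forward(self, q, k, v, att_mask=None, *args, **kwargs):
        # Toy example: unmasked scaled dot-product, rescaled by some_knob
        scores = (q @ k.transpose(-2, -1)) * self.some_knob / q.shape[-1] ** 0.5
        return scores.softmax(dim=-1) @ v


# Once registered, the class can be instantiated from a plain config dict
attn = build_attention({"name": "my_attention", "dropout": 0.0, "some_knob": 0.5})
```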