Model factory

class xformers.factory.model_factory.xFormerConfig(stack_configs: Union[List[Dict[str, Any]], Dict[str, Dict[str, Any]]], tie_embedding_weights: bool = False, weight_init: xFormerWeightInit = xFormerWeightInit.ViT)[source]

Bases: object

The configuration structure used to define a full Transformer. It can include a stack of encoder layers and a stack of decoder layers.

Optionally, the embedding weights can be shared between the encoder and decoder positional encodings, as proposed for instance in "Using the Output Embedding to Improve Language Models", Press et al.

A full config example looks as follows:

xformer_config = [
    {
        "reversible": False,  # Turn on to test the effect of using reversible layers
        "block_type": "encoder",
        "num_layers": LAYERS,
        "dim_model": EMB,
        "residual_norm_style": "pre",
        "position_encoding_config": {
            "name": "vocab",
            "seq_len": CONTEXT,
            "vocab_size": VOCAB_SIZE,
        },
        "multi_head_config": {
            "num_heads": NUM_HEADS,
            "residual_dropout": RES_DROP,
            "use_rotary_embeddings": True,
            "attention": {
                "name": ATTENTION_MECHANISM_STR,
                "dropout": ATTN_DROP,
                "causal": True,
                "seq_len": CONTEXT,
            },
        },
        "feedforward_config": {
            "name": "FusedMLP",  # Use MLP if Triton is not available
            "dropout": MLP_DROP,
            "activation": "gelu",
            "hidden_layer_multiplier": MLP_MULTIPLIER,
        },
    }
]
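
As a rough sketch of how such a config is consumed (assuming the xformer_config list above and its placeholder constants such as CONTEXT, EMB and VOCAB_SIZE are defined; the batch size of 2 used here is arbitrary), the list of per-stack dicts is wrapped in an xFormerConfig and the model is built with xFormer.from_config:

import torch

from xformers.factory.model_factory import xFormer, xFormerConfig

# Wrap the list of per-stack dicts shown above; tie_embedding_weights and
# weight_init could also be passed here if needed.
config = xFormerConfig(xformer_config)
model = xFormer.from_config(config)

# Encoder-only stack with a "vocab" position encoding: token IDs in,
# contextualized embeddings out (assumed shape: batch x CONTEXT x EMB).
src = torch.randint(0, VOCAB_SIZE, (2, CONTEXT))
out = model(src)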
stack_configs: Union[List[xFormerBlockConfig], Dict[str, xFormerBlockConfig]]
tie_embedding_weights: bool = False
weight_init: xFormerWeightInit = 'vit'
class xformers.factory.model_factory.xFormer(stack_configs: Union[xFormerBlockConfig, List[xFormerBlockConfig], Dict[str, xFormerBlockConfig]], tie_embedding_weights: bool = False, weight_init: xFormerWeightInit = xFormerWeightInit.ViT)[source]

Bases: Module

__init__(stack_configs: Union[xFormerBlockConfig, List[xFormerBlockConfig], Dict[str, xFormerBlockConfig]], tie_embedding_weights: bool = False, weight_init: xFormerWeightInit = xFormerWeightInit.ViT)[source]

Given a serialized configuration, generate the corresponding model. This is only a helper and can easily be bypassed.

training: bool
classmethod from_config(config: xFormerConfig)[source]
init_weights(weight_init: xFormerWeightInit, use_deep_norm: bool)[source]
forward(src: Tensor, tgt: Optional[Tensor] = None, encoder_input_mask: Optional[Tensor] = None, decoder_input_mask: Optional[Tensor] = None) → Optional[Tensor][source]
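
Continuing the sketch above, forward also accepts optional per-token masks. The boolean (batch, seq_len) mask semantics shown below are an assumption, not something stated by the signature:

# Continuing the sketch above: `model`, CONTEXT and VOCAB_SIZE as before
src = torch.randint(0, VOCAB_SIZE, (2, CONTEXT))

# Assumed semantics: per-token mask of shape (batch, seq_len), where
# False/0 entries drop the corresponding positions (e.g. padding).
input_mask = torch.ones(2, CONTEXT, dtype=torch.bool)
input_mask[:, -4:] = False  # hypothetical trailing padding

out = model(src, encoder_input_mask=input_mask)

# With an encoder + decoder configuration, a target sequence would be passed
# as tgt=..., optionally with decoder_input_mask=...; the return type is
# Optional[Tensor], as in the signature above.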