Model factory¶
- class xformers.factory.model_factory.xFormerConfig(stack_configs: Union[List[Dict[str, Any]], Dict[str, Dict[str, Any]]], tie_embedding_weights: bool = False, weight_init: xFormerWeightInit = xFormerWeightInit.ViT)[source]¶
Bases: object
The configuration structure to define a full Transformer. This can include a stack of encoder layers, and a stack of decoder layers.
It is optionally possible to share the embedding weights between the encoder and decoder positional encodings, as proposed for instance in Using the Output Embedding to Improve Language Models, Press et al.
A full config example is as follows:
xformer_config = [
    {
        "reversible": False,  # Turn on to test the effect of using reversible layers
        "block_type": "encoder",
        "num_layers": LAYERS,
        "dim_model": EMB,
        "residual_norm_style": "pre",
        "position_encoding_config": {
            "name": "vocab",
            "seq_len": CONTEXT,
            "vocab_size": VOCAB_SIZE,
        },
        "multi_head_config": {
            "num_heads": NUM_HEADS,
            "residual_dropout": RES_DROP,
            "use_rotary_embeddings": True,
            "attention": {
                "name": ATTENTION_MECHANISM_STR,
                "dropout": ATTN_DROP,
                "causal": True,
                "seq_len": CONTEXT,
            },
        },
        "feedforward_config": {
            "name": "FusedMLP",  # Use MLP if Triton is not available
            "dropout": MLP_DROP,
            "activation": "gelu",
            "hidden_layer_multiplier": MLP_MULTIPLIER,
        },
    }
]
- weight_init: xFormerWeightInit = 'vit'¶
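As a minimal usage sketch, the list above can be wrapped into an xFormerConfig. The placeholder constants (LAYERS, EMB, CONTEXT, VOCAB_SIZE, ...) are assumed values chosen here purely for illustration, and "scaled_dot_product" is assumed in place of ATTENTION_MECHANISM_STR; any registered attention name would do.

from xformers.factory.model_factory import xFormerConfig

# Assumed placeholder values for the constants referenced in the example above;
# they need to be defined before the xformer_config list is built.
LAYERS, EMB, CONTEXT, VOCAB_SIZE = 4, 256, 128, 1024
NUM_HEADS, RES_DROP, ATTN_DROP = 4, 0.1, 0.1
MLP_DROP, MLP_MULTIPLIER = 0.1, 4
ATTENTION_MECHANISM_STR = "scaled_dot_product"  # assumed attention mechanism

# stack_configs accepts a list of per-stack dicts (or a dict of dicts)
config = xFormerConfig(stack_configs=xformer_config, tie_embedding_weights=False)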
- class xformers.factory.model_factory.xFormer(stack_configs: Union[xFormerBlockConfig, List[xFormerBlockConfig], Dict[str, xFormerBlockConfig]], tie_embedding_weights: bool = False, weight_init: xFormerWeightInit = xFormerWeightInit.ViT)[source]¶
Bases: Module
- __init__(stack_configs: Union[xFormerBlockConfig, List[xFormerBlockConfig], Dict[str, xFormerBlockConfig]], tie_embedding_weights: bool = False, weight_init: xFormerWeightInit = xFormerWeightInit.ViT)[source]¶
Given a serialized configuration, generate the corresponding model. This is only a helper and can easily be bypassed.
- classmethod from_config(config: xFormerConfig)[source]¶
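Continuing the sketch above, a model can then be instantiated through from_config. The forward pass below is an assumption about typical usage: with a "vocab" position encoding the model takes integer token indices of shape (batch, seq_len) and returns hidden states of dimension dim_model.

import torch

from xformers.factory.model_factory import xFormer

model = xFormer.from_config(config)  # config: the xFormerConfig built above

# Dummy forward pass on token ids; the "vocab" position encoding embeds them internally
tokens = torch.randint(0, VOCAB_SIZE, (2, CONTEXT))
hidden = model(tokens)  # expected shape: (2, CONTEXT, EMB)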