neuraltrain.models.transformer.TransformerEncoder
- pydantic model neuraltrain.models.transformer.TransformerEncoder[source]
Transformer encoder/decoder built on top of x_transformers.
- Parameters:
heads (int) – Number of attention heads.
depth (int) – Number of Transformer layers.
cross_attend (bool) – Enable cross-attention (decoder mode).
causal (bool) – If True, build a causal Decoder instead of an Encoder.
attn_flash (bool) – Use Flash Attention. Not compatible with ALiBi.
attn_dropout (float) – Dropout probability inside the attention layers.
ff_mult (int) – Feed-forward expansion factor (ff_dim = dim * ff_mult).
ff_dropout (float) – Dropout probability in the feed-forward layers.
use_scalenorm (bool) – Use ScaleNorm instead of LayerNorm.
use_rmsnorm (bool) – Use RMSNorm instead of LayerNorm.
rel_pos_bias (bool) – Use relative positional bias.
alibi_pos_bias (bool) – Use ALiBi positional bias.
rotary_pos_emb (bool) – Use rotary positional embeddings.
rotary_xpos (bool) – Use xPos extension for rotary embeddings.
residual_attn (bool) – Add residual connections around the attention output.
scale_residual (bool) – Scale residual connections.
layer_dropout (float) – Probability of dropping an entire Transformer layer during training.
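To illustrate how a few of these options interact, the sketch below models the documented constraints as a plain dataclass: the Flash Attention/ALiBi incompatibility, the `ff_dim = dim * ff_mult` relationship, and the `causal` flag selecting a Decoder versus an Encoder stack. This is a minimal standalone sketch, not the library's implementation; a `dim` field is assumed here (it is referenced by `ff_mult` above but not listed among these parameters), and the default values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class EncoderConfig:
    """Sketch of the option interplay described above (not the real model).

    `dim` and the defaults are assumptions for illustration only.
    """
    dim: int = 512
    heads: int = 8
    depth: int = 6
    causal: bool = False
    attn_flash: bool = False
    alibi_pos_bias: bool = False
    ff_mult: int = 4

    def __post_init__(self) -> None:
        # Per the docs above: Flash Attention is not compatible with ALiBi.
        if self.attn_flash and self.alibi_pos_bias:
            raise ValueError("attn_flash is not compatible with alibi_pos_bias")

    @property
    def ff_dim(self) -> int:
        # Feed-forward hidden size: ff_dim = dim * ff_mult.
        return self.dim * self.ff_mult

    @property
    def layer_kind(self) -> str:
        # causal=True builds a causal Decoder instead of an Encoder.
        return "Decoder" if self.causal else "Encoder"
```

For example, `EncoderConfig(dim=256, ff_mult=4)` yields a feed-forward width of 1024, and setting `causal=True` switches the stack from an Encoder to a Decoder.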
- Fields: