neuraltrain.models.transformer.TransformerEncoder
- class neuraltrain.models.transformer.TransformerEncoder(*, heads: int = 8, depth: int = 12, cross_attend: bool = False, causal: bool = False, attn_flash: bool = False, attn_dropout: float = 0.1, ff_mult: int = 4, ff_dropout: float = 0.0, use_scalenorm: bool = True, use_rmsnorm: bool = False, rel_pos_bias: bool = False, alibi_pos_bias: bool = False, rotary_pos_emb: bool = True, rotary_xpos: bool = False, residual_attn: bool = False, scale_residual: bool = True, layer_dropout: float = 0.0)[source]
Transformer encoder/decoder built on top of x_transformers. A usage sketch follows the parameter list.
- Parameters:
heads (int) – Number of attention heads.
depth (int) – Number of Transformer layers.
cross_attend (bool) – Enable cross-attention (decoder mode).
causal (bool) – If True, build a causal Decoder instead of an Encoder.
attn_flash (bool) – Use Flash Attention. Not compatible with ALiBi.
attn_dropout (float) – Dropout probability inside the attention layers.
ff_mult (int) – Feed-forward expansion factor (ff_dim = dim * ff_mult).
ff_dropout (float) – Dropout probability in the feed-forward layers.
use_scalenorm (bool) – Use ScaleNorm instead of LayerNorm.
use_rmsnorm (bool) – Use RMSNorm instead of LayerNorm.
rel_pos_bias (bool) – Use relative positional bias.
alibi_pos_bias (bool) – Use ALiBi positional bias.
rotary_pos_emb (bool) – Use rotary positional embeddings.
rotary_xpos (bool) – Use xPos extension for rotary embeddings.
residual_attn (bool) – Use residual attention: the previous layer's attention scores are added to the current layer's (RealFormer-style).
scale_residual (bool) – Scale residual connections.
layer_dropout (float) – Probability of dropping an entire Transformer layer during training.
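A minimal usage sketch is shown below. Only the constructor calls follow the documented signature above; the specific option combinations, the depth values, and the commented-out forward pass (with its assumed (batch, seq_len, model_dim) input shape) are illustrative assumptions rather than part of this reference.

    from neuraltrain.models.transformer import TransformerEncoder

    # Bidirectional encoder with the documented defaults made explicit.
    encoder = TransformerEncoder(
        heads=8,
        depth=12,
        rotary_pos_emb=True,
        attn_dropout=0.1,
    )

    # Causal decoder with cross-attention. ALiBi replaces rotary embeddings here,
    # so Flash Attention is left disabled (the two are not compatible).
    decoder = TransformerEncoder(
        heads=8,
        depth=6,
        causal=True,
        cross_attend=True,
        alibi_pos_bias=True,
        rotary_pos_emb=False,
        attn_flash=False,
    )

    # Assumed forward pass on token embeddings of shape (batch, seq_len, model_dim);
    # the forward signature is not documented here, so this call is a guess:
    # out = encoder(embeddings)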