Normalization Layers
- class fairseq2.nn.LayerNorm(*args, **kwargs)

  Applies Layer Normalization to incoming data. This is the base class of the concrete normalization modules below.
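  Because both concrete modules below subclass LayerNorm, code can accept either through the base type. A minimal sketch, assuming fairseq2 is installed; make_norm is a hypothetical helper, not part of the library:

      from fairseq2.nn import LayerNorm, RMSNorm, StandardLayerNorm

      def make_norm(model_dim: int, use_rms: bool) -> LayerNorm:
          # Hypothetical helper: both return types satisfy the LayerNorm base.
          if use_rms:
              return RMSNorm(model_dim, bias=False)

          return StandardLayerNorm(model_dim, bias=True)

      norm = make_norm(512, use_rms=True)

      print(type(norm).__name__)  # RMSNorm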
- final class fairseq2.nn.StandardLayerNorm(normalized_shape: int | Sequence[int] | Size, bias: bool, *, eps: float = 1e-05, elementwise_affine: bool = True, cast_fp32: bool = False, init_fn: Callable[[StandardLayerNorm], None] | None = None, device: device | None = None, dtype: dtype | None = None)
  Bases: LayerNorm

  Applies Layer Normalization to incoming data as described in Ba et al. [1]. A usage sketch follows the parameter list.

  - Parameters:
    - normalized_shape – The shape over which to normalize incoming data. For example, if the shape is (3, 5), the incoming data is normalized over the last 2 dimensions (i.e. input.mean((-2, -1))).
    - bias – If True, learns an additive bias. Ignored if elementwise_affine is False.
    - eps – The value to add to the denominator for numerical stability.
    - elementwise_affine – If True, learns an affine transformation.
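  A minimal usage sketch, assuming fairseq2 and PyTorch are installed; the model dimension and batch shapes are illustrative:

      import torch

      from fairseq2.nn import StandardLayerNorm

      # Normalize over the last dimension (here a model dim of 512), learning
      # both a scale (elementwise_affine=True by default) and an additive bias.
      norm = StandardLayerNorm(512, bias=True)

      x = torch.randn(8, 16, 512)  # (batch, seq, model)

      y = norm(x)  # same shape, normalized over the last dimension

      print(y.shape)  # torch.Size([8, 16, 512])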
- final class fairseq2.nn.RMSNorm(normalized_shape: int | Sequence[int] | Size, bias: bool, *, eps: float = 1e-05, elementwise_affine: bool = True, init_fn: Callable[[RMSNorm], None] | None = None, device: device | None = None, dtype: dtype | None = None)
  Bases: LayerNorm

  Applies Root Mean Square Layer Normalization to incoming data as described in Zhang and Sennrich [6]. A reference computation follows the parameter list.

  - Parameters:
    - normalized_shape – The shape over which to normalize incoming data. For example, if the shape is (3, 5), the root mean square is computed over the last 2 dimensions.
    - bias – If True, learns an additive bias. Ignored if elementwise_affine is False.
    - eps – The value to add to the denominator for numerical stability.
    - elementwise_affine – If True, learns an affine transformation.
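  Unlike standard Layer Normalization, RMS normalization scales each vector by the reciprocal of its root mean square rather than subtracting a mean and dividing by a standard deviation. A rough reference computation, assuming the default affine initialization (a weight of ones) and this eps placement; the library's exact implementation may differ in such details:

      import torch

      from fairseq2.nn import RMSNorm

      norm = RMSNorm(512, bias=False)

      x = torch.randn(4, 512)

      y = norm(x)

      # Hand-rolled RMS normalization as in Zhang and Sennrich [6].
      y_ref = x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + 1e-05)

      # Expected to match closely under the stated assumptions.
      print(torch.allclose(y, y_ref, atol=1e-4))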