Normalization Layers

class fairseq2.nn.LayerNorm(*args, **kwargs)

Bases: Module, ABC

Applies Layer Normalization to incoming data.

abstract forward(x: Tensor) → Tensor
Parameters:

x – The input to normalize. Shape: \((*,H)\), where \(H\) is normalized_shape.

Returns:

The normalized output. Shape: Same as x.

final class fairseq2.nn.StandardLayerNorm(normalized_shape: int | Sequence[int] | Size, bias: bool, *, eps: float = 1e-05, elementwise_affine: bool = True, cast_fp32: bool = False, init_fn: Callable[[StandardLayerNorm], None] | None = None, device: device | None = None, dtype: dtype | None = None)

Bases: LayerNorm

Applies Layer Normalization to incoming data as described in Ba et al. [1].

Parameters:
  • normalized_shape – The shape over which to normalize incoming data. For example, if the shape is (3, 5), the incoming data is normalized over its last two dimensions (i.e. input.mean((-2, -1))).

  • bias – If True, learns an additive bias. Ignored if elementwise_affine is False.

  • eps – The value to add to the denominator for numerical stability.

  • elementwise_affine – If True, learns an affine transformation.

reset_parameters() → None
forward(x: Tensor) → Tensor
Parameters:

x – The input to normalize. Shape: \((*,H)\), where \(H\) is normalized_shape.

Returns:

The normalized output. Shape: Same as x.

extra_repr() → str

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
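To illustrate the computation described above, here is a minimal NumPy sketch of what standard layer normalization does, independent of the fairseq2 implementation: the input is normalized over the trailing dimensions given by normalized_shape, then an elementwise affine transformation (weight and optional bias) is applied. The function name and shapes are illustrative, not part of the fairseq2 API.

```python
import numpy as np

def layer_norm(x, weight, bias=None, eps=1e-5):
    # Normalize over the trailing dimensions covered by `weight`
    # (the analogue of `normalized_shape`).
    axes = tuple(range(x.ndim - weight.ndim, x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    y = (x - mean) / np.sqrt(var + eps)
    # Elementwise affine transformation; `bias` is optional, mirroring
    # the `bias` flag in the constructor.
    y = y * weight
    if bias is not None:
        y = y + bias
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 5))          # a batch of (3, 5) inputs
y = layer_norm(x, np.ones((3, 5)), np.zeros((3, 5)))
```

With unit weight and zero bias, each (3, 5) slice of the output has approximately zero mean and unit standard deviation.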

final class fairseq2.nn.RMSNorm(normalized_shape: int | Sequence[int] | Size, bias: bool, *, eps: float = 1e-05, elementwise_affine: bool = True, init_fn: Callable[[RMSNorm], None] | None = None, device: device | None = None, dtype: dtype | None = None)

Bases: LayerNorm

Applies Root Mean Square Layer Normalization to incoming data as described in Zhang and Sennrich [6].

Parameters:
  • normalized_shape – The shape over which to normalize incoming data. For example, if the shape is (3, 5), the incoming data is normalized over its last two dimensions (i.e. input.mean((-2, -1))).

  • bias – If True, learns an additive bias. Ignored if elementwise_affine is False.

  • eps – The value to add to the denominator for numerical stability.

  • elementwise_affine – If True, learns an affine transformation.

reset_parameters() → None
forward(x: Tensor) → Tensor
Parameters:

x – The input to normalize. Shape: \((*,H)\), where \(H\) is normalized_shape.

Returns:

The normalized output. Shape: Same as x.

extra_repr() → str

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
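For comparison with standard layer normalization, here is a NumPy sketch of the RMS normalization described in Zhang and Sennrich: the input is scaled by its root mean square over the normalized dimensions, with no mean subtraction. This is an illustration of the math only, not the fairseq2 implementation; the function name and the exact placement of eps are assumptions.

```python
import numpy as np

def rms_norm(x, weight, bias=None, eps=1e-5):
    # Normalize over the trailing dimensions covered by `weight`.
    axes = tuple(range(x.ndim - weight.ndim, x.ndim))
    # Root-mean-square scaling: unlike standard layer norm,
    # the mean is NOT subtracted.
    ms = np.mean(x ** 2, axis=axes, keepdims=True)
    y = x / np.sqrt(ms + eps) * weight
    if bias is not None:
        y = y + bias
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 5))
y = rms_norm(x, np.ones(5))   # normalize each length-5 vector
```

With unit weight, every length-5 vector of the output has a root mean square of approximately one, which is the invariant RMSNorm enforces; dropping the mean subtraction makes it cheaper than standard layer normalization.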