Position Encoders

A set of PyTorch modules to encode sequences with positional information.

ABCs

class fairseq2.nn.PositionEncoder(encoding_dim, max_seq_len)[source]

Bases: Module, ABC

Encodes sequences with positional information.

Parameters:
  • encoding_dim (int) – The dimensionality of positional encodings. Input sequences are expected to have the same dimensionality.

  • max_seq_len (int | None) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError. Typically it is set to the context length of the underlying model. If None, sequences can have arbitrary length.

forward(seqs, padding_mask, *, state_bag=None)[source]

Returns a copy of seqs with positional information encoded.

Parameters:
  • seqs (Tensor) – The input sequences to encode. Shape: \((*,S,E)\), where \(*\) is any number of batch dimensions including none, \(S\) is the sequence length, and \(E\) is the dimensionality of the positional encodings.

  • padding_mask (PaddingMask | None) – The padding mask of seqs. Shape: \((*,S)\), where \(*\) is any number of batch dimensions including none and \(S\) is the sequence length.

  • state_bag (IncrementalStateBag | None) – If not None, the encoder will operate in incremental decoding mode. This means that the first step in seqs will be considered to be at position state_bag.step_nr instead of 0.

Raises:

ValueError – when the sequence length of seqs exceeds max_seq_len.

Returns:

The input sequences with positional information encoded. Shape: Same as seqs.

Return type:

Tensor
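
A minimal call-pattern sketch (not part of the official examples) using the SinusoidalPositionEncoder subclass documented below; the input carries one batch dimension and no padding, so the padding mask is simply passed as None:

>>> import torch
>>>
>>> from fairseq2.nn.position_encoder import SinusoidalPositionEncoder
>>>
>>> m = SinusoidalPositionEncoder(encoding_dim=4, max_seq_len=16)
>>>
>>> seqs = torch.ones((2, 3, 4))  # (N, S, E) with N=2, S=3, E=4
>>>
>>> m(seqs, padding_mask=None).shape
torch.Size([2, 3, 4])

Passing a sequence longer than max_seq_len (for example S=32 here) raises the ValueError noted above.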

abstract _do_forward(seqs, padding_mask, state_bag)[source]

When overridden in a subclass, returns a copy of seqs with positional information encoded. See forward() for parameter descriptions.

Return type:

Tensor
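
For illustration only, here is a hypothetical subclass that implements _do_forward() by adding the absolute position index to every feature; the handling of state_bag.step_nr follows the incremental decoding semantics described under forward():

import torch

from fairseq2.nn import PositionEncoder


class IndexPositionEncoder(PositionEncoder):
    """Toy encoder that adds the absolute position index to every feature."""

    def _do_forward(self, seqs, padding_mask, state_bag):
        # In incremental decoding mode, the first step sits at position
        # `state_bag.step_nr` instead of 0 (see forward() above).
        start = 0 if state_bag is None else state_bag.step_nr

        seq_len = seqs.size(-2)

        positions = torch.arange(start, start + seq_len, device=seqs.device)

        # (S,) -> (S, 1) so the positions broadcast over the encoding dimension.
        return seqs + positions.unsqueeze(-1).to(seqs.dtype)

Such an encoder is constructed and called like the built-in ones, e.g. IndexPositionEncoder(encoding_dim=4, max_seq_len=16)(torch.ones((3, 4)), padding_mask=None).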

Classes

final class fairseq2.nn.SinusoidalPositionEncoder(encoding_dim, max_seq_len, *, _legacy_pad_idx=None, device=None)[source]

Bases: PositionEncoder

Encodes sequences with fixed sinusoidal positional information.

Parameters:
  • encoding_dim (int) – The dimensionality of positional encodings. Input sequences are expected to have the same dimensionality.

  • max_seq_len (int | None) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.

Raises:

ValueError – when encoding_dim is not even.

The positional encodings are initialized as in tensor2tensor, which differs slightly from the description in section 3.5 of Vaswani et al. [VSP+17]. That is, instead of:

\[PE_{(pos, 2i)} = \text{sin}(pos/10000^{2i/d_{model}})\]
\[PE_{(pos, 2i+1)} = \text{cos}(pos/10000^{2i/d_{model}})\]

we use:

\[PE_{(pos, i)} = \text{sin}(pos/10000^{i/d_{model}})\;\text{for}\;i < \frac{d_{model}}{2}\]
\[PE_{(pos, i)} = \text{cos}(pos/10000^{i/d_{model}})\;\text{for}\;i \geq \frac{d_{model}}{2}\]

See the tensor2tensor implementation for more information.
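
To make the difference concrete, the following standalone sketch (plain PyTorch, not the fairseq2 source) builds both layouts of the table for a toy d_model; the frequency spacing shown follows the Vaswani et al. formula, while tensor2tensor's exact spacing differs marginally, and the point illustrated is the layout along the encoding dimension:

import math

import torch

d_model, max_seq_len = 8, 4

positions = torch.arange(max_seq_len, dtype=torch.float32).unsqueeze(1)  # (S, 1)
i = torch.arange(d_model // 2, dtype=torch.float32)                      # (E/2,)

freqs = torch.exp(i * (-2.0 / d_model) * math.log(10000.0))  # 1/10000^(2i/d_model)

angles = positions * freqs                                    # (S, E/2)

# Vaswani et al.: sine and cosine interleaved along the encoding dimension.
pe_interleaved = torch.empty(max_seq_len, d_model)
pe_interleaved[:, 0::2] = torch.sin(angles)
pe_interleaved[:, 1::2] = torch.cos(angles)

# tensor2tensor / fairseq2 style: all sines in the first half, all cosines in
# the second half. Same values, different layout along the last dimension.
pe_split = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)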

Usage:

>>> import torch
>>>
>>> from fairseq2.nn.position_encoder import SinusoidalPositionEncoder
>>>
>>> m = SinusoidalPositionEncoder(encoding_dim=4, max_seq_len=16)
>>>
>>> seqs = torch.ones((3, 4))
>>>
>>> m(seqs, padding_mask=None)
tensor([[ 1.0000e+00,  1.0000e+00,  2.0000e+00,  2.0000e+00],  # pos 0
        [ 1.8415e+00,  1.0001e+00,  1.5403e+00,  2.0000e+00],  # pos 1
        [ 1.9093e+00,  1.0002e+00,  5.8385e-01,  2.0000e+00]]) # pos 2

reset_parameters()[source]

Reset the parameters and buffers of the module.

reset_non_persistent_buffers()[source]

Reset the non-persistent buffers of the module.

final class fairseq2.nn.LearnedPositionEncoder(encoding_dim, max_seq_len, *, device=None, dtype=None)[source]

Bases: PositionEncoder

Encodes sequences with learned positional embeddings.

Parameters:
  • encoding_dim (int) – The dimensionality of positional encodings. Input sequences are expected to have the same dimensionality.

  • max_seq_len (int | None) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.

Usage:

>>> import torch
>>>
>>> from fairseq2.nn.position_encoder import LearnedPositionEncoder
>>>
>>> m = LearnedPositionEncoder(encoding_dim=4, max_seq_len=16)
>>>
>>> seqs = torch.ones((3, 4))
>>>
>>> m(seqs, padding_mask=None)
tensor([[ 1.1135,  0.5548,  0.4293,  2.0112],                               # pos 0
        [ 0.2364,  0.6009,  3.3865, -2.4810],                               # pos 1
        [-0.4746,  0.4544,  0.2761,  0.8828]], grad_fn=<SqueezeBackward1>)  # pos 2

reset_parameters()[source]

Reset the parameters and buffers of the module.
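
Conceptually, a learned position encoder is an embedding table indexed by absolute position whose rows are added to the input; a rough equivalent in plain PyTorch (not the actual fairseq2 implementation) looks like this:

import torch
import torch.nn as nn

encoding_dim, max_seq_len = 4, 16

# One trainable embedding per absolute position.
pos_embed = nn.Embedding(max_seq_len, encoding_dim)

seqs = torch.ones((2, 3, encoding_dim))        # (N, S, E)

positions = torch.arange(seqs.size(1))         # 0, 1, 2

encoded = seqs + pos_embed(positions)          # rows broadcast over the batch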

final class fairseq2.nn.RotaryEncoder(encoding_dim, max_seq_len, *, theta=10000.0, freqs_init_fn=None, device=None)[source]

Bases: PositionEncoder

Encodes sequences with relative positional information as described in Su et al. [SLP+21].

Parameters:
  • encoding_dim (int) – The dimensionality of positional encodings. Input sequences are expected to have the same dimensionality.

  • max_seq_len (int | None) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.

  • theta (float) – The coefficient of the long-term decay as described in section 3.3 of the reference paper.

  • freqs_init_fn (Callable[[RotaryEncoder], Tensor] | None) – A callable to initialize the frequency table. The encoder will be passed to the callable as an argument, and the callable is expected to return a Tensor holding the frequency table. If None, the frequencies will be initialized as described in the reference paper.

Raises:

ValueError – when encoding_dim is not even.
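
The rotation applied by rotary encoding can be sketched in a few lines of plain PyTorch. This illustrative version pairs adjacent dimensions and derives one frequency per pair from theta; the actual fairseq2 implementation may pair and lay out dimensions differently:

import torch


def rotary_encode(seqs: torch.Tensor, theta: float = 10000.0) -> torch.Tensor:
    """Rotate feature pairs of ``seqs`` (shape (N, S, E)) by position-dependent angles."""
    seq_len, dim = seqs.size(-2), seqs.size(-1)

    # One frequency per feature pair: 1/theta^(2i/E), as in Su et al. [SLP+21].
    freqs = 1.0 / theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)

    # Rotation angle for every (position, pair) combination.
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), freqs)  # (S, E/2)

    cos, sin = torch.cos(angles), torch.sin(angles)

    x1, x2 = seqs[..., 0::2], seqs[..., 1::2]

    out = torch.empty_like(seqs)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos

    return out


encoded = rotary_encode(torch.randn(2, 3, 8))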

reset_parameters()[source]

Reset the parameters and buffers of the module.

reset_non_persistent_buffers()[source]

Reset the non-persistent buffers of the module.