Position Encoders¶

A set of PyTorch modules to encode sequences with positional information.

ABCs

PositionEncoder

Classes

SinusoidalPositionEncoder
LearnedPositionEncoder
RotaryEncoder

ABCs¶

class fairseq2.nn.PositionEncoder(encoding_dim)[source]¶

Bases: Module, ABC

Encodes sequences with positional information.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

        classDiagram
  ABC <|-- PositionEncoder
  Module <|-- PositionEncoder
  PositionEncoder <|-- LearnedPositionEncoder
  PositionEncoder <|-- RotaryEncoder
  PositionEncoder <|-- SinusoidalPositionEncoder

abstract forward(seqs, seqs_layout, *, state_bag=None)[source]¶

Returns a copy of seqs with positional information encoded.

Parameters:

seqs (Tensor) – The input sequences to encode. Shape: \(([N],S,*,E)\), where \(N\) is the batch size, \(S\) is the sequence length, \(*\) is any number of batch dimensions including none, and \(E\) is the dimensionality of the positional encodings.
state_bag (IncrementalStateBag | None) – If not None, the encoder will operate in incremental decoding mode. The first element in seqs will be considered to be at position state_bag.step_nr instead of 0.

Raises:

ValueError – when the sequence length of seqs exceeds max_seq_len.

Returns:

The input sequences with positional information encoded. Shape: Same as seqs.

Return type:

Tensor

Classes¶

final class fairseq2.nn.SinusoidalPositionEncoder(encoding_dim, max_seq_len, *, _legacy_pad_idx=None, device=None)[source]¶

Bases: PositionEncoder

Encodes sequences with fixed sinusoidal positional information.

Parameters:

encoding_dim (int) – The dimensionality of positional encodings. The last dimension of input sequences is expected to have the same dimensionality.
max_seq_len (int) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.

Raises:

ValueError – when encoding_dim is not even.

        classDiagram
  ABC <|-- PositionEncoder
  Module <|-- PositionEncoder
  PositionEncoder <|-- SinusoidalPositionEncoder

The positional encodings are initialized as in tensor2tensor which differs slightly from the description in section 3.5 of Vaswani et al. [VSP+23]. This means instead of:

\[ \begin{align}\begin{aligned}PE_{(pos, 2i)} = \text{sin}(pos/10000^{2i/d_{model}})\\PE_{(pos, 2i+1)} = \text{cos}(pos/10000^{2i/d_{model}})\end{aligned}\end{align} \]

we use:

\[ \begin{align}\begin{aligned}PE_{(pos, i)} = \text{sin}(pos/10000^{i/d_{model}})\;\text{for}\;i\; <\frac{d_{model}}{2}\\PE_{(pos, i)} = \text{cos}(pos/10000^{i/d_{model}})\;\text{for}\;i\;\geq\frac{d_{model}}{2}\end{aligned}\end{align} \]

See here for more information.

Usage:

>>> import torch
>>>
>>> from fairseq2.nn.position_encoder import SinusoidalPositionEncoder
>>>
>>> m = SinusoidalPositionEncoder(encoding_dim=4, max_seq_len=16)
>>>
>>> seqs = torch.ones((3, 4))
>>>
>>> m(seqs)
tensor([[ 1.0000e+00,  1.0000e+00,  2.0000e+00,  2.0000e+00],  # pos 0
        [ 9.4147e-01,  2.0000e-04,  6.4030e-01,  2.0000e+00],  # pos 1
        [ 1.0930e-02,  3.0000e-04, -5.1615e-01,  2.0000e+00]]) # pos 2

forward(seqs, seqs_layout, *, state_bag=None)[source]¶

Returns a copy of seqs with positional information encoded.

Parameters:

seqs (Tensor) – The input sequences to encode. Shape: \(([N],S,*,E)\), where \(N\) is the batch size, \(S\) is the sequence length, \(*\) is any number of batch dimensions including none, and \(E\) is the dimensionality of the positional encodings.
state_bag (IncrementalStateBag | None) – If not None, the encoder will operate in incremental decoding mode. The first element in seqs will be considered to be at position state_bag.step_nr instead of 0.

Raises:

ValueError – when the sequence length of seqs exceeds max_seq_len.

Returns:

The input sequences with positional information encoded. Shape: Same as seqs.

Return type:

Tensor

final class fairseq2.nn.LearnedPositionEncoder(encoding_dim, max_seq_len, *, device=None, dtype=None)[source]¶

Bases: PositionEncoder

Encodes sequences with learned positional embeddings.

Parameters:

encoding_dim (int) – The dimensionality of positional encodings. The last dimension of input sequences is expected to have the same dimensionality.
max_seq_len (int) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.

        classDiagram
  ABC <|-- PositionEncoder
  Module <|-- PositionEncoder
  PositionEncoder <|-- LearnedPositionEncoder

Usage:

>>> import torch
>>>
>>> from fairseq2.nn.position_encoder import LearnedPositionEncoder
>>>
>>> m = LearnedPositionEncoder(encoding_dim=4, max_seq_len=16)
>>>
>>> seqs = torch.ones((3, 4))
>>>
>>> m(seqs)
tensor([[ 1.1135,  0.5548,  0.4293,  2.0112],                               # pos 0
        [ 0.2364,  0.6009,  3.3865, -2.4810],                               # pos 1
        [-0.4746,  0.4544,  0.2761,  0.8828]], grad_fn=<SqueezeBackward1>)  # pos 2

forward(seqs, seqs_layout, *, state_bag=None)[source]¶

Returns a copy of seqs with positional information encoded.

Parameters:

seqs (Tensor) – The input sequences to encode. Shape: \(([N],S,*,E)\), where \(N\) is the batch size, \(S\) is the sequence length, \(*\) is any number of batch dimensions including none, and \(E\) is the dimensionality of the positional encodings.
state_bag (IncrementalStateBag | None) – If not None, the encoder will operate in incremental decoding mode. The first element in seqs will be considered to be at position state_bag.step_nr instead of 0.

Raises:

ValueError – when the sequence length of seqs exceeds max_seq_len.

Returns:

The input sequences with positional information encoded. Shape: Same as seqs.

Return type:

Tensor

final class fairseq2.nn.RotaryEncoder(encoding_dim, max_seq_len, *, theta=10000.0, freqs_init_fn=None, device=None)[source]¶

Bases: PositionEncoder

Encodes sequences with relative positional information as described in Su et al. [SLP+23].

Parameters:

encoding_dim (int) – The dimensionality of positional encodings. The last dimension of input sequences is expected to have the same dimensionality.
max_seq_len (int) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.
theta (float) – The coefficient of the long-term decay as described in section 3.3 of the reference paper.
freqs_init_fn (Callable[[RotaryEncoder], Tensor] | None) – A callable to initialize the frequency table. The encoder will be passed to the callable as an argument and it is expected for the callable to return a Tensor holding the frequency table. If None, the frequencies will be initialized as described in the reference paper.

Raises:

ValueError – when encoding_dim is not even.

        classDiagram
  ABC <|-- PositionEncoder
  Module <|-- PositionEncoder
  PositionEncoder <|-- RotaryEncoder

forward(seqs, seqs_layout, *, state_bag=None)[source]¶

Returns a copy of seqs with positional information encoded.

Parameters:

seqs (Tensor) – The input sequences to encode. Shape: \(([N],S,*,E)\), where \(N\) is the batch size, \(S\) is the sequence length, \(*\) is any number of batch dimensions including none, and \(E\) is the dimensionality of the positional encodings.
state_bag (IncrementalStateBag | None) – If not None, the encoder will operate in incremental decoding mode. The first element in seqs will be considered to be at position state_bag.step_nr instead of 0.

Raises:

ValueError – when the sequence length of seqs exceeds max_seq_len.

Returns:

The input sequences with positional information encoded. Shape: Same as seqs.

Return type:

Tensor