Position Encoders

The diagram below shows the position encoder API. The API is defined by the abstract PositionEncoder PyTorch module. SinusoidalPositionEncoder, LearnedPositionEncoder, RotaryEncoder, and ReferenceRotaryEncoder implement PositionEncoder for their respective algorithms, so any of them can be used wherever a PositionEncoder is expected.

    classDiagram
    class Module {
        <<torch.nn.Module>>
        +parameters()
        +forward()*
        +train()
        +eval()
    }

    class PositionEncoder {
        <<abstract>>
        +encoding_dim: int
        +forward(seqs, seqs_layout, state_bag)*
    }

    class SinusoidalPositionEncoder {
        +max_seq_len: int
        +sin_offset: int
        +freqs: Tensor
        +forward(seqs, seqs_layout, state_bag)
        +reset_parameters()
    }

    class LearnedPositionEncoder {
        +max_seq_len: int
        +weight: Parameter
        +forward(seqs, seqs_layout, state_bag)
        +reset_parameters()
    }

    class RotaryEncoder {
        +max_seq_len: int
        +theta: float
        +freqs: Tensor
        +forward(seqs, seqs_layout, state_bag)
        +reset_parameters()
    }

    class ReferenceRotaryEncoder {
        +max_seq_len: int
        +theta: float
        +cos_freqs: Tensor
        +sin_freqs: Tensor
        +forward(seqs, seqs_layout, state_bag)
        +reset_parameters()
    }

    Module <|-- PositionEncoder
    PositionEncoder <|-- SinusoidalPositionEncoder
    PositionEncoder <|-- LearnedPositionEncoder
    PositionEncoder <|-- RotaryEncoder
    PositionEncoder <|-- ReferenceRotaryEncoder
    
class fairseq2.nn.PositionEncoder(encoding_dim)[source]

Bases: Module, ABC

Encodes sequences with positional information.


abstract forward(seqs, seqs_layout, *, state_bag=None)[source]

Returns a copy of seqs with positional information encoded.

Parameters:
  • seqs (Tensor) – The input sequences to encode. Shape: \(([N],S,*,E)\), where \(N\) is the batch size, \(S\) is the sequence length, \(*\) is any number of batch dimensions including none, and \(E\) is the dimensionality of the positional encodings.

  • seqs_layout (BatchLayout) – The layout of seqs, specifying the sequence length of each element in the batch.

  • state_bag (IncrementalStateBag | None) – If not None, the encoder will operate in incremental decoding mode. The first element in seqs will be considered to be at position state_bag.step_nr instead of 0.

Raises:

ValueError – when the sequence length of seqs exceeds max_seq_len.

Returns:

The input sequences with positional information encoded. Shape: Same as seqs.

Return type:

Tensor
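To make the contract concrete, here is a minimal toy module with the same shape and error behavior for (N, S, E)-shaped inputs. This is a hypothetical sketch only: a real implementation would subclass fairseq2.nn.PositionEncoder and also accept the seqs_layout and state_bag arguments.

```python
import torch
from torch import nn

class ZeroPositionEncoder(nn.Module):
    """Toy module following the `PositionEncoder` forward contract.

    Hypothetical sketch; a real encoder subclasses
    ``fairseq2.nn.PositionEncoder`` and takes ``seqs_layout``/``state_bag``.
    """

    def __init__(self, encoding_dim: int, max_seq_len: int) -> None:
        super().__init__()

        self.encoding_dim = encoding_dim
        self.max_seq_len = max_seq_len

    def forward(self, seqs: torch.Tensor) -> torch.Tensor:
        # Assumes (N, S, E)-shaped input, so dim 1 is the sequence length.
        if seqs.size(1) > self.max_seq_len:
            raise ValueError("sequence length of `seqs` exceeds `max_seq_len`")

        # A real encoder would add positional information here; returning a
        # copy keeps the "returns a copy of seqs" contract visible.
        return seqs.clone()
```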

final class fairseq2.nn.SinusoidalPositionEncoder(encoding_dim, max_seq_len, *, _legacy_pad_idx=None, device=None)[source]

Bases: PositionEncoder

Encodes sequences with fixed sinusoidal positional information.

Parameters:
  • encoding_dim (int) – The dimensionality of positional encodings. The last dimension of input sequences is expected to have the same dimensionality.

  • max_seq_len (int) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.

Raises:

ValueError – when encoding_dim is not even.
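For intuition, the fixed table can be sketched with the standard formulation from Vaswani et al. (2017): position \(p\), channel pair \(i\) gets angle \(p / 10000^{2i/E}\), with sine and cosine halves. This is an illustrative sketch; fairseq2's precomputed freqs buffer follows the same formula, but its exact sin/cos layout and the _legacy_pad_idx offset handling may differ.

```python
import torch

def sinusoidal_table(max_seq_len: int, encoding_dim: int) -> torch.Tensor:
    """Standard sinusoidal position table (illustrative sketch)."""
    if encoding_dim % 2 != 0:
        raise ValueError("`encoding_dim` must be even.")

    pos = torch.arange(max_seq_len, dtype=torch.float32).unsqueeze(1)

    # Inverse frequencies: 1 / 10000^(2i / dim) for i = 0 .. dim/2 - 1.
    steps = torch.arange(0, encoding_dim, 2, dtype=torch.float32)
    inv_freq = 1.0 / (10000.0 ** (steps / encoding_dim))

    angles = pos * inv_freq  # (max_seq_len, dim / 2)

    # Sine half followed by cosine half; fairseq2's layout may interleave.
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

table = sinusoidal_table(16, 8)
print(table.shape)  # torch.Size([16, 8])
```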

reset_parameters()[source]
reset_non_persistent_buffers()[source]
forward(seqs, seqs_layout, *, state_bag=None)[source]

Returns a copy of seqs with positional information encoded.

Parameters:
  • seqs (Tensor) – The input sequences to encode. Shape: \(([N],S,*,E)\), where \(N\) is the batch size, \(S\) is the sequence length, \(*\) is any number of batch dimensions including none, and \(E\) is the dimensionality of the positional encodings.

  • seqs_layout (BatchLayout) – The layout of seqs, specifying the sequence length of each element in the batch.

  • state_bag (IncrementalStateBag | None) – If not None, the encoder will operate in incremental decoding mode. The first element in seqs will be considered to be at position state_bag.step_nr instead of 0.

Raises:

ValueError – when the sequence length of seqs exceeds max_seq_len.

Returns:

The input sequences with positional information encoded. Shape: Same as seqs.

Return type:

Tensor

final class fairseq2.nn.LearnedPositionEncoder(encoding_dim, max_seq_len, *, device=None, dtype=None)[source]

Bases: PositionEncoder

Encodes sequences with learned positional embeddings.

Parameters:
  • encoding_dim (int) – The dimensionality of positional encodings. The last dimension of input sequences is expected to have the same dimensionality.

  • max_seq_len (int) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.

reset_parameters()[source]
forward(seqs, seqs_layout, *, state_bag=None)[source]

Returns a copy of seqs with positional information encoded.

Parameters:
  • seqs (Tensor) – The input sequences to encode. Shape: \(([N],S,*,E)\), where \(N\) is the batch size, \(S\) is the sequence length, \(*\) is any number of batch dimensions including none, and \(E\) is the dimensionality of the positional encodings.

  • seqs_layout (BatchLayout) – The layout of seqs, specifying the sequence length of each element in the batch.

  • state_bag (IncrementalStateBag | None) – If not None, the encoder will operate in incremental decoding mode. The first element in seqs will be considered to be at position state_bag.step_nr instead of 0.

Raises:

ValueError – when the sequence length of seqs exceeds max_seq_len.

Returns:

The input sequences with positional information encoded. Shape: Same as seqs.

Return type:

Tensor
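The learned variant simply stores one trainable vector per position and adds it to the input. The toy module below sketches that idea; it is not fairseq2's implementation, which subclasses PositionEncoder and also takes a seqs_layout argument. The start parameter here mimics state_bag.step_nr in incremental decoding.

```python
import torch
from torch import nn

class ToyLearnedPositionEncoder(nn.Module):
    """Illustrative sketch of learned position embeddings."""

    def __init__(self, encoding_dim: int, max_seq_len: int) -> None:
        super().__init__()

        # One trainable embedding per position, analogous to
        # LearnedPositionEncoder's `weight` parameter.
        self.weight = nn.Parameter(torch.randn(max_seq_len, encoding_dim) * 0.02)

    def forward(self, seqs: torch.Tensor, start: int = 0) -> torch.Tensor:
        seq_len = seqs.size(-2)

        if start + seq_len > self.weight.size(0):
            raise ValueError("sequence length of `seqs` exceeds `max_seq_len`")

        # Add the embedding of each position to the corresponding step.
        return seqs + self.weight[start : start + seq_len]

encoder = ToyLearnedPositionEncoder(encoding_dim=16, max_seq_len=32)

out = encoder(torch.zeros(2, 10, 16))
```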

final class fairseq2.nn.RotaryEncoder(encoding_dim, max_seq_len, *, theta=10000.0, freqs_init_fn=None, device=None)[source]

Bases: PositionEncoder

Encodes sequences with relative positional information as described in Su et al. [3].

Parameters:
  • encoding_dim (int) – The dimensionality of positional encodings. The last dimension of input sequences is expected to have the same dimensionality.

  • max_seq_len (int) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.

  • theta (float) – The coefficient of the long-term decay as described in section 3.3 of the reference paper.

  • freqs_init_fn (Callable[[RotaryEncoder], Tensor] | None) – A callable to initialize the frequency table. The encoder will be passed to the callable as an argument and it is expected for the callable to return a Tensor holding the frequency table. If None, the frequencies will be initialized as described in the reference paper.

Raises:

ValueError – when encoding_dim is not even.
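The rotary scheme rotates consecutive channel pairs by a position-dependent angle, so relative offsets between positions are preserved under dot products. The function below is an illustrative sketch of that rotation; fairseq2's RotaryEncoder instead precomputes its freqs buffer (optionally via freqs_init_fn) rather than recomputing angles per call, and its pairing layout may differ.

```python
import torch

def apply_rotary(seqs: torch.Tensor, theta: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs (x_2i, x_2i+1) by angle p * theta^(-2i / dim)."""
    seq_len, dim = seqs.shape[-2], seqs.shape[-1]

    # Per-pair inverse frequencies: theta^(-2i / dim) for i = 0 .. dim/2 - 1.
    steps = torch.arange(0, dim, 2, dtype=torch.float32)
    inv_freq = 1.0 / theta ** (steps / dim)

    # Angle for pair i at position p is p * inv_freq[i].
    pos = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(pos, inv_freq)  # (seq_len, dim / 2)

    cos, sin = angles.cos(), angles.sin()

    x1, x2 = seqs[..., 0::2], seqs[..., 1::2]

    # Standard 2D rotation applied to each pair.
    out = torch.empty_like(seqs)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos

    return out
```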

reset_parameters()[source]
reset_non_persistent_buffers()[source]
forward(seqs, seqs_layout, *, state_bag=None)[source]

Returns a copy of seqs with positional information encoded.

Parameters:
  • seqs (Tensor) – The input sequences to encode. Shape: \(([N],S,*,E)\), where \(N\) is the batch size, \(S\) is the sequence length, \(*\) is any number of batch dimensions including none, and \(E\) is the dimensionality of the positional encodings.

  • seqs_layout (BatchLayout) – The layout of seqs, specifying the sequence length of each element in the batch.

  • state_bag (IncrementalStateBag | None) – If not None, the encoder will operate in incremental decoding mode. The first element in seqs will be considered to be at position state_bag.step_nr instead of 0.

Raises:

ValueError – when the sequence length of seqs exceeds max_seq_len.

Returns:

The input sequences with positional information encoded. Shape: Same as seqs.

Return type:

Tensor

Example Usage:

import torch

from fairseq2.nn import BatchLayout, RotaryEncoder, SinusoidalPositionEncoder

# Sinusoidal position encoding
pos_encoder = SinusoidalPositionEncoder(
    encoding_dim=512,
    max_seq_len=2048
)

# Use with BatchLayout for proper position handling
seqs = torch.randn(4, 6, 512)  # (batch, seq, features)
seqs_layout = BatchLayout.of(seqs, seq_lens=[4, 2, 3, 5])

pos_encodings = pos_encoder(seqs, seqs_layout=seqs_layout)

# Rotary position encoding uses the same call signature
rotary_encoder = RotaryEncoder(encoding_dim=512, max_seq_len=2048)

rotary_encodings = rotary_encoder(seqs, seqs_layout=seqs_layout)
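To illustrate what incremental decoding mode (state_bag) does, the self-contained sketch below uses the standard sinusoidal formula in plain PyTorch, not fairseq2's API: a start offset plays the role of state_bag.step_nr, so encoding a single step with the right offset matches that position in a full-sequence pass.

```python
import torch

def add_sinusoidal(seqs: torch.Tensor, start: int = 0) -> torch.Tensor:
    """Add standard sinusoidal encodings, with positions starting at `start`.

    `start` mimics `state_bag.step_nr`: in incremental decoding, the first
    element of `seqs` is treated as being at that position instead of 0.
    """
    seq_len, dim = seqs.shape[-2], seqs.shape[-1]

    pos = torch.arange(start, start + seq_len, dtype=torch.float32).unsqueeze(1)

    steps = torch.arange(0, dim, 2, dtype=torch.float32)
    inv_freq = 1.0 / (10000.0 ** (steps / dim))

    angles = pos * inv_freq

    return seqs + torch.cat([angles.sin(), angles.cos()], dim=-1)

full = add_sinusoidal(torch.zeros(1, 5, 8))

# Encoding step 3 alone, offset by start=3, matches position 3 of the
# full pass above.
step = add_sinusoidal(torch.zeros(1, 1, 8), start=3)
```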