Position Encoders¶
The diagram below shows the position encoder API as an example. The API is defined by the abstract PositionEncoder PyTorch module. SinusoidalPositionEncoder, LearnedPositionEncoder, and RotaryEncoder implement PositionEncoder for their respective algorithms. Any of these position encoders can be used wherever a PositionEncoder is expected.
classDiagram
    class Module {
        <<torch.nn.Module>>
        +parameters()
        +forward()*
        +train()
        +eval()
    }
    class PositionEncoder {
        <<abstract>>
        +encoding_dim: int
        +forward(seqs, seqs_layout, state_bag)*
    }
    class SinusoidalPositionEncoder {
        +max_seq_len: int
        +sin_offset: int
        +freqs: Tensor
        +forward(seqs, seqs_layout, state_bag)
        +reset_parameters()
    }
    class LearnedPositionEncoder {
        +max_seq_len: int
        +weight: Parameter
        +forward(seqs, seqs_layout, state_bag)
        +reset_parameters()
    }
    class RotaryEncoder {
        +max_seq_len: int
        +theta: float
        +freqs: Tensor
        +forward(seqs, seqs_layout, state_bag)
        +reset_parameters()
    }
    class ReferenceRotaryEncoder {
        +max_seq_len: int
        +theta: float
        +cos_freqs: Tensor
        +sin_freqs: Tensor
        +forward(seqs, seqs_layout, state_bag)
        +reset_parameters()
    }
    Module <|-- PositionEncoder
    PositionEncoder <|-- SinusoidalPositionEncoder
    PositionEncoder <|-- LearnedPositionEncoder
    PositionEncoder <|-- RotaryEncoder
    PositionEncoder <|-- ReferenceRotaryEncoder
- class fairseq2.nn.PositionEncoder(encoding_dim)[source]¶
Encodes sequences with positional information.
- abstract forward(seqs, seqs_layout, *, state_bag=None)[source]¶
Returns a copy of seqs with positional information encoded.
- Parameters:
seqs (Tensor) – The input sequences to encode. Shape: \(([N],S,*,E)\), where \(N\) is the batch size, \(S\) is the sequence length, \(*\) is any number of batch dimensions including none, and \(E\) is the dimensionality of the positional encodings.
state_bag (IncrementalStateBag | None) – If not None, the encoder will operate in incremental decoding mode. The first element in seqs will be considered to be at position state_bag.step_nr instead of 0.
- Raises:
ValueError – when the sequence length of seqs exceeds max_seq_len.
- Returns:
The input sequences with positional information encoded. Shape: Same as seqs.
- Return type:
Tensor
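The contract above can be mirrored by a small stand-in module. The class below is purely illustrative and does not import fairseq2 (a real encoder would subclass fairseq2.nn.PositionEncoder); it sketches the two documented behaviors: forward returns a copy of seqs with a position-dependent signal added, and sequence lengths beyond max_seq_len raise ValueError.

```python
import torch
from torch import nn, Tensor

# Illustrative stand-in that mirrors the PositionEncoder contract; not a
# real fairseq2 class. seqs has shape ([N], S, *, E); the position signal
# is broadcast over the leading batch dimensions.
class ToyPositionEncoder(nn.Module):
    def __init__(self, encoding_dim: int, max_seq_len: int) -> None:
        super().__init__()
        self.encoding_dim = encoding_dim
        self.max_seq_len = max_seq_len

    def forward(self, seqs: Tensor, seqs_layout=None, *, state_bag=None) -> Tensor:
        seq_len = seqs.size(-2)
        if seq_len > self.max_seq_len:
            # Mirrors the documented ValueError.
            raise ValueError(f"sequence length {seq_len} exceeds {self.max_seq_len}")
        # Trivial position signal: the position index, broadcast over E.
        positions = torch.arange(seq_len, dtype=seqs.dtype).unsqueeze(-1)  # (S, 1)
        return seqs + positions

encoder = ToyPositionEncoder(encoding_dim=8, max_seq_len=16)
out = encoder(torch.randn(2, 10, 8))
```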
- final class fairseq2.nn.SinusoidalPositionEncoder(encoding_dim, max_seq_len, *, _legacy_pad_idx=None, device=None)[source]¶
Bases: PositionEncoder
Encodes sequences with fixed sinusoidal positional information.
- Parameters:
encoding_dim (int) – The dimensionality of positional encodings. The last dimension of input sequences is expected to have the same dimensionality.
max_seq_len (int) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.
- Raises:
ValueError – when encoding_dim is not even.
- forward(seqs, seqs_layout, *, state_bag=None)[source]¶
Returns a copy of seqs with positional information encoded.
- Parameters:
seqs (Tensor) – The input sequences to encode. Shape: \(([N],S,*,E)\), where \(N\) is the batch size, \(S\) is the sequence length, \(*\) is any number of batch dimensions including none, and \(E\) is the dimensionality of the positional encodings.
state_bag (IncrementalStateBag | None) – If not None, the encoder will operate in incremental decoding mode. The first element in seqs will be considered to be at position state_bag.step_nr instead of 0.
- Raises:
ValueError – when the sequence length of seqs exceeds max_seq_len.
- Returns:
The input sequences with positional information encoded. Shape: Same as seqs.
- Return type:
Tensor
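The fixed sinusoidal table can be sketched in plain PyTorch. The sketch below uses the classic Transformer formulation (interleaved sin/cos pairs over geometrically spaced frequencies); the exact memory layout of fairseq2's internal freqs buffer may differ.

```python
import torch

# Build a (max_seq_len, encoding_dim) table of fixed sinusoidal encodings.
# PE[pos, 2i]   = sin(pos / 10000^(2i/E))
# PE[pos, 2i+1] = cos(pos / 10000^(2i/E))
def sinusoidal_table(max_seq_len: int, encoding_dim: int) -> torch.Tensor:
    # Mirrors the documented ValueError for odd encoding dimensions.
    assert encoding_dim % 2 == 0, "encoding_dim must be even"
    positions = torch.arange(max_seq_len, dtype=torch.float32).unsqueeze(1)  # (S, 1)
    dims = torch.arange(0, encoding_dim, 2, dtype=torch.float32)             # (E/2,)
    inv_freq = 10000.0 ** (-dims / encoding_dim)                             # (E/2,)
    angles = positions * inv_freq                                            # (S, E/2)
    table = torch.empty(max_seq_len, encoding_dim)
    table[:, 0::2] = torch.sin(angles)
    table[:, 1::2] = torch.cos(angles)
    return table

table = sinusoidal_table(max_seq_len=128, encoding_dim=16)
```

At position 0 every sin entry is 0 and every cos entry is 1, which is a quick sanity check on any sinusoidal table.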
- final class fairseq2.nn.LearnedPositionEncoder(encoding_dim, max_seq_len, *, device=None, dtype=None)[source]¶
Bases: PositionEncoder
Encodes sequences with learned positional embeddings.
- Parameters:
encoding_dim (int) – The dimensionality of positional encodings. The last dimension of input sequences is expected to have the same dimensionality.
max_seq_len (int) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.
- forward(seqs, seqs_layout, *, state_bag=None)[source]¶
Returns a copy of seqs with positional information encoded.
- Parameters:
seqs (Tensor) – The input sequences to encode. Shape: \(([N],S,*,E)\), where \(N\) is the batch size, \(S\) is the sequence length, \(*\) is any number of batch dimensions including none, and \(E\) is the dimensionality of the positional encodings.
state_bag (IncrementalStateBag | None) – If not None, the encoder will operate in incremental decoding mode. The first element in seqs will be considered to be at position state_bag.step_nr instead of 0.
- Raises:
ValueError – when the sequence length of seqs exceeds max_seq_len.
- Returns:
The input sequences with positional information encoded. Shape: Same as seqs.
- Return type:
Tensor
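The learned variant reduces to a trainable (max_seq_len, encoding_dim) table whose first seq_len rows are added to the inputs. A minimal sketch, independent of fairseq2's internals (the class and initialization here are illustrative, not fairseq2's actual code):

```python
import torch
from torch import nn

# Trainable position table added to the inputs; positions beyond the
# current sequence length are simply unused.
class LearnedPositions(nn.Module):
    def __init__(self, max_seq_len: int, encoding_dim: int) -> None:
        super().__init__()
        # Small random init; fairseq2's reset_parameters() may differ.
        self.weight = nn.Parameter(torch.randn(max_seq_len, encoding_dim) * 0.02)

    def forward(self, seqs: torch.Tensor) -> torch.Tensor:
        seq_len = seqs.size(-2)
        # Broadcast the (seq_len, E) slice over the batch dimension.
        return seqs + self.weight[:seq_len]

enc = LearnedPositions(max_seq_len=64, encoding_dim=32)
out = enc(torch.zeros(4, 10, 32))
```

Because the table is a Parameter, it is trained jointly with the rest of the model, unlike the fixed sinusoidal table.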
- final class fairseq2.nn.RotaryEncoder(encoding_dim, max_seq_len, *, theta=10000.0, freqs_init_fn=None, device=None)[source]¶
Bases: PositionEncoder
Encodes sequences with relative positional information as described in Su et al. [3].
- Parameters:
encoding_dim (int) – The dimensionality of positional encodings. The last dimension of input sequences is expected to have the same dimensionality.
max_seq_len (int) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.
theta (float) – The coefficient of the long-term decay as described in section 3.3 of the reference paper.
freqs_init_fn (Callable[[RotaryEncoder], Tensor] | None) – A callable to initialize the frequency table. The encoder will be passed to the callable as an argument, and the callable is expected to return a Tensor holding the frequency table. If None, the frequencies will be initialized as described in the reference paper.
- Raises:
ValueError – when encoding_dim is not even.
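The default frequency table that a custom freqs_init_fn would replace follows the reference paper's formulation, theta_i = theta^(-2i/d) for i in [0, d/2). A sketch of that default, not fairseq2's actual initializer:

```python
import torch

# Geometrically spaced rotary frequencies: index 0 rotates fastest
# (frequency 1.0) and later indices decay toward theta^-1, which produces
# the long-term decay discussed in section 3.3 of Su et al.
def default_rotary_freqs(encoding_dim: int, theta: float = 10000.0) -> torch.Tensor:
    indices = torch.arange(0, encoding_dim, 2, dtype=torch.float32)  # 0, 2, ..., d-2
    return theta ** (-indices / encoding_dim)                        # shape: (encoding_dim / 2,)

freqs = default_rotary_freqs(encoding_dim=8)
```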
- forward(seqs, seqs_layout, *, state_bag=None)[source]¶
Returns a copy of seqs with positional information encoded.
- Parameters:
seqs (Tensor) – The input sequences to encode. Shape: \(([N],S,*,E)\), where \(N\) is the batch size, \(S\) is the sequence length, \(*\) is any number of batch dimensions including none, and \(E\) is the dimensionality of the positional encodings.
state_bag (IncrementalStateBag | None) – If not None, the encoder will operate in incremental decoding mode. The first element in seqs will be considered to be at position state_bag.step_nr instead of 0.
- Raises:
ValueError – when the sequence length of seqs exceeds max_seq_len.
- Returns:
The input sequences with positional information encoded. Shape: Same as seqs.
- Return type:
Tensor
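The rotary transform itself can be sketched with complex arithmetic in plain PyTorch: consecutive feature pairs are treated as complex numbers and rotated by position-dependent angles, which preserves vector norms and makes attention scores depend only on relative positions. This is an independent sketch, not fairseq2's implementation:

```python
import torch

# Rotate consecutive feature pairs of x (shape (S, E)) by angles that grow
# linearly with position; pair i at position p is rotated by p * theta_i.
def apply_rotary(x: torch.Tensor, theta: float = 10000.0) -> torch.Tensor:
    seq_len, dim = x.shape[-2], x.shape[-1]
    freqs = theta ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1) * freqs  # (S, E/2)
    rot = torch.polar(torch.ones_like(angles), angles)  # unit complex rotations
    # View feature pairs as complex numbers, rotate, and flatten back.
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], dim // 2, 2))
    return torch.view_as_real(x_c * rot).flatten(-2)

x = torch.randn(16, 8)
y = apply_rotary(x)
```

Since the rotation at position 0 is the identity and every rotation is norm-preserving, y[0] equals x[0] and each row of y keeps the norm of the corresponding row of x.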
Example Usage:
import torch

from fairseq2.nn import BatchLayout, SinusoidalPositionEncoder

# Sinusoidal position encoding
pos_encoder = SinusoidalPositionEncoder(
    encoding_dim=512,
    max_seq_len=2048,
)

# Use with BatchLayout for proper position handling
seqs = torch.randn(4, 6, 512)  # (batch, seq, features)
batch_layout = BatchLayout.of(seqs, seq_lens=[4, 2, 3, 5])

pos_encodings = pos_encoder(seqs, seqs_layout=batch_layout)