Position Encoders

A set of PyTorch modules to encode sequences with positional information.

ABCs
- class fairseq2.nn.PositionEncoder(encoding_dim, max_seq_len)
Encodes sequences with positional information.
  - Parameters:
    encoding_dim (int) – The dimensionality of positional encodings. The last dimension of input sequences is expected to have the same dimensionality.
    max_seq_len (int | None) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError; see the example after the class diagram below. Typically it is set to the context length of the underlying model. If None, sequences can have arbitrary length.
  (Class diagram: PositionEncoder derives from ABC and Module, and is subclassed by LearnedPositionEncoder, RotaryEncoder, and SinusoidalPositionEncoder.)
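  To make the max_seq_len contract concrete, here is a short doctest-style example using the SinusoidalPositionEncoder documented below; the exact error message varies across fairseq2 versions:

  >>> import torch
  >>>
  >>> from fairseq2.nn.position_encoder import SinusoidalPositionEncoder
  >>>
  >>> m = SinusoidalPositionEncoder(encoding_dim=4, max_seq_len=16)
  >>>
  >>> seqs = torch.ones((32, 4))  # sequence length 32 exceeds max_seq_len=16
  >>>
  >>> m(seqs)
  Traceback (most recent call last):
      ...
  ValueError: ...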
  - forward(seqs, padding_mask, *, state_bag=None)

    Returns a copy of seqs with positional information encoded.

    - Parameters:
      seqs (Tensor) – The input sequences to encode. Shape: \((*,S,E)\), where \(*\) is any number of batch dimensions including none, \(S\) is the sequence length, and \(E\) is the dimensionality of the positional encodings.
      padding_mask (PaddingMask | None) – The padding mask of seqs. Shape: \((*,S)\), where \(*\) is any number of batch dimensions including none and \(S\) is the sequence length.
      state_bag (IncrementalStateBag | None) – If not None, the encoder operates in incremental decoding mode, meaning that the first step in seqs is considered to be at position state_bag.step_nr instead of 0 (see the sketch after this method).
    - Raises:
      ValueError – when the sequence length of seqs exceeds max_seq_len.
    - Returns:
      The input sequences with positional information encoded. Shape: same as seqs.
    - Return type:
      Tensor
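    As a sketch of incremental decoding mode, the example below assumes fairseq2.nn.IncrementalStateBag with a max_num_steps argument and an increment_step_nr() method, as in recent fairseq2 releases; check the API of your installed version:

    >>> import torch
    >>>
    >>> from fairseq2.nn import IncrementalStateBag
    >>> from fairseq2.nn.position_encoder import SinusoidalPositionEncoder
    >>>
    >>> m = SinusoidalPositionEncoder(encoding_dim=4, max_seq_len=16)
    >>>
    >>> state_bag = IncrementalStateBag(max_num_steps=16)
    >>>
    >>> state_bag.increment_step_nr(2)  # pretend steps 0 and 1 were already decoded
    >>>
    >>> step = torch.ones((1, 1, 4))  # a single decoding step
    >>>
    >>> out = m(step, None, state_bag=state_bag)  # encoded as position 2, not 0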
Classes
- final class fairseq2.nn.SinusoidalPositionEncoder(encoding_dim, max_seq_len, *, _legacy_pad_idx=None, device=None)

  Bases: PositionEncoder

  Encodes sequences with fixed sinusoidal positional information.

  - Parameters:
    encoding_dim (int) – The dimensionality of positional encodings. The last dimension of input sequences is expected to have the same dimensionality.
    max_seq_len (int | None) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.
  - Raises:
    ValueError – when encoding_dim is not even.
  (Class diagram: SinusoidalPositionEncoder derives from PositionEncoder, which in turn derives from ABC and Module.)
  The positional encodings are initialized as in tensor2tensor, which differs slightly from the description in section 3.5 of Vaswani et al. [VSP+23]. This means that instead of:

  \[PE_{(pos, 2i)} = \sin(pos/10000^{2i/d_{model}})\]
  \[PE_{(pos, 2i+1)} = \cos(pos/10000^{2i/d_{model}})\]

  we use:

  \[PE_{(pos, i)} = \sin(pos/10000^{i/d_{model}}) \;\text{for}\; i < \frac{d_{model}}{2}\]
  \[PE_{(pos, i)} = \cos(pos/10000^{i/d_{model}}) \;\text{for}\; i \geq \frac{d_{model}}{2}\]

  See the tensor2tensor implementation for more information.
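  The half-split layout can be reproduced directly from the formula above. A minimal sketch in plain PyTorch, independent of fairseq2:

  >>> import torch
  >>>
  >>> d_model, seq_len = 4, 16
  >>>
  >>> pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (S, 1)
  >>> i = torch.arange(d_model, dtype=torch.float32)                 # (E,)
  >>>
  >>> angle = pos / 10000.0 ** (i / d_model)  # (S, E)
  >>>
  >>> # Sines in the first half of the channels, cosines in the second half.
  >>> pe = torch.where(i < d_model / 2, torch.sin(angle), torch.cos(angle))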
  Usage:

  >>> import torch
  >>>
  >>> from fairseq2.nn.position_encoder import SinusoidalPositionEncoder
  >>>
  >>> m = SinusoidalPositionEncoder(encoding_dim=4, max_seq_len=16)
  >>>
  >>> seqs = torch.ones((3, 4))
  >>>
  >>> m(seqs)
  tensor([[ 1.0000e+00,  1.0000e+00,  2.0000e+00,  2.0000e+00],  # pos 0
          [ 9.4147e-01,  2.0000e-04,  6.4030e-01,  2.0000e+00],  # pos 1
          [ 1.0930e-02,  3.0000e-04, -5.1615e-01,  2.0000e+00]]) # pos 2
- final class fairseq2.nn.LearnedPositionEncoder(encoding_dim, max_seq_len, *, device=None, dtype=None)

  Bases: PositionEncoder

  Encodes sequences with learned positional embeddings.

  - Parameters:
    encoding_dim (int) – The dimensionality of positional encodings. The last dimension of input sequences is expected to have the same dimensionality.
    max_seq_len (int | None) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.
  (Class diagram: LearnedPositionEncoder derives from PositionEncoder, which in turn derives from ABC and Module.)
  Usage:

  >>> import torch
  >>>
  >>> from fairseq2.nn.position_encoder import LearnedPositionEncoder
  >>>
  >>> m = LearnedPositionEncoder(encoding_dim=4, max_seq_len=16)
  >>>
  >>> seqs = torch.ones((3, 4))
  >>>
  >>> m(seqs)
  tensor([[ 1.1135,  0.5548,  0.4293,  2.0112],  # pos 0
          [ 0.2364,  0.6009,  3.3865, -2.4810],  # pos 1
          [-0.4746,  0.4544,  0.2761,  0.8828]], grad_fn=<SqueezeBackward1>) # pos 2
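  Conceptually, a learned position encoder is just a trainable table of shape (max_seq_len, encoding_dim) whose rows are added to the input, one per position. A minimal sketch with hypothetical names, not fairseq2's actual implementation:

  >>> import torch
  >>> from torch import nn
  >>>
  >>> class ToyLearnedPositionEncoder(nn.Module):  # hypothetical, for illustration only
  ...     def __init__(self, encoding_dim, max_seq_len):
  ...         super().__init__()
  ...         self.weight = nn.Parameter(torch.randn(max_seq_len, encoding_dim))
  ...
  ...     def forward(self, seqs):
  ...         seq_len = seqs.size(-2)
  ...         return seqs + self.weight[:seq_len]  # one learned vector per position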
- final class fairseq2.nn.RotaryEncoder(encoding_dim, max_seq_len, *, theta=10000.0, freqs_init_fn=None, device=None)

  Bases: PositionEncoder

  Encodes sequences with relative positional information as described in Su et al. [SLP+23].

  - Parameters:
    encoding_dim (int) – The dimensionality of positional encodings. The last dimension of input sequences is expected to have the same dimensionality.
    max_seq_len (int | None) – The maximum allowed length for input sequences. Sequences longer than max_seq_len will cause a ValueError.
    theta (float) – The coefficient of the long-term decay as described in section 3.3 of the reference paper.
    freqs_init_fn (Callable[[RotaryEncoder], Tensor] | None) – A callable to initialize the frequency table. The encoder is passed to the callable as an argument, and the callable is expected to return a Tensor holding the frequency table. If None, the frequencies are initialized as described in the reference paper.
  - Raises:
    ValueError – when encoding_dim is not even.
  (Class diagram: RotaryEncoder derives from PositionEncoder, which in turn derives from ABC and Module.)
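  To illustrate the underlying transform, here is a standalone sketch of the rotary rotation from the reference paper, using the interleaved even/odd pair convention. The helper name is hypothetical, and this is not fairseq2's implementation, which may lay out pairs and frequencies differently (e.g. via freqs_init_fn):

  >>> import torch
  >>>
  >>> def toy_rotary(seqs, theta=10000.0):  # hypothetical helper, for illustration only
  ...     *_, seq_len, dim = seqs.shape  # dim must be even
  ...     pos = torch.arange(seq_len, dtype=torch.float32)
  ...     freqs = theta ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
  ...     angles = torch.outer(pos, freqs)  # (S, E/2), one angle per position and pair
  ...     cos, sin = torch.cos(angles), torch.sin(angles)
  ...     x1, x2 = seqs[..., 0::2], seqs[..., 1::2]  # even/odd channel pairs
  ...     out = torch.empty_like(seqs)
  ...     out[..., 0::2] = x1 * cos - x2 * sin  # rotate each pair by its angle
  ...     out[..., 1::2] = x1 * sin + x2 * cos
  ...     return out

  Because attention scores are dot products, applying such a rotation to both queries and keys makes the scores depend only on the relative distance between positions, which is the central property of rotary embeddings.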