Design Philosophy

One of the core goals of fairseq2 is to make it possible for researchers to explore new ideas and implement novel features without having to fork fairseq2. Instead of a monolithic repository that can only be modified by copy-pasting large chunks of code, all major APIs in fairseq2 follow the interface/implementation convention along with the dependency inversion principle. This means that each API has an interface (i.e. an abstract base class) that defines the contract of that API, and one or more concrete implementations of that interface. Different implementations can be integrated with the rest of fairseq2 via its lightweight dependency injection API.

Interface/Implementation Convention

The diagram below shows the position encoder API as an example. The API is defined by the abstract PositionEncoder PyTorch module. SinusoidalPositionEncoder, LearnedPositionEncoder, and RotaryEncoder implement PositionEncoder for their respective algorithms. Technically, any of these position encoders can be used wherever a PositionEncoder is expected (see Dependency Inversion below).

Position Encoder Hierarchy
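
To make the convention concrete, the sketch below shows a deliberately simplified version of such a hierarchy. It is not fairseq2's actual code; the real classes handle details such as padding masks and incremental decoding state, and their constructors take more parameters:

from abc import ABC, abstractmethod

import torch
from torch import Tensor, nn


class PositionEncoder(nn.Module, ABC):
    """The interface: defines the contract, but prescribes no algorithm."""

    @abstractmethod
    def forward(self, seqs: Tensor) -> Tensor:
        """Return `seqs` of shape (batch, seq_len, dim) with positions encoded."""


class SinusoidalPositionEncoder(PositionEncoder):
    """One concrete implementation of the contract."""

    def __init__(self, dim: int, max_seq_len: int) -> None:
        super().__init__()

        pos = torch.arange(max_seq_len).unsqueeze(1)
        inv_freq = 1.0 / (10_000 ** (torch.arange(0, dim, 2) / dim))

        table = torch.zeros(max_seq_len, dim)
        table[:, 0::2] = torch.sin(pos * inv_freq)
        table[:, 1::2] = torch.cos(pos * inv_freq)

        self.register_buffer("table", table)

    def forward(self, seqs: Tensor) -> Tensor:
        return seqs + self.table[: seqs.size(1)]


class LearnedPositionEncoder(PositionEncoder):
    """Another implementation; callers cannot tell the two apart."""

    def __init__(self, dim: int, max_seq_len: int) -> None:
        super().__init__()

        self.embed = nn.Embedding(max_seq_len, dim)

    def forward(self, seqs: Tensor) -> Tensor:
        return seqs + self.embed(torch.arange(seqs.size(1), device=seqs.device))

Because both classes satisfy the same contract, any module that accepts a PositionEncoder can be handed either one without modification.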

When several implementations of an API share common logic, a typical pattern is to have an intermediate abstract class, prefixed with Abstract, between the interface and the concrete implementations. For example, the text tokenizer API has AbstractTextTokenizer that holds the common logic for SentencePieceTokenizer and TiktokenTokenizer.

Text Tokenizer Hierarchy
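
The sketch below illustrates the pattern. The interface and base class names mirror fairseq2's, but the shared logic (here, appending an end-of-sequence token) and the toy WhitespaceTokenizer are purely illustrative assumptions:

from abc import ABC, abstractmethod


class TextTokenizer(ABC):
    """The interface: the contract every tokenizer must satisfy."""

    @abstractmethod
    def encode(self, text: str) -> list[int]:
        ...

    @abstractmethod
    def decode(self, token_indices: list[int]) -> str:
        ...


class AbstractTextTokenizer(TextTokenizer):
    """Intermediate base class: logic shared by the concrete tokenizers."""

    def __init__(self, eos_idx: int) -> None:
        self.eos_idx = eos_idx

    def encode(self, text: str) -> list[int]:
        # Shared post-processing lives here, written once.
        return self._encode_text(text) + [self.eos_idx]

    @abstractmethod
    def _encode_text(self, text: str) -> list[int]:
        ...


class WhitespaceTokenizer(AbstractTextTokenizer):
    """A toy concrete implementation; only the core algorithm differs."""

    def __init__(self, vocab: dict[str, int], eos_idx: int) -> None:
        super().__init__(eos_idx)

        self.vocab = vocab
        self.words = {idx: word for word, idx in vocab.items()}

    def _encode_text(self, text: str) -> list[int]:
        return [self.vocab[word] for word in text.split()]

    def decode(self, token_indices: list[int]) -> str:
        return " ".join(self.words[i] for i in token_indices if i != self.eos_idx)

In the real library, SentencePieceTokenizer and TiktokenTokenizer take the place of WhitespaceTokenizer, each supplying its own core algorithm beneath the shared logic.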

Dependency Inversion

The dependency inversion principle is critical to building a clean, well-tested, and extensible API. The example below shows the (abbreviated) __init__() method of StandardTransformerDecoderLayer:

class StandardTransformerDecoderLayer(TransformerDecoderLayer):
    def __init__(
        self,
        self_attn: MultiheadAttention,
        encoder_decoder_attn: MultiheadAttention | None,
        ffn: FeedForwardNetwork,
    ) -> None:
        # The layer receives its sub-modules fully constructed instead of
        # building them itself. `encoder_decoder_attn` is optional, so the
        # same class also works without cross-attention (e.g. in
        # decoder-only models).
        ...

Instead of constructing the multihead attention and feed-forward network layers within its __init__() method, StandardTransformerDecoderLayer expects the caller to provide instances of the MultiheadAttention and FeedForwardNetwork interfaces. This loose coupling between an instance and its dependencies makes it possible to compose diverse object graphs, such as different model architectures, with minimal redundancy (i.e. code duplication).
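
As a sketch of the payoff, the snippet below defines two feed-forward variants behind a single interface; either can be wired into the same layer without touching the layer's code. The class names, signatures, and the build_ffn helper are illustrative, not fairseq2's actual API:

from torch import Tensor, nn


class FeedForwardNetwork(nn.Module):
    """Stand-in for the fairseq2 interface; the real contract is richer."""


class ReLUFeedForward(FeedForwardNetwork):
    """The classic two-layer FFN."""

    def __init__(self, model_dim: int, inner_dim: int) -> None:
        super().__init__()

        self.inner_proj = nn.Linear(model_dim, inner_dim)
        self.out_proj = nn.Linear(inner_dim, model_dim)

    def forward(self, x: Tensor) -> Tensor:
        return self.out_proj(self.inner_proj(x).relu())


class GatedFeedForward(FeedForwardNetwork):
    """A GLU-style variant; a drop-in replacement for ReLUFeedForward."""

    def __init__(self, model_dim: int, inner_dim: int) -> None:
        super().__init__()

        self.gate_proj = nn.Linear(model_dim, inner_dim)
        self.inner_proj = nn.Linear(model_dim, inner_dim)
        self.out_proj = nn.Linear(inner_dim, model_dim)

    def forward(self, x: Tensor) -> Tensor:
        return self.out_proj(self.gate_proj(x).sigmoid() * self.inner_proj(x))


def build_ffn(variant: str, model_dim: int, inner_dim: int) -> FeedForwardNetwork:
    # A hypothetical factory: the decoder layer consuming the result never
    # learns which variant it received.
    if variant == "relu":
        return ReLUFeedForward(model_dim, inner_dim)

    return GatedFeedForward(model_dim, inner_dim)

In fairseq2, model factories and the dependency injection machinery play a role analogous to build_ffn, assembling complete architectures from such interchangeable parts.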