fairseq2 uses a model system that makes it easy to add support for new models.
There are two main scenarios:
Adding a new model to an existing family (most common) - When you want to add a new size or variant of an existing model architecture
Creating an entirely new model family (advanced) - When you need to implement a completely new model architecture
This tutorial focuses on the first scenario, which covers 95% of use cases. For the second scenario, refer to the existing model family implementations as reference.
In fairseq2, a model family groups related model architectures that share the same underlying implementation but differ in size or configuration. For example:
Qwen family: qwen25_3b, qwen25_14b, qwen3_8b, etc.
LLaMA family: llama3_8b, llama3_70b, llama3_2_1b, etc.
Mistral family: mistral_7b, mistral_8x7b, etc.
Each family consists of:
Model configurations (architectures): Define structural parameters (layers, dimensions, etc.)
Asset cards: YAML files specifying download locations and metadata
Model implementation: The actual PyTorch model code and loading logic
Model hub: A unified interface providing methods to work with the family
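As a rough illustration of the asset-card component, a card is a small YAML file that points fairseq2 at a checkpoint. The field names below are a hypothetical sketch, not fairseq2's exact schema; copy an existing card from the family you are extending to get the real fields.

```yaml
# Hypothetical asset card sketch; field names are illustrative.
# Check an existing card in the same family for the exact schema.
name: qwen25_3b_instruct
model_family: qwen
model_arch: qwen25_3b_instruct
checkpoint: "https://example.com/path/to/qwen25-3b-instruct.pt"
```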
Each model family provides a hub that exposes advanced functionality beyond simple model loading:
```python
from pathlib import Path

from fairseq2.models.qwen import get_qwen_model_hub

# Get the model hub for Qwen family
hub = get_qwen_model_hub()

# List available architectures
archs = hub.get_archs()
print(f"Available Qwen architectures: {archs}")

# Get architecture configuration
config = hub.get_arch_config("qwen3_0.6b")

# Create a newly initialized model (random weights)
new_model = hub.create_new_model(config)

# Load model from asset card
model = hub.load_model("qwen3_0.6b")

# Load model from custom checkpoint
custom_model = hub.load_custom_model(Path("/path/to/checkpoint.pt"), config)
```
Model architectures are defined as configuration presets that specify the structural parameters of your model.
Navigate to the model family’s config file:
src/fairseq2/models/qwen/config.py
Add a new architecture function:
```python
@arch("qwen25_3b_instruct")
def qwen25_3b_instruct() -> QwenConfig:
    """Configuration for Qwen2.5-3B-Instruct model."""
    config = QwenConfig()

    # Set model dimensions and structure
    config.model_dim = 2048
    ...

    return config
```
Key points:
The @arch decorator registers the configuration with the given name
The function name should match or describe the architecture
Use the appropriate config class for the model family (QwenConfig for Qwen models)
Set all necessary parameters for your specific model variant
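To make the registration mechanism concrete, here is a minimal, self-contained sketch of the registry pattern that an `@arch`-style decorator typically implements. The names `ARCHS`, `arch`, and `ModelConfig` are illustrative stand-ins, not fairseq2's actual internals; in fairseq2 you only use the provided `@arch` decorator and config class.

```python
# Sketch of an @arch-style registry; names are hypothetical, not fairseq2's.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ModelConfig:
    model_dim: int = 1024
    num_layers: int = 12


# Maps architecture names to configuration factory functions.
ARCHS: Dict[str, Callable[[], ModelConfig]] = {}


def arch(name: str):
    """Register a configuration factory under the given architecture name."""

    def decorator(fn: Callable[[], ModelConfig]) -> Callable[[], ModelConfig]:
        ARCHS[name] = fn
        return fn

    return decorator


@arch("tiny_2b")
def tiny_2b() -> ModelConfig:
    config = ModelConfig()
    config.model_dim = 2048
    config.num_layers = 24
    return config


# Looking up an architecture by name yields a fresh config instance.
config = ARCHS["tiny_2b"]()
print(config.model_dim)  # 2048
```

Because each lookup calls the factory, every caller gets an independent config object it can freely mutate, which is why architectures are registered as functions rather than shared config instances.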
After adding the configuration and asset card, verify that your model is properly registered:
Check if model is recognized:
```bash
# List all models to see if yours appears
python -m fairseq2.assets list --kind model

# Look specifically for your model
python -m fairseq2.assets list --kind model | grep qwen25_3b_instruct
```
Test model loading:
```python
import fairseq2
from fairseq2.models.hub import load_model

# Test loading your model
try:
    model = load_model("qwen25_3b_instruct")
    print(f"✓ Success! Loaded model with {sum(p.numel() for p in model.parameters())} parameters")
except Exception as e:
    print(f"✗ Error: {e}")
```
Inspect model metadata:
```bash
# Show detailed model information
python -m fairseq2.assets show qwen25_3b_instruct
```
When creating new architecture configurations, these are the most common parameter naming conventions you will find in fairseq2 (exact names may vary by model family):
```python
config.model_dim = 2048          # Model dimensionality
config.num_layers = 36           # Number of transformer layers
config.num_attn_heads = 16       # Number of attention heads
config.num_key_value_heads = 2   # Key/value heads (for GQA/MQA)
config.ffn_inner_dim = 11_008    # Feed-forward network inner dimension
```
```python
@arch("qwen25_3b_instruct")
def qwen25_3b_instruct() -> QwenConfig:
    """Qwen2.5-3B-Instruct: Language model with 3B parameters.

    Paper: https://arxiv.org/abs/2024.xxxxx
    """
    config = QwenConfig()
    config.model_dim = 2048
    ...

    return config
```
```bash
# Check model is listed
python -m fairseq2.assets list --kind model | grep qwen25_3b_instruct

# Show model details
python -m fairseq2.assets show qwen25_3b_instruct

# Quick load test
python -c "from fairseq2.models.hub import load_model; load_model('qwen25_3b_instruct'); print('✓ Success!')"
```
This complete example shows all the steps needed to add a new model to fairseq2.
The process is straightforward but requires attention to detail to ensure all components work together correctly.