Add Your Own Model

What you will learn
  • How to add a new model to an existing model family

  • How to register model configurations (architectures)

  • How to create asset cards for your models

  • How to verify your model integration works correctly

Prerequisites
  • Get familiar with fairseq2 basics (see Design Philosophy)

  • Understanding of the fairseq2 assets system (see Assets)

  • Ensure you have fairseq2 installed

Overview

fairseq2 uses a model system that makes it easy to add support for new models. There are two main scenarios:

  1. Adding a new model to an existing family (most common) - When you want to add a new size or variant of an existing model architecture

  2. Creating an entirely new model family (advanced) - When you need to implement a completely new model architecture

This tutorial focuses on the first scenario, which covers the vast majority of use cases. For the second, use the existing model family implementations as a reference.

Understanding Model Families

In fairseq2, a model family groups related model architectures that share the same underlying implementation but differ in size or configuration. For example:

  • Qwen family: qwen25_3b, qwen25_7b, qwen25_14b, qwen3_8b, etc.

  • LLaMA family: llama3_8b, llama3_70b, llama3_2_1b, etc.

  • Mistral family: mistral_7b, mistral_8x7b, etc.

Each family consists of:

  • Model configurations (architectures): Define structural parameters (layers, dimensions, etc.)

  • Asset cards: YAML files specifying download locations and metadata

  • Model implementation: The actual PyTorch model code and loading logic

  • Model hub: A unified interface providing methods to work with the family
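
For the Qwen family used throughout this tutorial, these pieces live at:

src/fairseq2/models/qwen/config.py          # architecture presets (QwenConfig)
src/fairseq2/assets/cards/models/qwen.yaml  # asset cards
src/fairseq2/models/qwen/                   # model implementation and hub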

Working with Model Hubs

Each model family provides a hub that exposes advanced functionality beyond simple model loading:

from fairseq2.models.qwen import get_qwen_model_hub

# Get the model hub for Qwen family
hub = get_qwen_model_hub()

# List available architectures
archs = hub.get_archs()
print(f"Available Qwen architectures: {archs}")

# Get architecture configuration
config = hub.get_arch_config("qwen3_0.6b")

# Create a newly initialized model (random weights)
new_model = hub.create_new_model(config)

# Load model from asset card
model = hub.load_model("qwen3_0.6b")

# Load model from custom checkpoint
from pathlib import Path
custom_model = hub.load_custom_model(Path("/path/to/checkpoint.pt"), config)

For detailed information on all hub capabilities, see fairseq2.models.hub.

Step-by-Step Guide: Adding a Model to Existing Family

Let’s walk through adding qwen25_3b_instruct to the existing Qwen family.

Step 1: Add Model Architecture Configuration

Model architectures are defined as configuration presets that specify the structural parameters of your model.

  1. Navigate to the model family’s config file:

    src/fairseq2/models/qwen/config.py
    
  2. Add a new architecture function:

    @arch("qwen25_3b_instruct")
    def qwen25_3b_instruct() -> QwenConfig:
        """Configuration for Qwen2.5-3B-Instruct model.
        """
        config = QwenConfig()
    
        # Set model dimensions and structure
        config.model_dim = 2048
        ...
    
        return config
    

    Key points:

    • The @arch decorator registers the configuration with the given name

    • The function name should match or describe the architecture

    • Use the appropriate config class for the model family (QwenConfig for Qwen models)

    • Set all necessary parameters for your specific model variant
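
    You can sanity-check the registration straight away through the family's hub, using only the hub methods shown earlier:

    from fairseq2.models.qwen import get_qwen_model_hub

    hub = get_qwen_model_hub()

    # The new name should appear among the registered architectures.
    assert "qwen25_3b_instruct" in hub.get_archs()

    # The preset itself should come back as a QwenConfig.
    config = hub.get_arch_config("qwen25_3b_instruct")
    print(config)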

Step 2: Create Asset Card

Asset cards are YAML files that tell fairseq2 where to find your model checkpoints and how to load them.

  1. Navigate to the model family’s asset card file:

    src/fairseq2/assets/cards/models/qwen.yaml
    
  2. Add a new asset card entry:

    name: qwen25_3b_instruct
    model_family: qwen
    model_arch: qwen25_3b_instruct
    checkpoint: "hg://qwen/qwen2.5-3b-instruct"
    tokenizer: "hg://qwen/qwen2.5-3b-instruct"
    tokenizer_family: qwen
    tokenizer_config:
        use_im_end: true
    
  • name: The model name users will use (e.g., load_model("qwen25_3b_instruct"))

  • model_family: Which model family handles this model (qwen)

  • model_arch: Which architecture configuration to use (qwen25_3b_instruct, the preset registered in Step 1)

  • checkpoint: Where to download the model weights from

  • tokenizer: Where to download the tokenizer from

  • tokenizer_family: Which tokenizer family to use

  • tokenizer_config: Tokenizer-specific settings

For more details on asset card options, see Assets.
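
Once the card is saved, the model name resolves through the family's hub; a quick check (note that this downloads the checkpoint on first use):

from fairseq2.models.qwen import get_qwen_model_hub

# Resolves the "qwen25_3b_instruct" card, fetches the checkpoint from the
# Hugging Face Hub, and constructs the model.
model = get_qwen_model_hub().load_model("qwen25_3b_instruct")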

Step 3: Verify the Integration

After adding the configuration and asset card, verify that your model is properly registered:

  1. Check if model is recognized:

    # List all models to see if yours appears
    python -m fairseq2.assets list --kind model
    
    # Look specifically for your model
    python -m fairseq2.assets list --kind model | grep qwen25_3b_instruct
    
  2. Test model loading:

    import fairseq2
    from fairseq2.models.hub import load_model
    
    # Test loading your model
    try:
        model = load_model("qwen25_3b_instruct")
        print(f"✓ Success! Loaded model with {sum(p.numel() for p in model.parameters())} parameters")
    except Exception as e:
        print(f"✗ Error: {e}")
    
  3. Inspect model metadata:

    # Show detailed model information
    python -m fairseq2.assets show qwen25_3b_instruct
    

Asset Source Options

fairseq2 supports multiple sources for model checkpoints and tokenizers:

Local Files

For development or custom models:

checkpoint: "file:///path/to/my/model.pt"
tokenizer: "file:///path/to/my/tokenizer"

HTTP URLs

Direct download links:

checkpoint: "https://example.com/models/my_model.pt"

Common Model Parameters

When creating new architecture configurations, these are the most common configuration parameters you'll find in fairseq2 (exact names vary between model families):

Core Architecture

config.model_dim = 2048           # Model dimensionality
config.num_layers = 36            # Number of transformer layers
config.num_attn_heads = 16        # Number of attention heads
config.num_key_value_heads = 2    # Key/value heads (for GQA/MQA)
config.ffn_inner_dim = 11_008     # Feed-forward network inner dimension

Vocabulary & Sequence

config.vocab_size = 151_936       # Vocabulary size
config.max_seq_len = 32_768       # Maximum sequence length
config.tied_embeddings = True     # Tie input/output embeddings

Training & Architecture Details

config.head_dim = 128             # Attention head dimension (optional)
config.qkv_proj_bias = False      # Query/key/value projection bias
config.dropout_p = 0.0            # Dropout probability
config.rope_theta = 1_000_000.0   # RoPE theta parameter
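
Put together, these fields form a config that the hub can instantiate directly. A minimal sketch, assuming QwenConfig exposes the field names shown above (other families name them differently):

from fairseq2.models.qwen import QwenConfig, get_qwen_model_hub

# Hypothetical small variant for local experimentation; the model is
# randomly initialized, so nothing is downloaded.
config = QwenConfig()
config.model_dim = 1024
config.num_layers = 12
config.num_attn_heads = 8
config.num_key_value_heads = 2

model = get_qwen_model_hub().create_new_model(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")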

Troubleshooting

Model Not Found Error

If you get ModelNotKnownError:

  1. Check asset card syntax: Ensure your YAML is valid

  2. Verify names match: Asset card name should match what you’re requesting

  3. Check architecture registration: Ensure the @arch decorated function exists

  4. Restart Python: Changes to config files require restarting your Python session
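
A quick way to narrow this down is to ask the family's hub what it actually registered:

from fairseq2.models.qwen import get_qwen_model_hub

# If your architecture is missing from this set, the @arch function never
# ran; if it is present, the problem is on the asset card side.
print(get_qwen_model_hub().get_archs())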

Architecture Configuration Error

If you get architecture-related errors:

  1. Verify decorator: Ensure @arch("name") is properly applied

  2. Check architecture name: Asset card model_arch must match the registered name

  3. Validate parameters: Ensure all required config parameters are set

Download/Loading Errors

If model download or loading fails:

  1. Check URLs: Verify checkpoint and tokenizer URLs are accessible

  2. Test connectivity: Ensure you have internet access and proper authentication

  3. Check file paths: For local files, verify paths exist and are readable

  4. Validate checkpoint format: Ensure checkpoint is compatible with the model family
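
To separate download problems from loading problems, bypass the asset card and load a local checkpoint directly through the hub, as shown in Working with Model Hubs:

from pathlib import Path

from fairseq2.models.qwen import get_qwen_model_hub

hub = get_qwen_model_hub()
config = hub.get_arch_config("qwen25_3b_instruct")

# If this succeeds, the card's URL or your network is the culprit; if it
# fails, the checkpoint format likely does not match the model family.
model = hub.load_custom_model(Path("/path/to/checkpoint.pt"), config)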

Configuration Validation Errors

If you get validation errors:

  1. Check parameter types: Ensure integers are integers, strings are strings, etc.

  2. Validate ranges: Some parameters may have valid ranges (e.g., positive integers)

  3. Review dependencies: Some parameters may depend on others (e.g., head dimensions)
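
The head-dimension dependency in point 3 is the most common trip-up: in the typical case the per-head dimension is model_dim divided by num_attn_heads, so the two must divide evenly. The example values in this tutorial line up:

# 2048 / 16 = 128, matching the optional head_dim shown earlier.
model_dim = 2048
num_attn_heads = 16

assert model_dim % num_attn_heads == 0
print(model_dim // num_attn_heads)  # 128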

Example: Complete Implementation

Here’s a complete example showing all the files you need to modify to add qwen25_3b_instruct:

1. Architecture Configuration (src/fairseq2/models/qwen/config.py):

@arch("qwen25_3b_instruct")
def qwen25_3b_instruct() -> QwenConfig:
    """Qwen2.5-3B-Instruct: Language model with 3B parameters.

    Paper: https://arxiv.org/abs/2024.xxxxx
    """
    config = QwenConfig()

    config.model_dim = 2048
    ...

    return config

2. Asset Card (src/fairseq2/assets/cards/models/qwen.yaml):

---

name: qwen25_3b_instruct
model_family: qwen
model_arch: qwen25_3b_instruct
checkpoint: "hg://qwen/qwen2.5-3b-instruct"
tokenizer: "hg://qwen/qwen2.5-3b-instruct"
tokenizer_family: qwen
tokenizer_config:
  use_im_end: true

3. Command Line Verification:

# Check model is listed
python -m fairseq2.assets list --kind model | grep qwen25_3b_instruct

# Show model details
python -m fairseq2.assets show qwen25_3b_instruct

# Quick load test
python -c "
from fairseq2.models.hub import load_model
model = load_model('qwen25_3b_instruct')
print('✓ Success!')
"

This example covers every file you need to touch to add a new model to fairseq2. The process is straightforward, but the names must agree across files: the name registered with @arch must match the asset card's model_arch, and the card's name is what you pass to load_model().