.. _tutorial-add-model:

:octicon:`ruby` Add Your Own Model
==================================

.. dropdown:: What you will learn
    :icon: multi-select
    :animate: fade-in

    * How to add a new model to an existing model family
    * How to register model configurations (architectures)
    * How to create asset cards for your models
    * How to verify your model integration works correctly

.. dropdown:: Prerequisites
    :icon: multi-select
    :animate: fade-in

    * Get familiar with fairseq2 basics (:ref:`basics-design-philosophy`)
    * Understand the fairseq2 assets system (:ref:`basics-assets`)
    * Ensure you have fairseq2 installed

Overview
--------

fairseq2's model system makes it easy to add support for new models. There are two main scenarios:

1. **Adding a new model to an existing family** (most common) - When you want to add a new size or variant of an existing model architecture

2. **Creating an entirely new model family** (advanced) - When you need to implement a completely new model architecture

This tutorial focuses on the first scenario, which covers the vast majority of use cases. For the second scenario, use the existing model family implementations as a reference.

Understanding Model Families
----------------------------

In fairseq2, a **model family** groups related model architectures that share the same underlying implementation but differ in size or configuration. For example:

- **Qwen family**: ``qwen25_3b``, ``qwen25_14b``, ``qwen3_8b``, etc.
- **LLaMA family**: ``llama3_8b``, ``llama3_70b``, ``llama3_2_1b``, etc.
- **Mistral family**: ``mistral_7b``, ``mistral_8x7b``, etc.

Each family consists of:

- **Model configurations** (architectures): Define structural parameters (layers, dimensions, etc.)
- **Asset cards**: YAML files specifying download locations and metadata
- **Model implementation**: The actual PyTorch model code and loading logic
- **Model hub**: A unified interface providing methods to work with the family

Working with Model Hubs
^^^^^^^^^^^^^^^^^^^^^^^

Each model family provides a hub that exposes advanced functionality beyond simple model loading:

.. code-block:: python

    from pathlib import Path

    from fairseq2.models.qwen import get_qwen_model_hub

    # Get the model hub for the Qwen family
    hub = get_qwen_model_hub()

    # List available architectures
    archs = hub.get_archs()

    print(f"Available Qwen architectures: {archs}")

    # Get an architecture configuration
    config = hub.get_arch_config("qwen3_0.6b")

    # Create a newly initialized model (random weights)
    new_model = hub.create_new_model(config)

    # Load a model from its asset card
    model = hub.load_model("qwen3_0.6b")

    # Load a model from a custom checkpoint
    custom_model = hub.load_custom_model(Path("/path/to/checkpoint.pt"), config)

For detailed information on all hub capabilities, see :doc:`/reference/api/models/hub`.

Step-by-Step Guide: Adding a Model to Existing Family
-----------------------------------------------------

Let's walk through adding ``qwen25_3b_instruct`` to the existing Qwen family.

Step 1: Add Model Architecture Configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Model architectures are defined as configuration presets that specify the structural parameters of your model.

1. **Navigate to the model family's config file:**

   .. code-block:: bash

       src/fairseq2/models/qwen/config.py

2. **Add a new architecture function:**

   .. code-block:: python

       @arch("qwen25_3b_instruct")
       def qwen25_3b_instruct() -> QwenConfig:
           """Configuration for the Qwen2.5-3B-Instruct model."""
           config = QwenConfig()

           # Set model dimensions and structure
           config.model_dim = 2048
           ...

           return config

**Key points:**

- The ``@arch`` decorator registers the configuration under the given name
- The function name should match or describe the architecture
- Use the appropriate config class for the model family (``QwenConfig`` for Qwen models)
- Set all parameters required by your specific model variant

Step 2: Create Asset Card
^^^^^^^^^^^^^^^^^^^^^^^^^

Asset cards are YAML files that tell fairseq2 where to find your model checkpoints and how to load them.

1. **Navigate to the model family's asset card file:**

   .. code-block:: bash

       src/fairseq2/assets/cards/models/qwen.yaml

2. **Add a new asset card entry:**

   .. code-block:: yaml

       name: qwen25_3b_instruct
       model_family: qwen
       model_arch: qwen25_3b_instruct
       checkpoint: "hg://qwen/qwen2.5-3b-instruct"
       tokenizer: "hg://qwen/qwen2.5-3b-instruct"
       tokenizer_family: qwen
       tokenizer_config:
         use_im_end: true

- ``name``: The model name users will use (e.g., ``load_model("qwen25_3b_instruct")``)
- ``model_family``: Which model family handles this model (``qwen``)
- ``model_arch``: Which architecture configuration to use (``qwen25_3b_instruct``, registered in Step 1)
- ``checkpoint``: Where to download the model weights from
- ``tokenizer``: Where to download the tokenizer from
- ``tokenizer_family``: Which tokenizer family to use
- ``tokenizer_config``: Tokenizer-specific settings

For more details on asset card options, see :ref:`basics-assets`.

Step 3: Verify the Integration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After adding the configuration and asset card, verify that your model is properly registered:

1. **Check if the model is recognized:**

   .. code-block:: bash

       # List all models to see if yours appears
       python -m fairseq2.assets list --kind model

       # Look specifically for your model
       python -m fairseq2.assets list --kind model | grep qwen25_3b_instruct

2. **Test model loading:**

   .. code-block:: python

       import fairseq2

       from fairseq2.models.hub import load_model

       # Test loading your model
       try:
           model = load_model("qwen25_3b_instruct")

           num_params = sum(p.numel() for p in model.parameters())

           print(f"✓ Success! Loaded model with {num_params} parameters")
       except Exception as e:
           print(f"✗ Error: {e}")

3. **Inspect model metadata:**

   .. code-block:: bash

       # Show detailed model information
       python -m fairseq2.assets show qwen25_3b_instruct

Asset Source Options
--------------------

fairseq2 supports multiple sources for model checkpoints and tokenizers:

Hugging Face Hub (Recommended)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The most common and convenient option:

.. code-block:: yaml

    checkpoint: "hg://qwen/qwen2.5-3b-instruct"
    tokenizer: "hg://qwen/qwen2.5-3b-instruct"

Note that only safetensors checkpoints are supported when downloading from the Hugging Face Hub.

Local Files
^^^^^^^^^^^

For development or custom models:

.. code-block:: yaml

    checkpoint: "file:///path/to/my/model.pt"
    tokenizer: "file:///path/to/my/tokenizer"
HTTP URLs
^^^^^^^^^

Direct download links:

.. code-block:: yaml

    checkpoint: "https://example.com/models/my_model.pt"

Common Model Parameters
-----------------------

When creating new architecture configurations, these are the most common parameter names you will come across in fairseq2 (exact names can vary between model families):

Core Architecture
^^^^^^^^^^^^^^^^^

.. code-block:: python

    config.model_dim = 2048            # Model dimensionality
    config.num_layers = 36             # Number of transformer layers
    config.num_attn_heads = 16         # Number of attention heads
    config.num_key_value_heads = 2     # Key/value heads (for GQA/MQA)
    config.ffn_inner_dim = 11_008      # Feed-forward network inner dimension

Vocabulary & Sequence
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    config.vocab_size = 151_936        # Vocabulary size
    config.max_seq_len = 32_768        # Maximum sequence length
    config.tied_embeddings = True      # Tie input/output embeddings

Training & Architecture Details
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    config.head_dim = 128              # Attention head dimension (optional)
    config.qkv_proj_bias = False       # Query/key/value projection bias
    config.dropout_p = 0.0             # Dropout probability
    config.rope_theta = 1_000_000.0    # RoPE theta parameter
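Putting these conventions together, a fully written-out architecture function for the example from Step 1 could look like the sketch below. It is meant to sit next to the other Qwen architecture functions in ``src/fairseq2/models/qwen/config.py``, and the values are illustrative, copied from the parameter listings above; verify each one against the published configuration of the checkpoint you intend to load before committing it:

.. code-block:: python

    @arch("qwen25_3b_instruct")
    def qwen25_3b_instruct() -> QwenConfig:
        """Configuration for the Qwen2.5-3B-Instruct model (illustrative values)."""
        config = QwenConfig()

        # Core architecture
        config.model_dim = 2048
        config.num_layers = 36
        config.num_attn_heads = 16
        config.num_key_value_heads = 2
        config.ffn_inner_dim = 11_008

        # Vocabulary and sequence length
        config.vocab_size = 151_936
        config.max_seq_len = 32_768
        config.tied_embeddings = True

        # Positional encoding and regularization
        config.rope_theta = 1_000_000.0
        config.dropout_p = 0.0

        # Fields such as head_dim and qkv_proj_bias can be set the same way
        # when the checkpoint requires non-default values.

        return config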
Troubleshooting
---------------

Model Not Found Error
^^^^^^^^^^^^^^^^^^^^^

If you get ``ModelNotKnownError``:

1. **Check asset card syntax:** Ensure your YAML is valid
2. **Verify names match:** The asset card ``name`` should match what you're requesting
3. **Check architecture registration:** Ensure the ``@arch``-decorated function exists
4. **Restart Python:** Changes to config files require restarting your Python session

Architecture Configuration Error
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you get architecture-related errors:

1. **Verify decorator:** Ensure ``@arch("name")`` is properly applied
2. **Check architecture name:** The asset card ``model_arch`` must match the registered name
3. **Validate parameters:** Ensure all required config parameters are set

Download/Loading Errors
^^^^^^^^^^^^^^^^^^^^^^^

If model download or loading fails:

1. **Check URLs:** Verify that checkpoint and tokenizer URLs are accessible
2. **Test connectivity:** Ensure you have internet access and proper authentication
3. **Check file paths:** For local files, verify paths exist and are readable
4. **Validate checkpoint format:** Ensure the checkpoint is compatible with the model family

Configuration Validation Errors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you get validation errors:

1. **Check parameter types:** Ensure integers are integers, strings are strings, etc.
2. **Validate ranges:** Some parameters have valid ranges (e.g., positive integers)
3. **Review dependencies:** Some parameters depend on others (e.g., head dimensions)

Example: Complete Implementation
--------------------------------

Here's a complete example showing all the files you need to modify to add ``qwen25_3b_instruct``:

**1. Architecture Configuration** (``src/fairseq2/models/qwen/config.py``):

.. code-block:: python

    @arch("qwen25_3b_instruct")
    def qwen25_3b_instruct() -> QwenConfig:
        """Qwen2.5-3B-Instruct: Language model with 3B parameters.

        Paper: https://arxiv.org/abs/2024.xxxxx
        """
        config = QwenConfig()

        config.model_dim = 2048
        ...

        return config

**2. Asset Card** (``src/fairseq2/assets/cards/models/qwen.yaml``):

.. code-block:: yaml

    ---
    name: qwen25_3b_instruct
    model_family: qwen
    model_arch: qwen25_3b_instruct
    checkpoint: "hg://qwen/qwen2.5-3b-instruct"
    tokenizer: "hg://qwen/qwen2.5-3b-instruct"
    tokenizer_family: qwen
    tokenizer_config:
      use_im_end: true

**3. Command Line Verification**:

.. code-block:: bash

    # Check that the model is listed
    python -m fairseq2.assets list --kind model | grep qwen25_3b_instruct

    # Show model details
    python -m fairseq2.assets show qwen25_3b_instruct

    # Quick load test
    python -c "
    from fairseq2.models.hub import load_model
    model = load_model('qwen25_3b_instruct')
    print('✓ Success!')
    "

This complete example shows all the steps needed to add a new model to fairseq2. The process is straightforward but requires attention to detail to ensure all components work together correctly.
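As a final sanity check, the command line verification above can also be scripted in Python, which is convenient in tests or CI. A minimal sketch, using only the hub methods introduced earlier in this tutorial:

.. code-block:: python

    from fairseq2.models.qwen import get_qwen_model_hub

    hub = get_qwen_model_hub()

    # The architecture added in Step 1 should be registered with the family...
    if "qwen25_3b_instruct" not in hub.get_archs():
        raise RuntimeError("architecture is not registered - check the @arch decorator")

    # ...and the asset card added in Step 2 should resolve to a loadable model.
    model = hub.load_model("qwen25_3b_instruct")

    num_params = sum(p.numel() for p in model.parameters())

    print(f"✓ Loaded qwen25_3b_instruct with {num_params:,} parameters")

If this snippet runs without errors, the architecture registration, the asset card, and the checkpoint all line up correctly.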