.. _api-models-llama:

=====================
fairseq2.models.llama
=====================

.. currentmodule:: fairseq2.models.llama

The LLaMA module provides support for LLaMA language models from Meta AI. It
includes model configurations, hub access, tokenizers, and utilities for
loading and working with LLaMA models.

Quick Start
-----------

.. code-block:: python

    from fairseq2.models.llama import get_llama_model_hub, get_llama_tokenizer_hub

    # Get the model hub
    hub = get_llama_model_hub()

    # Load a model
    model = hub.load_model("llama3_2_1b")

    # Load the corresponding tokenizer (uses the HuggingFace tokenizer by default)
    tokenizer = get_llama_tokenizer_hub().load_tokenizer("llama3_2_1b")

    # Generate some text
    text = "The future of AI is"
    encoder = tokenizer.create_encoder()
    encoded = encoder(text)
    # ... model inference code ...

Tokenizer
---------

The LLaMA tokenizer in fairseq2 supports multiple implementations:

1. **HuggingFace Tokenizer (Default)**: The default and recommended
   implementation, backed by HuggingFace's tokenizer.

   Asset Card Example:

   .. code-block:: yaml

      name: llama3
      tokenizer: "/path/to/Llama-3.1-8B"  # HuggingFace tokenizer directory
      tokenizer_family: llama

   The tokenizer directory should contain files such as ``config.json``,
   ``tokenizer.json``, ``tokenizer_config.json``, and
   ``special_tokens_map.json``.

2. **Tiktoken Implementation**: An implementation based on Tiktoken.

   Asset Card Example:

   .. code-block:: yaml

      name: tiktoken_llama_instruct
      tokenizer_config_override:
        impl: tiktoken
        use_eot: True  # For instruction models
      tokenizer_family: llama
      tokenizer: "/path/to/tokenizer.model"  # Tiktoken model file

3. **SentencePiece Implementation**: An implementation based on SentencePiece
   (only available for LLaMA-1 and LLaMA-2).

   Asset Card Example:

   .. code-block:: yaml

      name: sp_llama
      tokenizer_config_override:
        impl: sp
      tokenizer_family: llama
      tokenizer: "/path/to/tokenizer.model"  # SentencePiece model file

Special Tokens
~~~~~~~~~~~~~~

The tokenizer handles several special tokens:

- ``<|begin_of_text|>`` - Beginning of text marker
- ``<|end_of_text|>`` - End of text marker (default EOS token)
- ``<|eot_id|>`` - End of turn marker (when ``use_eot=True``)
- ``<|start_header_id|>`` - Start of header
- ``<|end_header_id|>`` - End of header

For instruction models (e.g., ``llama3_2_1b_instruct``), ``use_eot=True`` is
set by default, which means:

.. code-block:: python

    from fairseq2.data.tokenizers import load_tokenizer

    # Load the instruct model tokenizer
    tokenizer = load_tokenizer("llama3_2_1b_instruct")

    # It will use <|eot_id|> as the EOS token
    assert tokenizer._eos_token == "<|eot_id|>"

Tokenizer Modes
~~~~~~~~~~~~~~~

The tokenizer supports different modes via ``create_encoder(mode=...)``:

- ``default``: Adds BOS and EOS tokens
- ``prompt``: Adds the BOS token only
- ``prompt_response``: Adds the EOS token only
- ``as_is``: No special tokens added

.. code-block:: python

    encoder = tokenizer.create_encoder(mode="prompt")
    # Only adds <|begin_of_text|>

    encoder = tokenizer.create_encoder(mode="prompt_response")
    # Only adds <|eot_id|> or <|end_of_text|>
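To see how the modes differ in practice, the minimal sketch below encodes the
same prompt with the default and ``prompt`` modes and compares the resulting
lengths. It reuses only the hub and encoder calls shown above and assumes, as
in the Quick Start, that the encoder returns a 1-D sequence of token indices
whose length can be taken with ``len()``; the exact counts depend on the
model's vocabulary.

.. code-block:: python

    from fairseq2.models.llama import get_llama_tokenizer_hub

    tokenizer = get_llama_tokenizer_hub().load_tokenizer("llama3_2_1b")

    text = "The future of AI is"

    # The default mode adds both <|begin_of_text|> and <|end_of_text|>.
    default_encoder = tokenizer.create_encoder()

    # "prompt" mode adds only <|begin_of_text|>.
    prompt_encoder = tokenizer.create_encoder(mode="prompt")

    default_tokens = default_encoder(text)
    prompt_tokens = prompt_encoder(text)

    # The prompt encoding is expected to be one token shorter, since no EOS
    # token is appended.
    print(len(default_tokens), len(prompt_tokens))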
Model Hub
---------

get_llama_model_hub
~~~~~~~~~~~~~~~~~~~

.. autofunction:: get_llama_model_hub

Returns the model hub for LLaMA models.

get_llama_tokenizer_hub
~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: get_llama_tokenizer_hub

Returns the tokenizer hub for LLaMA tokenizers.

Model Configuration
-------------------

LLaMAConfig
~~~~~~~~~~~

.. autoclass:: LLaMAConfig
   :members:
   :show-inheritance:

Tokenizer Configuration
-----------------------

LLaMATokenizerConfig
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: LLaMATokenizerConfig
   :members:
   :show-inheritance:

Configuration for the LLaMA tokenizer.

**Key Parameters:**

* ``impl`` - Implementation to use: ``"hg"`` (default), ``"tiktoken"``, or ``"sp"``
* ``use_eot`` - Whether to use ``<|eot_id|>`` as the EOS token (``True`` for instruction models)
* ``split_regex`` - Custom regex pattern for the tiktoken implementation

Complete Examples
-----------------

Using HuggingFace Tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from fairseq2.models.llama import get_llama_tokenizer_hub

    # Load the default HuggingFace tokenizer
    tokenizer = get_llama_tokenizer_hub().load_tokenizer("llama3_2_1b")

    # Create encoders in different modes
    default_encoder = tokenizer.create_encoder()              # Adds BOS and EOS
    prompt_encoder = tokenizer.create_encoder(mode="prompt")  # Only BOS

    # Encode text
    text = "Hello, world!"
    tokens = default_encoder(text)

Using Tiktoken Implementation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from pathlib import Path

    from fairseq2.models.llama import get_llama_tokenizer_hub
    from fairseq2.models.llama.tokenizer import LLaMATokenizerConfig

    # Configure the tiktoken implementation
    config = LLaMATokenizerConfig(impl="tiktoken", use_eot=True)

    # Load the tokenizer with the custom config
    hub = get_llama_tokenizer_hub()

    tokenizer = hub.load_custom_tokenizer(Path("/path/to/tokenizer.model"), config)

Chat Template Support
~~~~~~~~~~~~~~~~~~~~~

The HuggingFace implementation supports chat templates through the underlying
HuggingFace tokenizer's ``apply_chat_template`` method:

.. code-block:: python

    from fairseq2.models.llama import get_llama_tokenizer_hub

    # Load the tokenizer
    tokenizer = get_llama_tokenizer_hub().load_tokenizer("llama3_2_1b")

    # Prepare chat messages
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
        {"role": "assistant", "content": "Why did the chicken cross the road?"},
    ]

    # Format using the chat template
    formatted_text = tokenizer._model._tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    # Then encode the formatted text
    encoder = tokenizer.create_encoder()

    tokens = encoder(formatted_text)

See Also
--------

* :doc:`/reference/api/models/hub` - Model hub API reference
* :doc:`/tutorials/add_model` - Tutorial on adding new models
* :doc:`/basics/assets` - Understanding the asset system