fairseq2.models.llama¶
The LLaMA module provides support for LLaMA language models from Meta AI. It includes model configurations, hub access, tokenizers, and utilities for loading and working with LLaMA models.
Quick Start¶
from fairseq2.models.llama import get_llama_model_hub, get_llama_tokenizer_hub
# Get the model hub
hub = get_llama_model_hub()
# Load a model
model = hub.load_model("llama3_2_1b")
# Load corresponding tokenizer (uses HuggingFace tokenizer by default)
tokenizer = get_llama_tokenizer_hub().load_tokenizer("llama3_2_1b")
# Encode a prompt
text = "The future of AI is"
encoder = tokenizer.create_encoder()
encoded = encoder(text)
# ... model inference code ...
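To sanity-check the tokenizer round trip without running the model, you can decode the tokens back to text. This continues the snippet above and is a minimal sketch, assuming the tokenizer exposes a create_decoder() factory as the counterpart of create_encoder():
# Decode the tokens back to text (output may include special tokens such as <|begin_of_text|>)
decoder = tokenizer.create_decoder()
print(decoder(encoded))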
Tokenizer¶
The LLaMA tokenizer in fairseq2 supports multiple implementations:
- HuggingFace Tokenizer (Default):
The default and recommended implementation using HuggingFace’s tokenizer.
Asset Card Example:
name: llama3
tokenizer: "/path/to/Llama-3.1-8B"  # HuggingFace tokenizer directory
tokenizer_family: llama
The directory should contain the standard HuggingFace tokenizer files, e.g. config.json, tokenizer.json, tokenizer_config.json, and special_tokens_map.json.
- Tiktoken Implementation:
Implementation using Tiktoken.
Asset Card Example:
name: tiktoken_llama_instruct
tokenizer_config_override:
  impl: tiktoken
  use_eot: True  # For instruction models
tokenizer_family: llama
tokenizer: "/path/to/tokenizer.model"  # Tiktoken model file
- SentencePiece Implementation:
Implementation using SentencePiece (only available for LLaMA-1 and LLaMA-2).
Asset Card Example:
name: sp_llama
tokenizer_config_override:
  impl: sp
tokenizer_family: llama
tokenizer: "/path/to/tokenizer.model"  # SentencePiece model file
Special Tokens¶
The tokenizer handles several special tokens (the sketch after this list shows how they combine in a prompt):
- <|begin_of_text|> - Beginning of text marker
- <|end_of_text|> - End of text marker (default)
- <|eot_id|> - End of turn marker (when use_eot=True)
- <|start_header_id|> - Start of header
- <|end_header_id|> - End of header
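For reference, this is roughly how the tokens above delimit an instruction-model prompt. The layout below follows Meta's published Llama 3 chat format and is shown only to illustrate the markers; in practice, rely on the chat template support described further down:
# Illustrative prompt layout for an instruct model (Llama 3 chat format)
prompt = (
    "<|begin_of_text|>"
    "<|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful assistant.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Tell me a joke.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)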
For instruction models (e.g., llama3_2_1b_instruct), use_eot=True is set by default, which means:
from fairseq2.data.tokenizers import load_tokenizer
# Load instruct model tokenizer
tokenizer = load_tokenizer("llama3_2_1b_instruct")
# Will use <|eot_id|> as EOS token
assert tokenizer._eos_token == "<|eot_id|>"
Tokenizer Modes¶
The tokenizer supports different modes via create_encoder(mode=...):
- default - Adds BOS and EOS tokens
- prompt - Adds BOS token only
- prompt_response - Adds EOS token only
- as_is - No special tokens added
encoder = tokenizer.create_encoder(mode="prompt")
# Only adds <|begin_of_text|>
encoder = tokenizer.create_encoder(mode="prompt_response")
# Only adds <|eot_id|> or <|end_of_text|>
Model Hub¶
get_llama_model_hub¶
- fairseq2.models.llama.get_llama_model_hub()¶
Returns the model hub for LLaMA models.
get_llama_tokenizer_hub¶
- fairseq2.models.llama.get_llama_tokenizer_hub()¶
Returns the tokenizer hub for LLaMA tokenizers.
- Return type:
TokenizerHub[TokenizerT, TokenizerConfigT]
Model Configuration¶
LLaMAConfig¶
- class fairseq2.models.llama.LLaMAConfig(*, model_dim=4096, max_seq_len=2048, vocab_size=32000, pad_idx=None, tied_embeddings=False, num_layers=32, num_attn_heads=32, num_key_value_heads=32, ffn_inner_dim=16384, ffn_inner_dim_scale=0.6666666666666666, ffn_inner_dim_multiplier=1.0, ffn_inner_dim_multiple_of=256, rope_theta=10000.0, use_scaled_rope=False, rope_scale=<factory>, dropout_p=0.0, init_std=None, init_std_scale='layer', shard_embed_dim=True)[source]¶
Bases:
object
Holds the configuration of a LLaMA model.
The default values correspond to the base architecture as described in Touvron et al. [4].
- ffn_inner_dim_scale: float = 0.6666666666666666¶
The scale factor for the dimensionality of inner projection layers in feed-forward networks.
- ffn_inner_dim_multiplier: float = 1.0¶
The multiplier for the dimensionality of inner projection layers in feed-forward networks.
- ffn_inner_dim_multiple_of: int = 256¶
The dimensionality of inner projection layers in feed-forward networks is rounded up to the nearest multiple of this value. (See the sketch after this field list for how the three ffn_inner_dim_* fields combine.)
- rope_scale: LLaMARoPEScaleConfig¶
If not None, specifies scaling parameters for the Rotary position encoder, aiming to increase the supported context length.
- init_std: float | None = None¶
If not None, the standard deviation used to initialize input embeddings and projection weights; otherwise, model_dim ** -0.5 is used.
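How the three ffn_inner_dim_* fields combine is easiest to see in code. The sketch below follows the sizing rule of the reference LLaMA implementation; fairseq2's exact computation may differ in minor details:
# Sketch of the FFN inner-dimension sizing rule (reference LLaMA style):
# scale the raw inner dim, apply the multiplier, then round up to a multiple.
def ffn_inner_dim_effective(
    ffn_inner_dim: int,       # e.g. 4 * model_dim = 16384
    scale: float = 2 / 3,     # ffn_inner_dim_scale
    multiplier: float = 1.0,  # ffn_inner_dim_multiplier
    multiple_of: int = 256,   # ffn_inner_dim_multiple_of
) -> int:
    dim = int(ffn_inner_dim * scale)
    dim = int(dim * multiplier)
    return multiple_of * ((dim + multiple_of - 1) // multiple_of)

print(ffn_inner_dim_effective(16384))  # 11008, the LLaMA-7B FFN size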
Tokenizer Configuration¶
LLaMATokenizerConfig¶
- class fairseq2.models.llama.LLaMATokenizerConfig(impl: "Literal['sp', 'tiktoken', 'hg']" = 'sp', use_eot: 'bool' = False, split_regex: 'str | None' = None)[source]¶
Bases:
object
Configuration for LLaMA tokenizer.
Key Parameters (see the construction sketch after this list):
- impl - Implementation to use: “hg” (default), “tiktoken”, or “sp”
- use_eot - Whether to use <|eot_id|> as the EOS token (True for instruction models)
- split_regex - Custom regex pattern for the tiktoken implementation
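For illustration, constructing the configuration for each implementation looks roughly like this (the import path follows the class reference above; in most cases the asset cards set these values for you):
from fairseq2.models.llama import LLaMATokenizerConfig

# HuggingFace tokenizer (the recommended default in these docs)
hg_config = LLaMATokenizerConfig(impl="hg")

# Tiktoken with <|eot_id|> as the EOS token, as used for instruction models
tiktoken_config = LLaMATokenizerConfig(impl="tiktoken", use_eot=True)

# SentencePiece, only meaningful for LLaMA-1 and LLaMA-2 tokenizers
sp_config = LLaMATokenizerConfig(impl="sp")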
Complete Examples¶
Using HuggingFace Tokenizer¶
from fairseq2.models.llama import get_llama_tokenizer_hub
# Load default HuggingFace tokenizer
tokenizer = get_llama_tokenizer_hub().load_tokenizer("llama3_2_1b")
# Create encoder in different modes
default_encoder = tokenizer.create_encoder() # Adds BOS and EOS
prompt_encoder = tokenizer.create_encoder(mode="prompt") # Only BOS
# Encode text
text = "Hello, world!"
tokens = default_encoder(text)
Using Tiktoken Implementation¶
from fairseq2.models.llama import get_llama_tokenizer_hub
from fairseq2.models.llama.tokenizer import LLaMATokenizerConfig
from pathlib import Path
# Configure tiktoken implementation
config = LLaMATokenizerConfig(impl="tiktoken", use_eot=True)
# Load tokenizer with custom config
hub = get_llama_tokenizer_hub()
tokenizer = hub.load_custom_tokenizer(Path("/path/to/tokenizer.model"), config)
Chat Template Support¶
The HuggingFace implementation includes support for chat templates through the HuggingFace tokenizer’s apply_chat_template method:
from fairseq2.models.llama import get_llama_tokenizer_hub
# Load tokenizer
tokenizer = get_llama_tokenizer_hub().load_tokenizer("llama3_2_1b")
# Prepare chat messages
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Tell me a joke."},
{"role": "assistant", "content": "Why did the chicken cross the road?"}
]
# Format using chat template
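# (this reaches into the underlying HuggingFace tokenizer via internal attributes;
#  the exact attribute path may vary across fairseq2 versions)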
formatted_text = tokenizer._model._tok.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
# Then encode the formatted text
encoder = tokenizer.create_encoder()
tokens = encoder(formatted_text)
See Also¶
fairseq2.models.hub - Model hub API reference
Add Your Own Model - Tutorial on adding new models
Assets - Understanding the asset system