fairseq2.models¶
Classes¶
- final class fairseq2.models.ModelHub(family: ModelFamily, asset_store: AssetStore)[source]¶
Bases: `Generic[ModelT, ModelConfigT]`
Provides a high-level interface for loading and creating models from a specific model family.
This class serves as the primary entry point for working with models of a particular family (e.g. LLaMA, Qwen). It handles model discovery, configuration loading, and model instantiation.
- iter_cards() Iterator[AssetCard] [source]¶
Iterates over all asset cards belonging to this model family.
```python
from fairseq2.models.qwen import get_qwen_model_hub

# List all available Qwen models.
for card in get_qwen_model_hub().iter_cards():
    print(f"Model: {card.name}")
```
- get_archs() set[str] [source]¶
Returns the set of supported model architectures in this family.
```python
from fairseq2.models.qwen import get_qwen_model_hub

# List all available Qwen architectures.
for arch in get_qwen_model_hub().get_archs():
    print(f"Architecture: {arch}")
```
- get_arch_config(arch: str) ModelConfigT [source]¶
Returns the configuration for the specified model architecture.
```python
from fairseq2.models.qwen import get_qwen_model_hub

config = get_qwen_model_hub().get_arch_config("qwen25_7b")

print(config)
```
- Raises:
ModelArchitectureNotKnownError – If `arch` is not a known architecture in this family.
- maybe_get_arch_config(arch: str) ModelConfigT | None [source]¶
Returns the configuration for the specified model architecture, or `None` if the architecture is not known.
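The difference between the raising and non-raising lookups can be pictured with a small, hypothetical stand-in for the hub (`TinyHub` below is illustrative only, not the actual fairseq2 implementation):

```python
from typing import Optional

# Hypothetical minimal sketch of the get/maybe_get lookup pattern used
# by ModelHub; not the actual fairseq2 implementation.

class ModelArchitectureNotKnownError(Exception):
    def __init__(self, arch: str) -> None:
        super().__init__(f"'{arch}' is not a known architecture.")
        self.arch = arch


class TinyHub:
    def __init__(self, archs: dict) -> None:
        self._archs = archs

    def maybe_get_arch_config(self, arch: str) -> Optional[dict]:
        # Returns None instead of raising for unknown architectures.
        return self._archs.get(arch)

    def get_arch_config(self, arch: str) -> dict:
        config = self.maybe_get_arch_config(arch)
        if config is None:
            raise ModelArchitectureNotKnownError(arch)
        return config


hub = TinyHub({"qwen25_7b": {"num_layers": 28}})

print(hub.maybe_get_arch_config("unknown_arch"))  # None
print(hub.get_arch_config("qwen25_7b"))  # {'num_layers': 28}
```

The `maybe_get_*` variant is convenient when probing for an architecture without wrapping the call in exception handling.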
- get_model_config(card: AssetCard | str) ModelConfigT [source]¶
Returns the model configuration from an asset card.
This method loads the base architecture configuration and applies any model-specific overrides specified in the asset card.
As a convenience, this method also accepts an asset name instead of an asset card.
```python
from fairseq2.assets import get_asset_store
from fairseq2.models.qwen import QwenConfig, get_qwen_model_hub

card = get_asset_store().retrieve_card("qwen25_7b_instruct")

qwen_config = get_qwen_model_hub().get_model_config(card)

# As a convenience, the card can be omitted and the model name can
# be passed directly to `get_model_config()`:
qwen_config = get_qwen_model_hub().get_model_config("qwen25_7b_instruct")

print(qwen_config)
```
- Raises:
ModelNotKnownError – If `card` is a string and no asset card with that name exists.
AssetCardError – If the asset card's model family does not match this hub's family.
- create_new_model(config: ModelConfigT, *, gangs: Gangs | None = None, dtype: dtype | None = None, meta: bool = False) ModelT [source]¶
- create_new_model(config: ModelConfigT, *, device: device | None = None, dtype: dtype | None = None, meta: bool = False) ModelT
Creates a new model instance with the specified configuration.
This method creates a fresh model without loading any pretrained weights. The model will be initialized with random parameters according to the architecture’s default initialization scheme.
If `gangs` is provided, it will be used to apply parallelism (i.e. model parallelism) to the initialized model. If the model family does not support a certain parallelism strategy, that strategy will be ignored. For instance, if `gangs.tp.size > 1` but the model does not support tensor parallelism, the model will be instantiated with regular attention and feed-forward network blocks. If `None`, the whole model will be initialized without any parallelism.
If `device` is provided, the model will be created on the specified device; otherwise, the device returned from `torch.get_default_device()` will be used. Note that `device` and `gangs` cannot be provided together. If `gangs` is provided, `gangs.root.device` will be used.
If `dtype` is provided, it will be used as the default data type of the model parameters and buffers; otherwise, the data type returned from `torch.get_default_dtype()` will be used.
If `meta` is `True`, the model will be created on the meta device for memory-efficient initialization. This is only supported if the model family supports the meta device.
```python
from fairseq2.models.qwen import QwenConfig, get_qwen_model_hub

# Use the default Qwen configuration, except for the number of
# decoder layers.
config = QwenConfig(num_layers=16)

qwen_model = get_qwen_model_hub().create_new_model(config)
```
- Raises:
ValueError – If both `gangs` and `device` are provided.
NotSupportedError – If `meta` is `True` but the model family does not support the meta device.
- load_model(card: AssetCard | str, *, gangs: Gangs | None = None, dtype: dtype | None = None, config: ModelConfigT | None = None, mmap: bool = False, progress: bool = True) ModelT [source]¶
- load_model(card: AssetCard | str, *, device: device | None = None, dtype: dtype | None = None, config: ModelConfigT | None = None, mmap: bool = False, progress: bool = True) ModelT
Loads a pretrained model from an asset card.
This method downloads the model checkpoint (if necessary) and loads the pretrained weights into a model instance. The model architecture and configuration are determined from the asset card metadata.
As a convenience, this method also accepts an asset name instead of an asset card.
If `gangs` is provided, it will be used to apply parallelism (i.e. model parallelism) to the initialized model. If the model family does not support a certain parallelism strategy, that strategy will be ignored. For instance, if `gangs.tp.size > 1` but the model does not support tensor parallelism, the model will be instantiated with regular attention and feed-forward network blocks. If `None`, the whole model will be initialized without any parallelism.
If `device` is provided, the model will be created on the specified device; otherwise, the device returned from `torch.get_default_device()` will be used. Note that `device` and `gangs` cannot be provided together. If `gangs` is provided, `gangs.root.device` will be used.
If `dtype` is provided, it will be used as the default data type of the model parameters and buffers; otherwise, the data type returned from `torch.get_default_dtype()` will be used.
If `config` is provided, it overrides the default model configuration from the asset card. If `None`, the configuration specified in the card is used. This is typically used to make slight adjustments to the model configuration, such as tuning dropout probabilities, without changing the architecture.
If `mmap` is `True`, the model checkpoint will be memory-mapped. This can reduce memory usage but may cause slower load times on some systems.
If `progress` is `True`, a progress bar is displayed during model download and loading.
```python
from fairseq2.assets import get_asset_store
from fairseq2.models.qwen import get_qwen_model_hub

card = get_asset_store().retrieve_card("qwen25_7b_instruct")

qwen_model = get_qwen_model_hub().load_model(card)

# As a convenience, the card can be omitted and the model name can
# be passed directly to `load_model()`:
qwen_model = get_qwen_model_hub().load_model("qwen25_7b_instruct")
```
- Raises:
ModelNotKnownError – If `card` is a string and no asset card with that name exists.
AssetCardError – If the asset card's model family does not match this hub's family.
ValueError – If both `gangs` and `device` are provided.
- load_custom_model(path: Path, config: ModelConfigT, *, gangs: Gangs | None = None, dtype: dtype | None = None, mmap: bool = False, restrict: bool | None = None, progress: bool = True) ModelT [source]¶
- load_custom_model(path: Path, config: ModelConfigT, *, device: device | None = None, dtype: dtype | None = None, mmap: bool = False, restrict: bool | None = None, progress: bool = True) ModelT
Loads a model from a custom checkpoint file.
This method is useful for loading models from custom training runs or third-party checkpoints that are not available through the asset store.
`config` specifies the model configuration. It must match the architecture of the saved checkpoint.
If `gangs` is provided, it will be used to apply parallelism (i.e. model parallelism) to the initialized model. If the model family does not support a certain parallelism strategy, that strategy will be ignored. For instance, if `gangs.tp.size > 1` but the model does not support tensor parallelism, the model will be instantiated with regular attention and feed-forward network blocks. If `None`, the whole model will be initialized without any parallelism.
If `device` is provided, the model will be created on the specified device; otherwise, the device returned from `torch.get_default_device()` will be used. Note that `device` and `gangs` cannot be provided together. If `gangs` is provided, `gangs.root.device` will be used.
If `dtype` is provided, it will be used as the default data type of the model parameters and buffers; otherwise, the data type returned from `torch.get_default_dtype()` will be used.
If `mmap` is `True`, the model checkpoint will be memory-mapped. This can reduce memory usage but may cause slower load times on some systems.
If `restrict` is `True`, pickle (if used) will be restricted to load only tensors and types that can be safely serialized and deserialized. If `None`, the default restriction setting of the family will be used.
If `progress` is `True`, a progress bar is displayed during model loading.
```python
from fairseq2.models.qwen import QwenConfig, get_qwen_model_hub

checkpoint_path = ...

# The checkpoint contains a Qwen model with 16 decoder layers.
config = QwenConfig(num_layers=16)

qwen_model = get_qwen_model_hub().load_custom_model(checkpoint_path, config)
```
- Raises:
ValueError – If both `gangs` and `device` are provided.
FileNotFoundError – If the checkpoint file does not exist.
ModelCheckpointError – If the checkpoint format is not valid or is incompatible with the model.
- iter_checkpoint(path: Path, config: ModelConfigT, *, gangs: Gangs | None = None, mmap: bool = False, restrict: bool | None = None) Iterator[tuple[str, Tensor]] [source]¶
Lazily loads parameters from the specified model checkpoint path.
Yields tensors one at a time to minimize memory usage if the underlying checkpoint format allows it.
This method provides low-level access to checkpoint contents without loading the full model into memory. It’s useful for checkpoint inspection, custom loading logic, or memory-efficient parameter processing.
`config` specifies the model configuration used to determine the expected parameter structure in the checkpoint.
If `gangs` is provided, it is used to determine the distributed target configuration and to shard yielded parameters accordingly. If `None`, no sharding will be performed and full parameters will be yielded.
If `mmap` is `True`, the checkpoint will be memory-mapped. This can reduce memory usage but may cause slower load times on some systems.
If `restrict` is `True`, pickle (if used) will be restricted to load only tensors and types that can be safely serialized and deserialized. If `None`, the default restriction setting of the family will be used.
Yields pairs of `(parameter name, parameter)` for each parameter in the checkpoint.
- Raises:
FileNotFoundError – If the checkpoint file does not exist.
ModelCheckpointError – If the checkpoint format is not valid.
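The lazy-yield contract can be sketched in plain Python (the fake checkpoint and parameter lists below are illustrative stand-ins for real tensor data read from disk):

```python
from typing import Iterator, Tuple

# Illustrative stand-in for an on-disk checkpoint: name -> parameter data.
FAKE_CHECKPOINT = {
    "decoder.layers.0.self_attn.q_proj.weight": [0.1, 0.2, 0.3],
    "decoder.layers.0.ffn.inner_proj.weight": [0.4, 0.5],
}


def iter_checkpoint(checkpoint: dict) -> Iterator[Tuple[str, list]]:
    # Yield one (name, parameter) pair at a time so that callers can
    # inspect or transform parameters without holding all of them in
    # memory at once.
    for name, param in checkpoint.items():
        yield name, param


# Example use case: count parameters per top-level module without ever
# building the full model.
counts = {}
for name, param in iter_checkpoint(FAKE_CHECKPOINT):
    module = name.split(".")[0]
    counts[module] = counts.get(module, 0) + len(param)

print(counts)  # {'decoder': 5}
```

This mirrors the checkpoint-inspection use case mentioned above: only one parameter is materialized per step.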
- final class fairseq2.models.ModelHubAccessor(family_name: str, kls: type[ModelT], config_kls: type[ModelConfigT])[source]¶
Bases: `Generic[ModelT, ModelConfigT]`
Creates a `ModelHub` instance when called.
This class provides a strongly-typed way to access model hubs. Its direct use is intended for model authors rather than library users.
See `src/fairseq2/models/llama/hub.py` as an example.
The use of ModelHubAccessor for model authors¶
```python
from fairseq2.models import ModelHubAccessor

# Defined in the Python module where the model is implemented.
get_my_model_hub = ModelHubAccessor(
    family_name="my_model_family", kls=MyModel, config_kls=MyModelConfig
)

# `get_my_model_hub()` is treated as a standalone function by the model
# users in other parts of the code, like below:
model_config = MyModelConfig()

model = get_my_model_hub().create_new_model(model_config)
```
Functions¶
- fairseq2.models.load_model(card: AssetCard | str, *, gangs: Gangs | None = None, dtype: dtype | None = None, config: object = None, mmap: bool = False, progress: bool = True) Module [source]¶
- fairseq2.models.load_model(card: AssetCard | str, *, device: device | None = None, dtype: dtype | None = None, config: object = None, mmap: bool = False, progress: bool = True) Module
Loads a pretrained model from an asset card.
This function downloads the model checkpoint (if necessary) and loads the pretrained weights into a model instance. The model architecture and configuration are determined from the asset card metadata.
As a convenience, this function also accepts an asset name instead of an asset card.
The difference between `load_model` and `ModelHub.load_model()` is as follows: `load_model` provides a unified interface for loading models across all model families. It determines the appropriate model family based on asset card metadata and delegates to the family-specific loading logic.
The tradeoff is that (1) the `config` parameter of `load_model` is not type-safe, and (2) it is possible to accidentally load an unintended model, since the function is not constrained to a specific family.
The general recommendation is to use `ModelHub.load_model()` if the model family is known in advance, and to use `load_model` if the decision about the model and its family needs to be made at runtime.
If `gangs` is provided, it will be used to apply parallelism (i.e. model parallelism) to the initialized model. If the model family does not support a certain parallelism strategy, that strategy will be ignored. For instance, if `gangs.tp.size > 1` but the model does not support tensor parallelism, the model will be instantiated with regular attention and feed-forward network blocks. If `None`, the whole model will be initialized without any parallelism.
If `device` is provided, the model will be created on the specified device; otherwise, the device returned from `torch.get_default_device()` will be used. Note that `device` and `gangs` cannot be provided together. If `gangs` is provided, `gangs.root.device` will be used.
If `dtype` is provided, it will be used as the default data type of the model parameters and buffers; otherwise, the data type returned from `torch.get_default_dtype()` will be used.
If `config` is provided, it overrides the default model configuration from the asset card. If `None`, the configuration specified in the card is used. This is typically used to make slight adjustments to the model configuration, such as tuning dropout probabilities, without changing the architecture.
If `mmap` is `True`, the model checkpoint will be memory-mapped. This can reduce memory usage but may cause slower load times on some systems.
If `progress` is `True`, a progress bar is displayed during model download and loading.
```python
from fairseq2.assets import get_asset_store
from fairseq2.models import load_model

card = get_asset_store().retrieve_card("qwen25_7b_instruct")

qwen_model = load_model(card)

# As a convenience, the card can be omitted and the model name can
# be passed directly to `load_model()`:
wav2vec2_model = load_model("wav2vec2_asr_base_10h")
```
- Raises:
ModelNotKnownError – If `card` is a string and no asset card with that name exists.
AssetCardError – If the asset card's model family is not valid.
ValueError – If both `gangs` and `device` are provided.
Exceptions¶
- class fairseq2.models.ModelNotKnownError(name: str)[source]¶
Bases: `Exception`
Raised when a requested model name is not found in the asset store.
- class fairseq2.models.ModelFamilyNotKnownError(name: str)[source]¶
Bases: `Exception`
Raised when a requested model family is not registered.
- class fairseq2.models.ModelArchitectureNotKnownError(arch: str, family: str | None = None)[source]¶
Bases: `Exception`
Raised when a requested model architecture is not supported by a model family.
`family` defaults to `None` for backwards compatibility. New code must specify a model family when raising this error.
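A hypothetical sketch of the backwards-compatible signature described above (the message wording is illustrative, not the actual fairseq2 text):

```python
from typing import Optional

class ModelArchitectureNotKnownError(Exception):
    # `family` is optional only for backwards compatibility; new call
    # sites should always pass it.
    def __init__(self, arch: str, family: Optional[str] = None) -> None:
        if family is None:
            message = f"'{arch}' is not a known model architecture."
        else:
            message = f"'{arch}' is not a known architecture of the '{family}' family."
        super().__init__(message)
        self.arch = arch
        self.family = family


err = ModelArchitectureNotKnownError("qwen99_1b", family="qwen")
print(err)  # 'qwen99_1b' is not a known architecture of the 'qwen' family.
```

Keeping `family` on the exception lets callers report which family rejected the architecture rather than just the architecture name.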