fairseq2.models.hg.config

Configuration classes for HuggingFace model integration.

Functions

register_hg_configs(container)

Register predefined HuggingFace model configurations.

Classes

HuggingFaceModelConfig(*, hf_name[, ...])

Configuration for loading HuggingFace models.

class fairseq2.models.hg.config.HuggingFaceModelConfig(*, hf_name: str, model_type: str = 'auto', use_processor: bool = False, device: str = 'cpu', custom_model_class: str | None = None, custom_processor_class: str | None = None, trust_remote_code: bool = False, dtype: dtype | None = None, load_kwargs: dict[str, Any] | None = None, enable_gradient_checkpointing: bool = False)

Bases: object

Configuration for loading HuggingFace models.

This dataclass contains all the parameters needed to configure how a HuggingFace model should be loaded, including device placement, dtype, custom classes, and special loading options.

Parameters:
  • hf_name – The HuggingFace model identifier (e.g., ‘gpt2’)

  • model_type – Type of AutoModel (‘auto’, ‘causal_lm’, ‘seq2seq_lm’, ‘custom’)

  • use_processor – Whether to use AutoProcessor for multimodal models

  • device – Device placement (‘cpu’, ‘cuda:0’, or ‘auto’)

  • custom_model_class – Custom model class name for special cases

  • custom_processor_class – Custom processor class name for special cases

  • trust_remote_code – Whether to trust remote code for custom architectures

  • dtype – PyTorch dtype to use. None means ‘auto’ (let HuggingFace decide)

  • load_kwargs – Additional kwargs to pass to from_pretrained

  • enable_gradient_checkpointing – Whether to enable gradient checkpointing to reduce memory usage during training (only for causal_lm models)

Example:

Create a configuration for GPT-2:

config = HuggingFaceModelConfig(
    hf_name="gpt2",
    model_type="causal_lm",
    device="cuda:0"
)

hf_name: str

The HuggingFace model identifier (e.g., ‘gpt2’).

model_type: str = 'auto'

Type of AutoModel (‘auto’, ‘causal_lm’, ‘seq2seq_lm’, ‘custom’).

use_processor: bool = False

Whether to use AutoProcessor for multimodal models.

device: str = 'cpu'

Device placement: ‘cpu’, ‘cuda:0’, or ‘auto’ for HF accelerate.
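To illustrate device placement, here is a minimal sketch. The dataclass below is a stand-in that mirrors only the fields used in this example; in real code, import HuggingFaceModelConfig from fairseq2.models.hg.config:

```python
from dataclasses import dataclass

# Stand-in mirroring a subset of the documented fields (illustration only;
# use the real HuggingFaceModelConfig from fairseq2.models.hg.config).
@dataclass
class HuggingFaceModelConfig:
    hf_name: str
    model_type: str = "auto"
    device: str = "cpu"

# 'auto' delegates placement to HF accelerate, which distributes the model
# across available devices; 'cuda:0' pins it to a single GPU.
config = HuggingFaceModelConfig(
    hf_name="gpt2",
    model_type="causal_lm",
    device="auto",
)
```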

custom_model_class: str | None = None

Custom model class name for special cases.

custom_processor_class: str | None = None

Custom processor class name for special cases.

trust_remote_code: bool = False

Whether to trust remote code for custom architectures.

dtype: dtype | None = None

PyTorch dtype to use. None means ‘auto’ (let HuggingFace decide).

load_kwargs: dict[str, Any] | None = None

Additional kwargs to pass to from_pretrained.
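As a sketch of how load_kwargs typically reach from_pretrained: the extra kwargs are forwarded on top of the options derived from the config itself. The merge order shown is an assumption about the loader, not fairseq2's actual code:

```python
# Options derived from the config (dtype, trust_remote_code, ...).
base_kwargs = {"trust_remote_code": False, "torch_dtype": "auto"}

# User-supplied extras, forwarded verbatim to from_pretrained.
load_kwargs = {"low_cpu_mem_usage": True, "revision": "main"}

# Assumed merge order: user extras override config-derived defaults.
merged = {**base_kwargs, **load_kwargs}
# AutoModelForCausalLM.from_pretrained("gpt2", **merged)
```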

enable_gradient_checkpointing: bool = False

Whether to enable gradient checkpointing to reduce memory usage (causal_lm only).
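The guard this flag implies can be sketched as a small helper (hypothetical; fairseq2's loader applies the flag internally). gradient_checkpointing_enable() is the standard transformers method for this:

```python
def maybe_enable_checkpointing(model, model_type: str, enabled: bool) -> bool:
    """Hypothetical helper: apply gradient checkpointing to causal LMs only."""
    if enabled and model_type == "causal_lm":
        # transformers models expose gradient_checkpointing_enable().
        model.gradient_checkpointing_enable()
        return True
    return False
```

For any other model_type the flag is a no-op, matching the "causal_lm only" restriction documented above.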

fairseq2.models.hg.config.register_hg_configs(container: DependencyContainer) → None

Register predefined HuggingFace model configurations.

Parameters:

container – The dependency container to register configurations with.