fairseq2.models.hg.config

Configuration classes for HuggingFace model integration.

Functions

register_hg_configs(container)

Register predefined HuggingFace model configurations.

Classes

HuggingFaceModelConfig(*, hf_name[, ...])

Configuration for loading HuggingFace models.

class fairseq2.models.hg.config.HuggingFaceModelConfig(*, hf_name: str, model_type: str = 'auto', use_processor: bool = False, device: str = 'cpu', custom_model_class: str | None = None, custom_processor_class: str | None = None, trust_remote_code: bool = False, dtype: dtype | None = None, load_kwargs: dict[str, Any] | None = None, enable_gradient_checkpointing: bool = False)

Bases: object

Configuration for loading HuggingFace models.

This dataclass contains all the parameters needed to configure how a HuggingFace model should be loaded, including device placement, dtype, custom classes, and special loading options.

Parameters:
  • hf_name – The HuggingFace model identifier (e.g., ‘gpt2’)

  • model_type – Type of AutoModel (‘auto’, ‘causal_lm’, ‘seq2seq_lm’, ‘custom’)

  • use_processor – Whether to use AutoProcessor for multimodal models

  • device – Device placement (‘cpu’, ‘cuda:0’, or ‘auto’)

  • custom_model_class – Custom model class name for special cases

  • custom_processor_class – Custom processor class name for special cases

  • trust_remote_code – Whether to trust remote code for custom architectures

  • dtype – PyTorch dtype to use. None means ‘auto’ (let HuggingFace decide)

  • load_kwargs – Additional kwargs to pass to from_pretrained

  • enable_gradient_checkpointing – Whether to enable gradient checkpointing to reduce memory usage during training (only for causal_lm models)

Example:

Create a configuration for GPT-2:

config = HuggingFaceModelConfig(
    hf_name="gpt2",
    model_type="causal_lm",
    device="cuda:0"
)

hf_name: str

The HuggingFace model identifier (e.g., ‘gpt2’).

model_type: str = 'auto'

Type of AutoModel (‘auto’, ‘causal_lm’, ‘seq2seq_lm’, ‘custom’).

use_processor: bool = False

Whether to use AutoProcessor for multimodal models.

device: str = 'cpu'

Device placement: ‘cpu’, ‘cuda:0’, or ‘auto’ for HF accelerate.
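To illustrate device placement, here is a minimal sketch. The dataclass below is a stand-in that mirrors only the fields used in this example; in real code, import HuggingFaceModelConfig from fairseq2.models.hg.config:

```python
from dataclasses import dataclass

# Stand-in mirroring a subset of the documented fields (illustration only;
# use the real HuggingFaceModelConfig from fairseq2.models.hg.config).
@dataclass
class HuggingFaceModelConfig:
    hf_name: str
    model_type: str = "auto"
    device: str = "cpu"

# 'auto' delegates placement to HF accelerate, which distributes the model
# across available devices; 'cuda:0' pins it to a single GPU.
config = HuggingFaceModelConfig(
    hf_name="gpt2",
    model_type="causal_lm",
    device="auto",
)
```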

custom_model_class: str | None = None

Custom model class name for special cases.

custom_processor_class: str | None = None

Custom processor class name for special cases.

trust_remote_code: bool = False

Whether to trust remote code for custom architectures.

dtype: dtype | None = None

PyTorch dtype to use. None means ‘auto’ (let HuggingFace decide).

load_kwargs: dict[str, Any] | None = None

Additional kwargs to pass to from_pretrained.
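As a sketch of how load_kwargs typically reach from_pretrained: the extra kwargs are forwarded on top of the options derived from the config itself. The merge order shown is an assumption about the loader, not fairseq2's actual code:

```python
# Options derived from the config (dtype, trust_remote_code, ...).
base_kwargs = {"trust_remote_code": False, "torch_dtype": "auto"}

# User-supplied extras, forwarded verbatim to from_pretrained.
load_kwargs = {"low_cpu_mem_usage": True, "revision": "main"}

# Assumed merge order: user extras override config-derived defaults.
merged = {**base_kwargs, **load_kwargs}
# AutoModelForCausalLM.from_pretrained("gpt2", **merged)
```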

enable_gradient_checkpointing: bool = False

Whether to enable gradient checkpointing to reduce memory usage (causal_lm only).
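The guard this flag implies can be sketched as a small helper (hypothetical; fairseq2's loader applies the flag internally). gradient_checkpointing_enable() is the standard transformers method for this:

```python
def maybe_enable_checkpointing(model, model_type: str, enabled: bool) -> bool:
    """Hypothetical helper: apply gradient checkpointing to causal LMs only."""
    if enabled and model_type == "causal_lm":
        # transformers models expose gradient_checkpointing_enable().
        model.gradient_checkpointing_enable()
        return True
    return False
```

For any other model_type the flag is a no-op, matching the "causal_lm only" restriction documented above.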

fairseq2.models.hg.config.register_hg_configs(container: DependencyContainer) → None

Register predefined HuggingFace model configurations.

Parameters:

container – The dependency container to register configurations with.