fairseq2.recipes.lm.instruction_finetune¶
Class hierarchy: CliCommandHandler derives from ABC; ChatbotCommandHandler and RecipeCommandHandler derive from CliCommandHandler, and RecipeCommandHandler also derives from Generic.
Classes¶
- class fairseq2.recipes.lm.instruction_finetune.InstructionFinetuneConfig(*, dataset='foo', train_split='default', valid_split=None, max_seq_len=8192, max_num_tokens=16384, batch_size=None, max_num_valid_tokens=None, example_shuffle_window=10000, batch_shuffle_window=1000, num_prefetch=4, src_encode_mode='prompt', tgt_encode_mode='prompt_response', model='llama3_1_8b_instruct', model_config=None, dtype=torch.bfloat16, mixed_precision='static', data_parallelism='fsdp', fsdp_local_world_size=None, fsdp_wrap_granularity='layer', fsdp_reshard_after_forward=True, tensor_parallel_size=1, activation_checkpointing=True, torch_compile=False, optimizer='adamw', optimizer_config=<factory>, lr_scheduler='cosine-annealing', lr_scheduler_config=<factory>, gradient_accumulation=1, max_gradient_norm=None, fp16_loss_scale=(128.0, 0.0001), max_num_steps=5000, max_num_data_epochs=None, validate_after_n_steps=0, validate_every_n_steps=100, checkpoint_every_n_steps=1000, checkpoint_every_n_data_epochs=None, keep_last_n_checkpoints=1, keep_last_n_models=None, publish_metrics_every_n_steps=10, publish_metrics_every_n_data_epochs=None, resume_checkpoint_dir=None, seed=2, profile=None, monitored_gang=False, anomaly_detection=False, wandb_project=None, wandb_run_name=None)[source]¶
Bases: object
Holds the configuration of a language model instruction-finetuning task.
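All fields are keyword-only and have defaults, so a configuration can be built directly in Python and selectively overridden. The following is a minimal sketch, assuming fairseq2 is installed and the class is imported from the module path shown above; the dataset name is hypothetical.

    from dataclasses import asdict

    from fairseq2.recipes.lm.instruction_finetune import InstructionFinetuneConfig

    # Override a few fields; every other field keeps the documented default.
    config = InstructionFinetuneConfig(
        dataset="my_instruction_dataset",  # hypothetical dataset asset card name
        model="llama3_1_8b_instruct",
        max_num_tokens=8192,
        max_num_steps=2000,
    )

    # Inspect the full, resolved configuration as a plain dictionary.
    print(asdict(config))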
- dataset: str | AssetCard | Path = 'foo'¶
The name of the instruction dataset, a path to it, or a path to its asset card.
- batch_size: int | None = None¶
If not None, max_num_tokens is ignored and each batch will contain batch_size examples.
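The two batching modes are mutually exclusive in effect: by default, batches are packed up to max_num_tokens, whereas setting batch_size fixes the number of examples per batch. A short sketch, assuming the defaults listed above:

    from fairseq2.recipes.lm.instruction_finetune import InstructionFinetuneConfig

    # Token-based batching (default): batches are sized by token budget.
    token_batched = InstructionFinetuneConfig(max_num_tokens=16384)

    # Example-based batching: max_num_tokens is ignored; every batch holds
    # exactly 16 examples, each still bounded by max_seq_len.
    example_batched = InstructionFinetuneConfig(batch_size=16)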
- src_encode_mode: str = 'prompt'¶
The encode mode for the prompt; it determines which special tokens to add.
- tgt_encode_mode: str = 'prompt_response'¶
The encode mode for the target; it determines which special tokens to add.
- model: str | AssetCard | Path = 'llama3_1_8b_instruct'¶
The name or path to the asset card of the language model to finetune.
- model_config: Any = None¶
The model configuration overrides. The provided values must be compatible with the checkpoint; otherwise, the model will fail to load.
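Since model_config is typed as Any, the overrides are forwarded to the configuration of whichever model family is selected, so the valid fields depend on that model. The snippet below is purely illustrative; the override key is hypothetical and must correspond to a field that actually exists in the checkpoint's configuration.

    from fairseq2.recipes.lm.instruction_finetune import InstructionFinetuneConfig

    config = InstructionFinetuneConfig(
        model="llama3_1_8b_instruct",
        # Hypothetical override; keys must be valid for the chosen model family
        # and compatible with the checkpoint being loaded.
        model_config={"dropout_p": 0.0},
    )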
- mixed_precision: Literal['none', 'static', 'dynamic'] = 'static'¶
If ‘none’, the whole training will be run in dtype. If ‘static’, forward and backward passes will be run in dtype, but the optimizer step will be run in full precision. If ‘dynamic’, forward and backward passes will be run with torch.amp in dtype, but the optimizer step will be run in full precision.
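In practice, dtype and mixed_precision are chosen together: dtype selects the low-precision format, while mixed_precision decides how the optimizer step (and, for 'dynamic', torch.amp) is handled. A sketch of the three modes:

    import torch

    from fairseq2.recipes.lm.instruction_finetune import InstructionFinetuneConfig

    # Everything, including the optimizer step, runs in bfloat16.
    pure_bf16 = InstructionFinetuneConfig(dtype=torch.bfloat16, mixed_precision="none")

    # Forward and backward in bfloat16; optimizer step in full precision (default).
    static_bf16 = InstructionFinetuneConfig(dtype=torch.bfloat16, mixed_precision="static")

    # Forward and backward under torch.amp in float16; fp16_loss_scale is relevant here.
    dynamic_fp16 = InstructionFinetuneConfig(dtype=torch.float16, mixed_precision="dynamic")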
- fsdp_local_world_size: int | None = None¶
If not None, enables hybrid sharding. The model will be fully sharded within each worker group of size fsdp_local_world_size and replicated across groups.
- fsdp_wrap_granularity: Literal['layer', 'stack', 'model'] = 'layer'¶
The granularity at which to wrap the model.
- fsdp_reshard_after_forward: bool = True¶
If True, reshards the parameters after the forward pass; otherwise, they are kept gathered until after the backward pass.
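For multi-node runs, the FSDP fields can be combined to obtain hybrid sharding: full sharding within a node and replication across nodes. A hedged sketch, assuming 8 GPUs per node; whether this layout pays off depends on the interconnect.

    from fairseq2.recipes.lm.instruction_finetune import InstructionFinetuneConfig

    config = InstructionFinetuneConfig(
        data_parallelism="fsdp",
        # Shard fully within each group of 8 ranks (e.g. one node); replicate across groups.
        fsdp_local_world_size=8,
        # Wrap at decoder-layer granularity.
        fsdp_wrap_granularity="layer",
        # Free gathered parameters right after the forward pass to save memory.
        fsdp_reshard_after_forward=True,
    )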
- gradient_accumulation: int = 1¶
The number of steps to accumulate gradients before an optimizer update.
- max_gradient_norm: float | None = None¶
The maximum gradient norm. If None, no clipping will be applied.
- fp16_loss_scale: tuple[float, float] = (128.0, 0.0001)¶
The initial and minimum loss scale for fp16 training.
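The number of tokens contributing to one optimizer update is roughly max_num_tokens per rank, multiplied by the number of data-parallel ranks and by gradient_accumulation. A back-of-the-envelope sketch; the rank count is an assumption, not part of the config.

    from fairseq2.recipes.lm.instruction_finetune import InstructionFinetuneConfig

    config = InstructionFinetuneConfig(
        max_num_tokens=16384,
        gradient_accumulation=4,
        max_gradient_norm=1.0,  # clip gradients at a maximum norm of 1.0
    )

    num_data_parallel_ranks = 8  # assumption: 8 GPUs performing data parallelism

    tokens_per_update = (
        config.max_num_tokens * num_data_parallel_ranks * config.gradient_accumulation
    )
    print(tokens_per_update)  # 16384 * 8 * 4 = 524288 tokens per optimizer step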
- max_num_steps: int = 5000¶
The maximum number of steps to train for. Note that max_num_steps is also used as an argument to the cosine-annealing learning rate scheduler.
- keep_last_n_checkpoints: int | None = 1¶
The number of checkpoints to keep. If None, none will be deleted.
- keep_last_n_models: int | None = None¶
The number of checkpoint models to keep. If None, none will be deleted.
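Checkpoint, validation, and metric cadence are controlled entirely by the fields above; the values below are one example trade-off between resumability and disk usage, not a recommendation.

    from fairseq2.recipes.lm.instruction_finetune import InstructionFinetuneConfig

    config = InstructionFinetuneConfig(
        max_num_steps=5000,
        validate_after_n_steps=500,    # skip validation during early training
        validate_every_n_steps=250,
        checkpoint_every_n_steps=500,  # 10 checkpoints over the full run
        keep_last_n_checkpoints=2,     # older checkpoints are deleted as training proceeds
        publish_metrics_every_n_steps=10,
    )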
- publish_metrics_every_n_data_epochs: int | None = None¶
The data epoch interval at which to publish training metrics.
- resume_checkpoint_dir: Path | None = None¶
If not None, adds the specified path to the default asset store.
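To warm-start from an earlier run, resume_checkpoint_dir can point at that run's checkpoint directory, which is then added to the default asset store so its checkpoints become visible to the recipe. The path below is hypothetical.

    from pathlib import Path

    from fairseq2.recipes.lm.instruction_finetune import InstructionFinetuneConfig

    config = InstructionFinetuneConfig(
        # Hypothetical path to the checkpoint directory of a previous run.
        resume_checkpoint_dir=Path("/checkpoints/previous_run"),
    )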