fairseq2.recipes.lm.instruction_finetune

Inheritance Diagram

   classDiagram
     ABC <|-- CliCommandHandler
     CliCommandHandler <|-- ChatbotCommandHandler
     CliCommandHandler <|-- RecipeCommandHandler
     Generic <|-- RecipeCommandHandler

Classes

class fairseq2.recipes.lm.instruction_finetune.InstructionFinetuneConfig(*, dataset='foo', train_split='default', valid_split=None, max_seq_len=8192, max_num_tokens=16384, batch_size=None, max_num_valid_tokens=None, example_shuffle_window=10000, batch_shuffle_window=1000, num_prefetch=4, src_encode_mode='prompt', tgt_encode_mode='prompt_response', model='llama3_1_8b_instruct', model_config=None, dtype=torch.bfloat16, mixed_precision='static', data_parallelism='fsdp', fsdp_local_world_size=None, fsdp_wrap_granularity='layer', fsdp_reshard_after_forward=True, tensor_parallel_size=1, activation_checkpointing=True, torch_compile=False, optimizer='adamw', optimizer_config=<factory>, lr_scheduler='cosine-annealing', lr_scheduler_config=<factory>, gradient_accumulation=1, max_gradient_norm=None, fp16_loss_scale=(128.0, 0.0001), max_num_steps=5000, max_num_data_epochs=None, validate_after_n_steps=0, validate_every_n_steps=100, checkpoint_every_n_steps=1000, checkpoint_every_n_data_epochs=None, keep_last_n_checkpoints=1, keep_last_n_models=None, publish_metrics_every_n_steps=10, publish_metrics_every_n_data_epochs=None, resume_checkpoint_dir=None, seed=2, profile=None, monitored_gang=False, anomaly_detection=False, wandb_project=None, wandb_run_name=None)[source]

Bases: object

Holds the configuration of a language model instruction-finetuning task.
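
A minimal construction sketch, assuming fairseq2 is installed; the dataset path and the override values below are placeholders, and every keyword matches the signature above:

   from pathlib import Path

   import torch

   from fairseq2.recipes.lm.instruction_finetune import InstructionFinetuneConfig

   # Start from the defaults and override only what differs for this run.
   config = InstructionFinetuneConfig(
       dataset=Path("/data/my_instruction_dataset"),  # placeholder path
       model="llama3_1_8b_instruct",
       max_seq_len=4096,
       max_num_tokens=8192,
       dtype=torch.bfloat16,
       max_num_steps=2000,
   )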

dataset: str | AssetCard | Path = 'foo'

The name of, path to, or path to the asset card of the instruction dataset.

train_split: str = 'default'

The name of the train data split.

valid_split: str | None = None

The name of the valid data split.

max_seq_len: int = 8192

The maximum sequence length.

max_num_tokens: int = 16384

The maximum number of tokens per batch.

batch_size: int | None = None

If not None, max_num_tokens is ignored and each batch will contain batch_size examples.

max_num_valid_tokens: int | None = None

The maximum number of tokens per validation batch.
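
The two batching fields above are mutually exclusive in effect: by default batches are capped by token count, and setting batch_size switches to fixed-size batches. A sketch of both configurations (field values are illustrative):

   from fairseq2.recipes.lm.instruction_finetune import InstructionFinetuneConfig

   # Token-capped batching (default): each batch holds at most max_num_tokens tokens.
   token_batched = InstructionFinetuneConfig(max_num_tokens=16384, batch_size=None)

   # Fixed-size batching: max_num_tokens is ignored; each batch holds 8 examples.
   example_batched = InstructionFinetuneConfig(batch_size=8)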

example_shuffle_window: int = 10000

The size of the sliding window for shuffling examples.

batch_shuffle_window: int = 1000

The size of the sliding window for shuffling batches.

num_prefetch: int = 4

The number of batches to prefetch in background.

src_encode_mode: str = 'prompt'

The encode mode for the prompt; determines which special tokens to add.

tgt_encode_mode: str = 'prompt_response'

The encode mode for the target; determines which special tokens to add.

model: str | AssetCard | Path = 'llama3_1_8b_instruct'

The name of, path to, or path to the asset card of the language model to finetune.

model_config: Any = None

The model configuration overrides. The provided values must be compatible with the checkpoint; otherwise, the model will fail to load.

dtype: dtype = torch.bfloat16

The data type of the model.

mixed_precision: Literal['none', 'static', 'dynamic'] = 'static'

If ‘none’, the whole training will be run in dtype. If ‘static’, forward and backward passes will be run in dtype, but the optimizer step will be run in full precision. If ‘dynamic’, forward and backward passes will be run with torch.amp in dtype, but the optimizer step will be run in full precision.
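
As an illustration of the 'dynamic' mode described above, the generic PyTorch pattern is an autocast region around the forward pass with a full-precision optimizer step; this is a sketch of the general technique, not fairseq2's internal implementation:

   import torch

   # Stand-in model, optimizer, and batch for illustration only.
   model = torch.nn.Linear(16, 16)
   optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
   x = torch.randn(4, 16)

   # 'dynamic': the forward pass and loss run under autocast in the low-precision
   # dtype, while parameters and the optimizer step stay in float32.
   with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
       loss = model(x).sum()
   loss.backward()
   optimizer.step()
   optimizer.zero_grad()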

data_parallelism: Literal['ddp', 'fsdp'] = 'fsdp'

The data parallelism API to use.

fsdp_local_world_size: int | None = None

If not None, enables hybrid sharding. The model will be fully sharded within each worker group of size fsdp_local_world_size and replicated across groups.
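
For example, on a 16-GPU job spread over two 8-GPU nodes, the following hypothetical setting shards the model within each node and replicates it across the two nodes:

   from fairseq2.recipes.lm.instruction_finetune import InstructionFinetuneConfig

   # Hybrid sharding: fully shard within each group of 8 workers, replicate across groups.
   config = InstructionFinetuneConfig(
       data_parallelism="fsdp",
       fsdp_local_world_size=8,
   )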

fsdp_wrap_granularity: Literal['layer', 'stack', 'model'] = 'layer'

The granularity at which to wrap the model.

fsdp_reshard_after_forward: bool = True

If True, reshards the parameters after the forward pass as well; otherwise, they are resharded only after the backward pass.

tensor_parallel_size: int = 1

The size of tensor parallelism.

activation_checkpointing: bool = True

If True, uses layer-wise activation checkpointing.

torch_compile: bool = False

If True, applies torch.compile() to the decoder. (experimental)
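
For reference, torch.compile() wraps a module in a compiled version with the same call signature; the module below is a stand-in for illustration, not the actual decoder that this option targets:

   import torch

   decoder = torch.nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)  # stand-in module
   compiled_decoder = torch.compile(decoder)  # what torch_compile=True enables, conceptually

   out = compiled_decoder(torch.randn(2, 10, 64))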

optimizer: str = 'adamw'

The optimizer.

optimizer_config: Any

The configuration of the optimizer.

lr_scheduler: str = 'cosine-annealing'

The learning rate scheduler.

lr_scheduler_config: Any

The configuration of the learning rate scheduler.

gradient_accumulation: int = 1

The number of steps to accumulate gradients before an optimizer update.
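
With token-based batching, the number of tokens seen per optimizer update is roughly max_num_tokens x gradient_accumulation x the data-parallel world size; a small worked example under an assumed 8-GPU job:

   # Rough effective-batch arithmetic for the defaults above (illustrative only).
   max_num_tokens = 16384
   gradient_accumulation = 1
   data_parallel_world_size = 8  # hypothetical 8-GPU data-parallel job

   tokens_per_update = max_num_tokens * gradient_accumulation * data_parallel_world_size
   print(tokens_per_update)  # 131072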

max_gradient_norm: float | None = None

The maximum gradient norm. If None, no clipping will be applied.

fp16_loss_scale: tuple[float, float] = (128.0, 0.0001)

The initial and minimum loss scale for fp16 training.

max_num_steps: int = 5000

The maximum number of steps to train for. Note that max_num_steps is also passed to the cosine-annealing learning rate scheduler, so it affects the learning rate schedule as well as the run length.
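
To illustrate the note above with a generic PyTorch analogue (not fairseq2's own scheduler class): the cosine schedule's cycle length is tied to max_num_steps, so shortening the run also compresses the learning rate decay.

   import torch

   model = torch.nn.Linear(8, 8)  # placeholder model
   optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

   max_num_steps = 5000
   # The decay completes exactly at the end of training.
   scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_num_steps)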

max_num_data_epochs: int | None = None

The maximum number of data epochs to train for.

validate_after_n_steps: int = 0

The number of steps after which to start validating the model.

validate_every_n_steps: int = 100

The step interval at which to validate the model.

checkpoint_every_n_steps: int = 1000

The step interval at which to checkpoint.

checkpoint_every_n_data_epochs: int | None = None

The data epoch interval at which to checkpoint.

keep_last_n_checkpoints: int | None = 1

The number of checkpoints to keep. If None, none will be deleted.

keep_last_n_models: int | None = None

The number of checkpoint models to keep. If None, none will be deleted.

publish_metrics_every_n_steps: int = 10

The step interval at which to publish training metrics.

publish_metrics_every_n_data_epochs: int | None = None

The data epoch interval at which to publish training metrics.

resume_checkpoint_dir: Path | None = None

If not None, adds the specified path to the default asset store.

seed: int = 2

The random number generator seed to use.

profile: tuple[int, int] | None = None

The number of steps for the PyTorch profiler to skip, followed by the number of steps to record.

monitored_gang: bool = False

If True, puts a monitored barrier before every collective call.

anomaly_detection: bool = False

If True, turns on the anomaly detection feature of torch.autograd.

wandb_project: str | None = None

If not None, sets the project name for W&B logging.

wandb_run_name: str | None = None

If not None, sets the run name for W&B logging. If None, then W&B creates a random name.

Functions

fairseq2.recipes.lm.instruction_finetune.load_instruction_finetuner(config, output_dir)[source]

Load a Trainer for language model instruction-finetuning.

Return type:

Trainer[SequenceBatch]
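
A hedged end-to-end sketch: the output directory is a placeholder, and running the returned Trainer by calling it is an assumption based on the return type above:

   from pathlib import Path

   from fairseq2.recipes.lm.instruction_finetune import (
       InstructionFinetuneConfig,
       load_instruction_finetuner,
   )

   config = InstructionFinetuneConfig(max_num_steps=1000)
   output_dir = Path("/checkpoints/instruction_finetune")  # placeholder path

   trainer = load_instruction_finetuner(config, output_dir)
   trainer()  # assumed entry point: the returned Trainer is run by calling it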