fairseq2.recipes.lm.instruction_finetune

Inheritance Diagram

   classDiagram
     ABC <|-- CliCommandHandler
     CliCommandHandler <|-- ChatbotCommandHandler
     CliCommandHandler <|-- RecipeCommandHandler
     Generic <|-- RecipeCommandHandler

Classes

class fairseq2.recipes.lm.instruction_finetune.InstructionFinetuneConfig(*, dataset='foo', train_split='default', valid_split=None, max_seq_len=8192, max_num_tokens=16384, batch_size=None, max_num_valid_tokens=None, example_shuffle_window=10000, batch_shuffle_window=1000, num_prefetch=4, src_encode_mode='prompt', tgt_encode_mode='prompt_response', model='llama3_1_8b_instruct', model_config=None, dtype=torch.bfloat16, mixed_precision='static', data_parallelism='fsdp', fsdp_local_world_size=None, fsdp_wrap_granularity='layer', fsdp_reshard_after_forward=True, tensor_parallel_size=1, activation_checkpointing=True, torch_compile=False, optimizer='adamw', optimizer_config=<factory>, lr_scheduler='cosine-annealing', lr_scheduler_config=<factory>, gradient_accumulation=1, max_gradient_norm=None, fp16_loss_scale=(128.0, 0.0001), max_num_steps=5000, max_num_data_epochs=None, validate_after_n_steps=0, validate_every_n_steps=100, checkpoint_every_n_steps=1000, checkpoint_every_n_data_epochs=None, keep_last_n_checkpoints=1, keep_last_n_models=None, publish_metrics_every_n_steps=10, publish_metrics_every_n_data_epochs=None, resume_checkpoint_dir=None, seed=2, profile=None, monitored_gang=False, anomaly_detection=False, wandb_project=None, wandb_run_name=None)[source]

Bases: object

Holds the configuration of a language model instruction-finetuning task.
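
A minimal construction sketch, assuming fairseq2 is installed; the dataset path and the override values below are placeholders, and every keyword matches the signature above:

   from pathlib import Path

   import torch

   from fairseq2.recipes.lm.instruction_finetune import InstructionFinetuneConfig

   # Start from the defaults and override only what differs for this run.
   config = InstructionFinetuneConfig(
       dataset=Path("/data/my_instruction_dataset"),  # placeholder path
       model="llama3_1_8b_instruct",
       max_seq_len=4096,
       max_num_tokens=8192,
       dtype=torch.bfloat16,
       max_num_steps=2000,
   )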

dataset: str | AssetCard | Path = 'foo'

The name of, path to, or path to the asset card of the instruction dataset.

train_split: str = 'default'

The name of the train data split.

valid_split: str | None = None

The name of the valid data split.

max_seq_len: int = 8192

The maximum sequence length.

max_num_tokens: int = 16384

The maximum number of tokens per batch.

batch_size: int | None = None

If not None, max_num_tokens is ignored and each batch will contain batch_size examples.

max_num_valid_tokens: int | None = None

The maximum number of tokens per validation batch.
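
The two batching fields above are mutually exclusive in effect: by default batches are capped by token count, and setting batch_size switches to fixed-size batches. A sketch of both configurations (field values are illustrative):

   from fairseq2.recipes.lm.instruction_finetune import InstructionFinetuneConfig

   # Token-capped batching (default): each batch holds at most max_num_tokens tokens.
   token_batched = InstructionFinetuneConfig(max_num_tokens=16384, batch_size=None)

   # Fixed-size batching: max_num_tokens is ignored; each batch holds 8 examples.
   example_batched = InstructionFinetuneConfig(batch_size=8)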

example_shuffle_window: int = 10000

The size of the sliding window for shuffling examples.

batch_shuffle_window: int = 1000

The size of the sliding window for shuffling batches.

num_prefetch: int = 4

The number of batches to prefetch in background.

src_encode_mode: str = 'prompt'

The encode mode for the prompt; determines which special tokens to add.

tgt_encode_mode: str = 'prompt_response'

The encode mode for the target; determines which special tokens to add.

model: str | AssetCard | Path = 'llama3_1_8b_instruct'

The name of, path to, or path to the asset card of the language model to finetune.

model_config: Any = None

The model configuration overrides. The provided values must be compatible with the checkpoint; otherwise, the model will fail to load.

dtype: dtype = torch.bfloat16

The data type of the model.

mixed_precision: Literal['none', 'static', 'dynamic'] = 'static'

If ‘none’, the whole training will be run in dtype. If ‘static’, forward and backward passes will be run in dtype, but the optimizer step will be run in full precision. If ‘dynamic’, forward and backward passes will be run with torch.amp in dtype, but the optimizer step will be run in full precision.
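
As an illustration of the 'dynamic' mode described above, the generic PyTorch pattern is an autocast region around the forward pass with a full-precision optimizer step; this is a sketch of the general technique, not fairseq2's internal implementation:

   import torch

   # Stand-in model, optimizer, and batch for illustration only.
   model = torch.nn.Linear(16, 16)
   optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
   x = torch.randn(4, 16)

   # 'dynamic': the forward pass and loss run under autocast in the low-precision
   # dtype, while parameters and the optimizer step stay in float32.
   with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
       loss = model(x).sum()
   loss.backward()
   optimizer.step()
   optimizer.zero_grad()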

data_parallelism: Literal['ddp', 'fsdp'] = 'fsdp'

The data parallelism API to use.

fsdp_local_world_size: int | None = None

If not None, enables hybrid sharding. The model will be fully sharded within each worker group of size fsdp_local_world_size and replicated across groups.
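
For example, on a 16-GPU job spread over two 8-GPU nodes, the following hypothetical setting shards the model within each node and replicates it across the two nodes:

   from fairseq2.recipes.lm.instruction_finetune import InstructionFinetuneConfig

   # Hybrid sharding: fully shard within each group of 8 workers, replicate across groups.
   config = InstructionFinetuneConfig(
       data_parallelism="fsdp",
       fsdp_local_world_size=8,
   )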

fsdp_wrap_granularity: Literal['layer', 'stack', 'model'] = 'layer'

The granularity at which to wrap the model.

fsdp_reshard_after_forward: bool = True

If True, reshards the parameters after the forward pass as well; otherwise, they are resharded only after the backward pass.

tensor_parallel_size: int = 1

The size of tensor parallelism.

activation_checkpointing: bool = True

If True, uses layer-wise activation checkpointing.

torch_compile: bool = False

If True, applies torch.compile() to the decoder. (experimental)
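
For reference, torch.compile() wraps a module in a compiled version with the same call signature; the module below is a stand-in for illustration, not the actual decoder that this option targets:

   import torch

   decoder = torch.nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)  # stand-in module
   compiled_decoder = torch.compile(decoder)  # what torch_compile=True enables, conceptually

   out = compiled_decoder(torch.randn(2, 10, 64))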

optimizer: str = 'adamw'

The optimizer.

optimizer_config: Any

The configuration of the optimizer.

lr_scheduler: str = 'cosine-annealing'

The learning rate scheduler.

lr_scheduler_config: Any

The configuration of the learning rate scheduler.

gradient_accumulation: int = 1

The number of steps to accumulate gradients before an optimizer update.
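
With token-based batching, the number of tokens seen per optimizer update is roughly max_num_tokens x gradient_accumulation x the data-parallel world size; a small worked example under an assumed 8-GPU job:

   # Rough effective-batch arithmetic for the defaults above (illustrative only).
   max_num_tokens = 16384
   gradient_accumulation = 1
   data_parallel_world_size = 8  # hypothetical 8-GPU data-parallel job

   tokens_per_update = max_num_tokens * gradient_accumulation * data_parallel_world_size
   print(tokens_per_update)  # 131072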

max_gradient_norm: float | None = None

The maximum gradient norm. If None, no clipping will be applied.

fp16_loss_scale: tuple[float, float] = (128.0, 0.0001)

The initial and minimum loss scale for fp16 training.

max_num_steps: int = 5000

The maximum number of steps to train for. Note that max_num_steps is also passed to the cosine-annealing learning rate scheduler, so it affects the learning rate schedule as well as the run length.
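
To illustrate the note above with a generic PyTorch analogue (not fairseq2's own scheduler class): the cosine schedule's cycle length is tied to max_num_steps, so shortening the run also compresses the learning rate decay.

   import torch

   model = torch.nn.Linear(8, 8)  # placeholder model
   optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

   max_num_steps = 5000
   # The decay completes exactly at the end of training.
   scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_num_steps)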

max_num_data_epochs: int | None = None

The maximum number of data epochs to train for.

validate_after_n_steps: int = 0

The number of steps after which to start validating the model.

validate_every_n_steps: int = 100

The step interval at which to validate the model.

checkpoint_every_n_steps: int = 1000

The step interval at which to checkpoint.

checkpoint_every_n_data_epochs: int | None = None

The data epoch interval at which to checkpoint.

keep_last_n_checkpoints: int | None = 1

The number of checkpoints to keep. If None, none will be deleted.

keep_last_n_models: int | None = None

The number of checkpoint models to keep. If None, none will be deleted.

publish_metrics_every_n_steps: int = 10

The step interval at which to publish training metrics.

publish_metrics_every_n_data_epochs: int | None = None

The data epoch interval at which to publish training metrics.

resume_checkpoint_dir: Path | None = None

If not None, adds the specified path to the default asset store.

seed: int = 2

The random number generator seed to use.

profile: tuple[int, int] | None = None

The number of steps for the PyTorch profiler to skip, followed by the number of steps to record.

monitored_gang: bool = False

If True, puts a monitored barrier before every collective call.

anomaly_detection: bool = False

If True, turns on the anomaly detection feature of torch.autograd.

wandb_project: str | None = None

If not None, sets the project name for W&B logging.

wandb_run_name: str | None = None

If not None, sets the run name for W&B logging. If None, then W&B creates a random name.

Functions

fairseq2.recipes.lm.instruction_finetune.load_instruction_finetuner(config, output_dir)[source]

Load a Trainer for language model instruction-finetuning.

Return type:

Trainer[SequenceBatch]
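
A hedged end-to-end sketch: the output directory is a placeholder, and running the returned Trainer by calling it is an assumption based on the return type above:

   from pathlib import Path

   from fairseq2.recipes.lm.instruction_finetune import (
       InstructionFinetuneConfig,
       load_instruction_finetuner,
   )

   config = InstructionFinetuneConfig(max_num_steps=1000)
   output_dir = Path("/checkpoints/instruction_finetune")  # placeholder path

   trainer = load_instruction_finetuner(config, output_dir)
   trainer()  # assumed entry point: the returned Trainer is run by calling it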