Efficient Benchmarking

What you will learn
  • How to benchmark language model training and inference

  • How to perform systematic hyperparameter sweeps

  • How to profile model performance using torch profiler

  • How to scale training to multiple nodes efficiently

Prerequisites

Overview

This tutorial will guide you through conducting systematic benchmarks using fairseq2. We’ll focus on practical examples using language models, covering:

  1. Training speed benchmarks

  2. Multi-node scaling efficiency

  3. Hyperparameter sweeps

  4. Performance profiling

Note

The examples will use LLaMA models, but the concepts apply to any model architecture.

Training Speed Benchmarks

Let’s start by benchmarking the training speed of different model configurations.

1. Environment Setup

First, set up separate conda environments so you can compare different PyTorch configurations.

Example Environment Setup
# Create environments with different PyTorch versions
conda create -n fairseq2_pt22 python=3.10
conda create -n fairseq2_pt24 python=3.10

# Install PyTorch 2.2 environment
conda activate fairseq2_pt22
pip install torch==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install fairseq2

# Install PyTorch 2.4 environment
conda activate fairseq2_pt24
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install fairseq2

Note

Follow the instructions in Installation to install fairseq2 and PyTorch.

2. Multi-Node Training

The fairseq2 CLI is designed to support distributed training across multiple nodes, and it makes it easy to sweep hyperparameters across different environments.

Example SLURM Script
#!/bin/bash
#SBATCH --job-name=fairseq2_benchmark
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8

# List of environments to test
envs=(
    "fairseq2_pt22"
    "fairseq2_pt24"
)

# Run benchmarks
for env_name in "${envs[@]}"; do
    conda activate $env_name
    for i in {0..1}; do  # Two runs per environment
        echo "Running $env_name run $i"
        srun fairseq2 lm instruction_finetune \
            --preset llama3_1_70b_instruct \
            --config-file configs/benchmark.yaml \
            -- benchmark_outputs/${env_name}/run_${i}  # output directory
    done
    conda deactivate
done
Example benchmark.yaml
# Training config
max_num_steps: 1000
batch_size: 4
max_seq_len: 2048

# Distributed training
data_parallelism: "fsdp"
tensor_parallel_size: 8

# Optimization
optimizer:
  lr: 2e-5
  weight_decay: 0.1

mixed_precision: "static"
dtype: "bfloat16"

Hyperparameter Sweeps

fairseq2 provides powerful sweep functionality through fairseq2.recipes.utils.sweep_tagger.SweepTagger, which helps ensure:

  1. Consistent directory structure across nodes

  2. Reproducible experiments

  3. Easy comparison of different configurations

For example, when running multi-node training:

#!/bin/bash
#SBATCH --job-name=mt_sweep
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8

# Language pairs to sweep
lang_pairs=(
    "eng-fra"
    "eng-deu"
    "eng-spa"
)

# Run MT sweeps
for pair in "${lang_pairs[@]}"; do
    src_lang=${pair%-*}
    tgt_lang=${pair#*-}

    # fairseq2 CLI will automatically use SweepTagger to create
    # a unique directory based on the config
    srun fairseq2 mt train \
        --preset nllb_600m \
        --config-file configs/mt.yaml \
        --config source_lang=$src_lang target_lang=$tgt_lang \
        -- sweep_outputs/  # Base output directory
done

The fairseq2 CLI will:

  1. Parse the config file and command line overrides

  2. Use fairseq2.recipes.utils.sweep_tagger.SweepTagger to generate a unique tag based on sweep keys

  3. Create a subdirectory using this tag under the base output directory

  4. Ensure all nodes write to the same directory structure

  5. Use fmt, if provided, to render the tag in a customizable format

Note

Use --no-sweep-dir when you want to disable automatic sweep directory creation. This is useful when:

  • Running quick tests/debugging

  • Using custom directory structures

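For example, appending --no-sweep-dir to the srun fairseq2 commands shown above should make the recipe write directly into the specified output directory instead of a generated sweep subdirectory.
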
Different recipes support different sweep keys. The following examples show how to configure sweep tags for each.

1. Language Model Sweeps

For language models, we have two main finetuning approaches.

Instruction Finetuning (SFT)
from pathlib import Path

from fairseq2.recipes.lm.instruction_finetune import (
    InstructionFinetuneConfig,
    instruction_finetune_presets
)
from fairseq2.recipes.utils.sweep_tagger import SweepTagger

# Configure LM sweep
sweep_keys = {
    "batch_size",
    "max_seq_len",
    "dtype",
    "tensor_parallel_size"
}

sweep_tagger = SweepTagger(world_size=8, allowed_keys=sweep_keys)

# Example instruction finetuning config
config = {
    "max_num_steps": 1000,
    "batch_size": 4,
    "max_seq_len": 2048,
    "dtype": "bfloat16"
}

# Generate unique tag for this config
tag = sweep_tagger.generate(
    "llama3_1_70b_instruct",
    config,
    fmt="ps_{preset}.ws_{world_size}.{batch_size}_{max_seq_len}_{dtype}",
)
output_dir = Path(f"sweep_outputs/{tag}")
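
With this fmt string, the generated tag would look roughly like ps_llama3_1_70b_instruct.ws_8.4_2048_bfloat16 (assuming generate simply substitutes the preset, world size, and config values), so each configuration gets its own self-describing subdirectory under sweep_outputs/.
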
Preference Finetuning (DPO)
from pathlib import Path

from fairseq2.recipes.lm.preference_finetune.dpo import (
    DpoConfig,
    create_dpo_unit
)
from fairseq2.recipes.utils.sweep_tagger import SweepTagger

# Configure DPO sweep
sweep_keys = {
    "batch_size",
    "max_seq_len",
    "beta",  # DPO-specific
    "nll_scale",  # DPO-specific
    "reference_tensor_parallel_size",  # DPO-specific
    "length_normalization"  # DPO-specific
}

sweep_tagger = SweepTagger(world_size=8, allowed_keys=sweep_keys)

# Example DPO config
config = {
    "max_num_steps": 1000,
    "batch_size": 4,
    "max_seq_len": 2048,
    "beta": 0.1,
    "nll_scale": 0.0,
    "reference_model": "llama3_1_8b_instruct",
    "reference_tensor_parallel_size": 1,
    "length_normalization": False
}

# Generate unique tag for this config
tag = sweep_tagger.generate("llama3_1_8b_dpo", config)
output_dir = Path(f"sweep_outputs/{tag}")

Example SLURM script for running DPO sweeps:

#!/bin/bash
#SBATCH --job-name=dpo_sweep
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8

# List of beta values to sweep
betas=(0.1 0.2 0.5)

# Run DPO sweeps
for beta in "${betas[@]}"; do
    srun fairseq2 lm preference_finetune \
        --preset llama3_1_8b_dpo \
        --config-file configs/dpo.yaml \
        --config "beta=$beta"
        -- sweep_outputs/
done

2. Machine Translation Sweeps

MT recipes include additional sweep keys specific to translation tasks.

Example MT sweep
from pathlib import Path

from fairseq2.recipes.mt.train import load_mt_trainer, mt_train_presets
from fairseq2.recipes.utils.sweep_tagger import SweepTagger

# Configure MT sweep
sweep_keys = {
    "lr",
    "weight_decay",
    "source_lang",  # MT-specific
    "target_lang",  # MT-specific
    "max_seq_len",
    "batch_size"
}

sweep_tagger = SweepTagger(world_size=8, allowed_keys=sweep_keys)

# Example MT config
config = {
    "source_lang": "eng",
    "target_lang": "fra",
    "optimizer_config": {
        "lr": 2e-5,
        "weight_decay": 0.1
    }
}

# Generate unique tag for this config
tag = sweep_tagger.generate("nllb_600m", config)
output_dir = Path(f"sweep_outputs/{tag}")

3. wav2vec2 Sweeps

Speech models also have their own set of sweep parameters:

Example wav2vec2 sweep
from pathlib import Path

from fairseq2.models.wav2vec2.asr import wav2vec2_asr_archs
from fairseq2.recipes.utils.sweep_tagger import SweepTagger

# wav2vec2-specific sweep keys
sweep_keys = {
    "freeze_encoder_for_n_steps",
    "max_audio_len",
    "min_audio_len",
    "normalize_audio",
}

sweep_tagger = SweepTagger(world_size=8, allowed_keys=sweep_keys)

# Example wav2vec2 config
config = {
    "freeze_encoder_for_n_steps": 1_000,
    "max_audio_len": 100_000,
    "min_audio_len": 1_000,
    "normalize_audio": True
}

# Generate unique tag for this config
tag = sweep_tagger.generate(
    "wav2vec2_base",
    config,
    fmt="ps_{preset}.ws_{world_size}.mal_{max_audio_len}.minal_{min_audio_len}.norm_{normalize_audio}",
)

output_dir = Path(f"sweep_outputs/{tag}")

Performance Profiling

fairseq2 uses PyTorch’s profiler to help analyze performance bottlenecks. The profiler traces are saved in TensorBoard format in the output directory, which lets you inspect your model’s performance in detail and is also a convenient way to gather performance metrics for hyperparameter sweeps.
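
If you want to see roughly what this looks like at the PyTorch level, or profile code outside of a fairseq2 recipe, here is a minimal torch.profiler sketch that writes TensorBoard traces. The schedule values, output path, and run_training_step are illustrative placeholders, not fairseq2 internals:

import torch
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler

def run_training_step() -> None:
    # Stand-in for a real forward/backward pass; replace with your own training step.
    x = torch.randn(1024, 1024, requires_grad=True)
    (x @ x).sum().backward()

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(
    activities=activities,
    # Illustrative schedule: skip 1 step, warm up for 2, record 5, then stop.
    schedule=schedule(wait=1, warmup=2, active=5, repeat=1),
    on_trace_ready=tensorboard_trace_handler("./profile_outputs/tb"),
    record_shapes=True,
    profile_memory=True,
) as prof:
    for step in range(10):
        run_training_step()
        prof.step()  # advance the profiler schedule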

Analysis of Profiler Results

To visualize the results, start TensorBoard at the output directory:

# Start TensorBoard
tensorboard --logdir ./profile_outputs/tb/

Access the results in your browser at http://localhost:6006.

You can also plot the results in a customized way for your own analysis:

from tensorboard.backend.event_processing import event_accumulator
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def parse_tensorboard(path, scalars):
    ea = event_accumulator.EventAccumulator(
        path,
        size_guidance={event_accumulator.SCALARS: 0},
    )
    ea.Reload()
    return {k: pd.DataFrame(ea.Scalars(k)) for k in scalars}

def analyze_performance(log_dir):
    # Parse metrics
    metrics = parse_tensorboard(log_dir, ["Wall Time"])  # or "Elements per Second", "Elapsed Time"

    # Calculate statistics
    wall_time = metrics["Wall Time"]
    steps_per_second = len(wall_time) / wall_time["value"].sum()

    # Visualize
    plt.figure(figsize=(10, 6))
    sns.lineplot(data=wall_time, x="step", y="value")
    plt.title("Training Wall Time per Step")
    plt.show()

    return steps_per_second
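
For example, pointing it at one of the benchmark runs above (the path below is hypothetical; use wherever your TensorBoard logs were actually written):

# Hypothetical log directory from one of the earlier benchmark runs
steps_per_second = analyze_performance("benchmark_outputs/fairseq2_pt24/run_0/tb")
print(f"~{steps_per_second:.2f} steps/second")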

Best Practices

  1. Systematic Benchmarking

    • Always benchmark with fixed seeds for reproducibility

    • Test multiple batch sizes and sequence lengths

    • Measure both training and validation performance

    • Record memory usage and throughput metrics

  2. Distributed Training

    • Start with single-node tests before scaling to multiple nodes

    • Monitor communication overhead between nodes

    • Use FSDP for large models that don’t fit in GPU memory

    • Experiment with different tensor parallel sizes

  3. Performance Optimization

    • Enable mixed precision training when possible

    • Tune gradient accumulation steps

    • Profile to identify bottlenecks

    • Monitor GPU utilization and memory usage

See Also