fairseq2 v0.5 represents a major milestone with significant improvements across
the entire toolkit. This release focuses on simplifying user onboarding,
enhancing performance, and expanding integration capabilities.
Simplified Recipe APIs
The recipe authoring APIs have been significantly overhauled and simplified to ease
onboarding to fairseq2 and make its features more discoverable to new users. The new
APIs provide a more intuitive and streamlined experience for both beginners and
advanced users.
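As a rough sketch of the authoring flow (all names below are illustrative placeholders,
not the actual v0.5 API; see the fairseq2 recipe documentation for the real entry
points), a custom recipe boils down to a plain configuration object and a function
that consumes it:

```python
# Illustrative only: a minimal recipe shape. TrainConfig and train() are
# hypothetical names, not the fairseq2 v0.5 API.
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Options exposed to users via the config file / command line."""

    model: str = "qwen3_1.7b"   # asset card name (illustrative)
    max_num_tokens: int = 8192  # tokens per batch
    lr: float = 1e-4


def train(config: TrainConfig) -> None:
    # A real recipe would build the model, dataset, optimizer, and trainer
    # from the config and run the training loop here.
    print(f"Training {config.model} with lr={config.lr}")


if __name__ == "__main__":
    train(TrainConfig())
```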
Enhanced Discoverability
fairseq2 features are now more discoverable, with improved documentation,
better API organization, and clearer examples to help users get started quickly.
Qwen 2.5 and Qwen 3 Models
Parity-checked implementations of dense Qwen 2.5 and Qwen 3 models have been
added, expanding the range of state-of-the-art language models available in fairseq2.
New Language Model Pretraining Recipe
Implementation of a new language model pretraining recipe, parity-checked against a
LLaMA-3 8B run over 1T DCLM Baseline tokens. This provides a robust foundation for
training large language models from scratch.
Hugging Face Checkpoints
Native support for reading and writing Hugging Face checkpoints for models that
implement the necessary integration APIs, making it easier to leverage the Hugging Face
ecosystem, for example in asynchronous evaluation jobs (e.g., vLLM) launched during
fairseq2 training.
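For example, a model exported in Hugging Face format during a run can be consumed
directly by downstream tooling. The sketch below loads such an export with the
transformers library; the export directory path is hypothetical, and whether a model
supports HF export depends on it implementing the integration APIs mentioned above.

```python
# Sketch: consuming a Hugging Face-format export produced during training.
# The directory path is hypothetical; the loading code is standard
# `transformers` usage.
from transformers import AutoModelForCausalLM, AutoTokenizer

export_dir = "/checkpoints/run_01/step_2000/hf"  # hypothetical export path

tokenizer = AutoTokenizer.from_pretrained(export_dir)
model = AutoModelForCausalLM.from_pretrained(export_dir)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```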
Hugging Face Tokenizers
Native support for Hugging Face tokenizers across all relevant fairseq2 APIs,
providing seamless integration with the broader NLP ecosystem.
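As an illustration, a tokenizer hosted on the Hugging Face Hub can be loaded with the
standard transformers API and used wherever fairseq2 expects a tokenizer; the
fairseq2-side loader is not shown here, since its exact name depends on the API you
are using.

```python
# Standard Hugging Face tokenizer usage; fairseq2 v0.5 can consume such
# tokenizers through its own tokenizer APIs (loader names omitted here).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

token_ids = tokenizer.encode("fairseq2 now speaks Hugging Face.")
print(token_ids)
print(tokenizer.decode(token_ids))
```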
Asset Cards with HF Hub
Asset Cards now support the “hf://” scheme to download models, tokenizers, and
datasets directly from the Hugging Face Hub, making it easier to work with
community models and datasets.
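As a sketch, a card can point at a Hub repository instead of a local path; the field
names below are illustrative rather than the exact asset card schema.

```python
# Illustrative only: writing an asset card whose artifacts are resolved
# from the Hugging Face Hub via the "hf://" scheme. Field names are
# placeholders; see the asset card documentation for the exact schema.
from pathlib import Path

card = """\
name: my_qwen3_demo
checkpoint: "hf://Qwen/Qwen3-8B"   # downloaded from the Hugging Face Hub
tokenizer: "hf://Qwen/Qwen3-8B"
"""

Path("my_qwen3_demo.yaml").write_text(card)
```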
Unified BatchLayout
Padded and packed batching APIs have been consolidated under a single BatchLayout type.
All fairseq2.nn modules have been updated to handle both modes consistently, simplifying
data processing workflows.
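Conceptually, the same batch can be described either as a padded 2-D tensor plus
per-sequence lengths, or as one packed 1-D token stream plus cumulative offsets. The
plain-PyTorch sketch below shows both representations; it illustrates the idea only
and is not the BatchLayout API itself.

```python
# Illustrative only: the two batch representations unified by BatchLayout.
import torch

seqs = [torch.tensor([5, 6, 7]), torch.tensor([8, 9]), torch.tensor([10, 11, 12, 13])]
seq_lens = torch.tensor([len(s) for s in seqs])

# Padded layout: (batch, max_len) tensor plus per-sequence lengths.
padded = torch.nn.utils.rnn.pad_sequence(seqs, batch_first=True, padding_value=0)

# Packed layout: one flat token stream plus cumulative offsets (the
# "cu_seqlens" convention used by variable-length attention kernels).
packed = torch.cat(seqs)
cu_seqlens = torch.nn.functional.pad(seq_lens.cumsum(0), (1, 0))

print(padded.shape)  # torch.Size([3, 4])
print(packed.shape)  # torch.Size([9])
print(cu_seqlens)    # tensor([0, 3, 5, 9])
```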
High-Performance Data Pipeline
A new high-performance C++-based data pipeline packing operation specifically
designed for language model pretraining jobs, significantly improving data
throughput and training efficiency.
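The operation is conceptually equivalent to the pure-Python sketch below, which
concatenates tokenized documents and slices them into fixed-length training examples;
the actual fairseq2 operator does this in C++ inside the data pipeline, and the
function name here is illustrative.

```python
# Illustrative Python equivalent of sequence packing for LM pretraining:
# concatenate tokenized documents (with an EOS separator) and emit
# fixed-length blocks.
from typing import Iterable, Iterator


def pack(docs: Iterable[list[int]], seq_len: int, eos_id: int) -> Iterator[list[int]]:
    buffer: list[int] = []
    for doc in docs:
        buffer.extend(doc)
        buffer.append(eos_id)
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]


docs = [[1, 2, 3], [4, 5, 6, 7, 8], [9, 10]]
for block in pack(docs, seq_len=4, eos_id=0):
    print(block)  # [1, 2, 3, 0], [4, 5, 6, 7], [8, 0, 9, 10]
```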
FlashAttention-3
Support for (varlen) FlashAttention-3, with torch.compile integration, providing
state-of-the-art attention performance and memory efficiency.
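With a packed batch, attention runs over the flat token stream using the cumulative
sequence lengths. The sketch below uses the flash-attn package's variable-length entry
point; the exact import path and availability of the FlashAttention-3 kernels depend
on the installed flash-attn build, and this is not the fairseq2-internal call.

```python
# Sketch of variable-length ("varlen") attention over a packed batch.
# Requires a CUDA GPU and the flash-attn package; the module providing the
# FlashAttention-3 kernels depends on the installed build.
import torch
from flash_attn import flash_attn_varlen_func  # flash-attn 2.x interface

total_tokens, n_heads, head_dim = 9, 8, 64
q = torch.randn(total_tokens, n_heads, head_dim, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Three sequences of lengths 3, 2, and 4 packed into one stream.
cu_seqlens = torch.tensor([0, 3, 5, 9], device="cuda", dtype=torch.int32)

out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=4, max_seqlen_k=4,
    causal=True,
)
print(out.shape)  # (9, 8, 64)
```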
Torch.compile Integration
The activation memory budget setting of torch.compile’s min-cut partitioner is now
exposed in all first-party recipes, giving users fine-grained control over the
trade-off between activation memory and recomputation in compiled models.
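Outside of a recipe, the same knob can be set directly through PyTorch; note that this
is a private torch._functorch namespace whose location may change between PyTorch
releases.

```python
# The recipe option maps to PyTorch's min-cut partitioner memory budget:
# 1.0 keeps all activations, 0.0 recomputes everything in the backward pass.
# torch._functorch.config is a private namespace and may change across releases.
import torch
import torch._functorch.config as functorch_config

functorch_config.activation_memory_budget = 0.5  # trade activation memory for recompute


@torch.compile
def mlp(x: torch.Tensor, w1: torch.Tensor, w2: torch.Tensor) -> torch.Tensor:
    return torch.relu(x @ w1) @ w2


x = torch.randn(32, 256, requires_grad=True)
w1 = torch.randn(256, 1024, requires_grad=True)
w2 = torch.randn(1024, 256, requires_grad=True)

mlp(x, w1, w2).sum().backward()
```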
New Checkpoint Format
A new checkpoint format serves as a lightweight alternative to PyTorch DCP, offering
similar dynamic resharding capabilities. Thanks to its integration with the new 3-D
model sharding APIs, there is no need to set up process groups for checkpoint saving
or loading.
User-Inspectable Checkpoints
Generated checkpoints are regular, user-inspectable PyTorch tensor files
(i.e., “.pt”) for easier troubleshooting and analysis.
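For instance, an individual checkpoint file can be opened with plain torch.load for
inspection. The path below is hypothetical, and the snippet assumes the file holds a
flat mapping of parameter names to tensors.

```python
# Sketch: inspecting a checkpoint file with plain PyTorch. The path and
# file layout are hypothetical and depend on the training run.
import torch

state_dict = torch.load(
    "/checkpoints/run_01/step_2000/model.pt",
    map_location="cpu",
    weights_only=True,  # safe loading of plain tensor files
)

for name, tensor in state_dict.items():
    print(f"{name}: {tuple(tensor.shape)} {tensor.dtype}")
```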
Asynchronous Checkpoint Manager
A new asynchronous checkpoint manager is tightly integrated with the new format.
It is fully deterministic and includes special handling of NFS lookup caches
to prevent race conditions in async evaluation jobs.
Model-Only Checkpoints
Models can now be saved on their own instead of as part of a full training checkpoint,
which is especially helpful for reducing disk overhead in short-running post-training jobs.
3-D Model Sharding
A new, extensible 3-D model sharding API, supported in both offline (checkpoint) and
online (training) settings, enables more flexible and efficient distributed training
configurations.
Revised Metric API
The metric API has been revised for greater flexibility and no longer requires
individual MetricBag subclasses for use with recipe units, simplifying the
implementation of custom metrics.
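Since fairseq2 metrics build on torcheval, a custom metric is just a torcheval object
that a recipe unit updates directly; the sketch below shows only the torcheval side,
and how the metric bag hands out metrics in v0.5 is not reproduced here.

```python
# Illustrative only: a weighted mean metric as used for per-token losses.
import torch
from torcheval.metrics import Mean

nll_loss = Mean()

# Per-batch updates, weighted by the number of target tokens in each batch.
nll_loss.update(torch.tensor(2.31), weight=1024)
nll_loss.update(torch.tensor(2.05), weight=896)

print(nll_loss.compute())  # weighted mean over both batches
```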
Migration Notes
When upgrading to v0.5, users should be aware of the following key changes:
Recipe APIs
Recipe authoring APIs have been significantly simplified. Existing recipes
may need updates to work with the new APIs. Check the updated documentation
and examples for migration guidance.
BatchLayout Changes
The consolidation of batching APIs under BatchLayout may require updates
to custom data processing code. The new unified API provides better consistency
and performance.
Checkpoint Format
While the new checkpoint format offers significant advantages, existing checkpoints
will need to be converted, or the corresponding models retrained. The new format
provides better performance and easier troubleshooting.