fairseq2.datasets

The datasets module provides pre-built datasets and dataset utilities for common NLP and speech tasks.

Coming soon: this documentation is still being written. The datasets module includes:

  • Common benchmark datasets
  • Dataset loading utilities
  • Data preprocessing pipelines

In the meantime, please refer to the fairseq2 source code and examples.