Package `audiocraft`

AudioCraft is a general framework for training audio generative models. At the moment we provide the training code for:

MusicGen, a state-of-the-art text-to-music and melody+text autoregressive generative model. For the solver, see MusicGenSolver, and for the model, MusicGen.
AudioGen, a state-of-the-art text-to-general-audio generative model.
EnCodec, an efficient, high-fidelity neural audio codec that provides an excellent tokenizer for autoregressive language models. See CompressionSolver and EncodecModel.
MultiBandDiffusion, an alternative diffusion-based decoder compatible with EnCodec that improves perceived quality and reduces the artifacts produced by adversarial decoders.
JASCO: Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation.
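
To make the "tokenizer" role of a neural codec more concrete, here is a toy sketch of residual vector quantization (RVQ), the technique listed under audiocraft.quantization. It is not AudioCraft's implementation: the function names (`nearest`, `rvq_encode`, `rvq_decode`) and the hand-written `CODEBOOKS` are hypothetical and chosen only for illustration, whereas a real codec learns its codebooks and quantizes latent vectors produced by an encoder network.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Return the index of the codebook entry closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)),
    )


def rvq_encode(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize x stage by stage; each stage encodes the residual left by the previous one."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> List[float]:
    """Reconstruct an approximation by summing the selected entry of every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-stage codebooks: a coarse stage followed by a finer one.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]

frame = [1.2, 0.9]                      # stand-in for one latent frame
codes = rvq_encode(frame, CODEBOOKS)    # one integer token per stage
approx = rvq_decode(codes, CODEBOOKS)   # coarse entry plus fine correction
print(codes, approx)
```

The integer codes are what an autoregressive language model would consume as tokens; adding more stages lowers the reconstruction error at the cost of more tokens per frame.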

Sub-modules

audiocraft.adversarial: Adversarial losses and discriminator architectures.
audiocraft.data: Audio loading and writing support. Datasets for raw audio, optionally with metadata.
audiocraft.environment: Provides cluster and tools configuration across clusters (slurm, dora, utilities).
audiocraft.grids: Dora Grids.
audiocraft.losses: Loss related classes and functions. In particular the loss balancer from EnCodec, and the usual spectral losses.
audiocraft.metrics: Metrics like CLAP score, FAD, KLD, Visqol, Chroma similarity, etc.
audiocraft.models: Models for EnCodec, AudioGen, MusicGen, as well as the generic LMModel.
audiocraft.modules: Modules used for building the models.
audiocraft.optim: Optimization utilities, in particular optimizers (DAdaptAdam), schedulers, and Exponential Moving Average (EMA).
audiocraft.quantization: Residual Vector Quantization (RVQ).
audiocraft.solvers: Solvers. A Solver is a training recipe that combines the dataloaders, models, optimizer, losses, etc. into a single convenient object.
audiocraft.train: Entry point for dora to launch solvers for running training loops. See more info on how to use dora: https://github.com/facebookresearch/dora
audiocraft.utils: Utilities.
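
As a rough illustration of the "training recipe" idea behind audiocraft.solvers, the sketch below bundles a data loader, a model, an optimizer, and a loss into one object. Everything here (`ToySolver`, `LinearModel`, `SGD`, `mse_loss_and_grad`) is a hypothetical, standard-library-only stand-in and not AudioCraft's actual solver API, which is built on PyTorch and launched through dora.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

Batch = List[Tuple[float, float]]  # (input, target) pairs


@dataclass
class LinearModel:
    """One-parameter model: prediction = weight * x."""
    weight: float = 0.0

    def __call__(self, x: float) -> float:
        return self.weight * x


@dataclass
class SGD:
    """Plain gradient descent on the single model weight."""
    lr: float

    def step(self, model: LinearModel, grad: float) -> None:
        model.weight -= self.lr * grad


def mse_loss_and_grad(model: LinearModel, batch: Batch) -> Tuple[float, float]:
    """Mean squared error over the batch and its gradient w.r.t. the weight."""
    n = len(batch)
    loss = sum((model(x) - y) ** 2 for x, y in batch) / n
    grad = sum(2 * (model(x) - y) * x for x, y in batch) / n
    return loss, grad


@dataclass
class ToySolver:
    """Hypothetical solver: owns every piece needed to run a training loop."""
    model: LinearModel
    optimizer: SGD
    loader: Iterable[Batch]
    loss_fn: Callable[[LinearModel, Batch], Tuple[float, float]]
    history: List[float] = field(default_factory=list)

    def run_epoch(self) -> float:
        """One pass over the loader; returns the mean batch loss."""
        total, count = 0.0, 0
        for batch in self.loader:
            loss, grad = self.loss_fn(self.model, batch)
            self.optimizer.step(self.model, grad)
            total += loss
            count += 1
        mean_loss = total / count
        self.history.append(mean_loss)
        return mean_loss

    def run(self, epochs: int) -> List[float]:
        for _ in range(epochs):
            self.run_epoch()
        return self.history


# Toy data following y = 3 * x, split into two batches.
loader = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]]
solver = ToySolver(LinearModel(), SGD(lr=0.05), loader, mse_loss_and_grad)
history = solver.run(epochs=5)
print(solver.model.weight, history[-1])
```

Keeping all components on one object means a launcher only has to construct the solver and call `run`, which is the convenience the description above refers to.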