Package audiocraft

AudioCraft is a general framework for training audio generative models. At the moment we provide the training code for:

  • MusicGen, a state-of-the-art text-to-music and melody+text autoregressive generative model. For the solver, see MusicGenSolver, and for the model, MusicGen.
  • AudioGen, a state-of-the-art text-to-general-audio generative model.
  • EnCodec, an efficient, high-fidelity neural audio codec that provides an excellent tokenizer for autoregressive language models. See CompressionSolver and EncodecModel.
  • MultiBandDiffusion, an alternative diffusion-based decoder compatible with EnCodec that improves perceived quality and reduces the artifacts introduced by adversarial decoders.
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
"""
AudioCraft is a general framework for training audio generative models.
At the moment we provide the training code for:

- [MusicGen](https://arxiv.org/abs/2306.05284), a state-of-the-art
    text-to-music and melody+text autoregressive generative model.
    For the solver, see `audiocraft.solvers.musicgen.MusicGenSolver`, and for the model,
    `audiocraft.models.musicgen.MusicGen`.
- [AudioGen](https://arxiv.org/abs/2209.15352), a state-of-the-art
    text-to-general-audio generative model.
- [EnCodec](https://arxiv.org/abs/2210.13438), an efficient, high-fidelity
    neural audio codec that provides an excellent tokenizer for autoregressive language models.
    See `audiocraft.solvers.compression.CompressionSolver` and `audiocraft.models.encodec.EncodecModel`.
- [MultiBandDiffusion](TODO), an alternative diffusion-based decoder compatible with EnCodec that
    improves perceived quality and reduces the artifacts introduced by adversarial decoders.
"""

# flake8: noqa
from . import data, modules, models

__version__ = '1.3.0'
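
Because the package exposes a `__version__` string, downstream code may want to check it before relying on newer features. The following is a minimal sketch using only the standard library; it assumes the distribution is installed under the name `audiocraft`, and the helpers `installed_version` and `version_tuple` are hypothetical names introduced here, not part of the package.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str = "audiocraft") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' string; pre-release suffixes are not handled."""
    return tuple(int(part) for part in version.split("."))


found = installed_version()
if found is None:
    print("audiocraft is not installed in this environment")
else:
    print("audiocraft version:", found)
```

Comparing `version_tuple(found) >= (1, 3, 0)` is then an ordinary tuple comparison.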

Sub-modules

audiocraft.adversarial

Adversarial losses and discriminator architectures.

audiocraft.data

Audio loading and writing support. Datasets for raw audio, optionally with metadata.

audiocraft.environment

Provides cluster and tool configuration across clusters (Slurm, Dora, utilities).

audiocraft.grids

Dora Grids.

audiocraft.losses

Loss related classes and functions. In particular the loss balancer from EnCodec, and the usual spectral losses.
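
As a rough illustration of what a loss balancer does, the sketch below rescales the gradient of each loss to a common norm before mixing them by weight, so that a loss with large raw gradients cannot dominate training. This is a simplified, pure-Python sketch: it normalizes by the current gradient norm, whereas the balancer described for EnCodec uses a moving average of the norms. The function name `balance_gradients` and its parameters are assumptions made for this example, not the `audiocraft.losses` API.

```python
import math
from typing import Dict, List


def balance_gradients(
    grads: Dict[str, List[float]],
    weights: Dict[str, float],
    total_norm: float = 1.0,
    eps: float = 1e-12,
) -> List[float]:
    """Combine per-loss gradients after rescaling each one to unit norm.

    Each loss contributes total_norm * w_i / sum(w) * g_i / ||g_i||, so its
    share of the combined gradient depends on its weight, not its raw scale.
    """
    total_weight = sum(weights[name] for name in grads)
    dim = len(next(iter(grads.values())))
    combined = [0.0] * dim
    for name, grad in grads.items():
        norm = math.sqrt(sum(g * g for g in grad))
        scale = total_norm * weights[name] / total_weight / (norm + eps)
        combined = [c + scale * g for c, g in zip(combined, grad)]
    return combined


# Two hypothetical losses whose raw gradients differ by orders of magnitude.
grads = {"l1": [3.0, 4.0], "adv": [0.0, 0.01]}
weights = {"l1": 1.0, "adv": 3.0}
balanced = balance_gradients(grads, weights)
print(balanced)
```

After balancing, the `adv` loss contributes three quarters of the total norm even though its raw gradient is far smaller than that of `l1`.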

audiocraft.metrics

Metrics such as CLAP score, FAD, KLD, ViSQOL, and chroma similarity.

audiocraft.models

Models for EnCodec, AudioGen, MusicGen, as well as the generic LMModel.

audiocraft.modules

Modules used for building the models.

audiocraft.optim

Optimization utilities, in particular optimizers (DAdaptAdam), schedulers, and exponential moving average (EMA).
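
To make the exponential moving average concrete, here is a minimal sketch that tracks a smoothed copy of named scalar parameters. It is a toy, pure-Python illustration; the class name `ExponentialMovingAverage`, the `shadow` attribute, and the `decay` parameter are assumptions for this example and do not describe the `audiocraft.optim` API.

```python
from typing import Dict


class ExponentialMovingAverage:
    """Keep a smoothed 'shadow' copy of named scalar parameters."""

    def __init__(self, decay: float = 0.99) -> None:
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in [0, 1)")
        self.decay = decay
        self.shadow: Dict[str, float] = {}

    def update(self, params: Dict[str, float]) -> None:
        for name, value in params.items():
            if name not in self.shadow:
                # First observation initializes the average.
                self.shadow[name] = value
            else:
                self.shadow[name] = (
                    self.decay * self.shadow[name] + (1.0 - self.decay) * value
                )


ema = ExponentialMovingAverage(decay=0.5)
for weight in (1.0, 3.0, 5.0):
    ema.update({"weight": weight})
print(ema.shadow["weight"])
```

A decay closer to 1 gives a smoother average that reacts more slowly to recent updates.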

audiocraft.quantization

Residual vector quantization (RVQ).
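
RVQ stands for residual vector quantization: a vector is quantized by a first codebook, the leftover residual is quantized by a second codebook, and so on, so that each stage refines the previous approximation. The sketch below shows the idea in pure Python with tiny hand-written codebooks; the function names and the codebook values are illustrative assumptions, not the `audiocraft.quantization` API.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codebook entry closest to x in squared Euclidean distance."""
    def sq_dist(entry: Vector) -> float:
        return sum((e - v) ** 2 for e, v in zip(entry, x))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def rvq_encode(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> Tuple[List[int], Vector]:
    """Quantize x stage by stage; return one code per stage and the final residual."""
    residual = list(x)
    codes: List[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # The next stage only sees what this stage failed to capture.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Reconstruct by summing the selected entry of every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],      # coarse stage
    [[-0.25, 0.0], [0.25, 0.0], [0.0, 0.25]],  # fine stage
]
codes, residual = rvq_encode([1.25, 1.0], codebooks)
print(codes, rvq_decode(codes, codebooks), residual)
```

Using fewer stages at decode time gives a coarser reconstruction, which is how a residual quantizer can trade bitrate for quality.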

audiocraft.solvers

Solvers. A Solver is a training recipe that combines the dataloaders, models, optimizer, losses, etc., into a single convenient object.
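
The idea of a solver as a single object bundling data, model, optimizer, and loss can be sketched with a toy example. Everything below (`ToySolver`, `LinearModel`, `SGD`, and their methods) is hypothetical and written in pure Python for illustration; the actual solvers in `audiocraft.solvers` are considerably richer and are not shown here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LinearModel:
    """One-parameter model: prediction = weight * x."""
    weight: float = 0.0

    def __call__(self, x: float) -> float:
        return self.weight * x


@dataclass
class SGD:
    """Plain gradient descent on the model's single weight."""
    lr: float = 0.1

    def step(self, model: LinearModel, grad: float) -> None:
        model.weight -= self.lr * grad


@dataclass
class ToySolver:
    """Bundles data, model, optimizer, and loss into one training recipe."""
    dataloader: List[Tuple[float, float]]
    model: LinearModel
    optimizer: SGD
    history: List[float] = field(default_factory=list)

    def run_epoch(self) -> float:
        total = 0.0
        for x, y in self.dataloader:
            error = self.model(x) - y
            total += error ** 2  # squared-error loss
            # Gradient of the squared error with respect to the weight.
            self.optimizer.step(self.model, 2.0 * error * x)
        mean_loss = total / len(self.dataloader)
        self.history.append(mean_loss)
        return mean_loss

    def run(self, epochs: int) -> None:
        for _ in range(epochs):
            self.run_epoch()


solver = ToySolver(
    dataloader=[(1.0, 2.0), (2.0, 4.0)],  # samples of y = 2x
    model=LinearModel(),
    optimizer=SGD(lr=0.1),
)
solver.run(epochs=20)
print(solver.model.weight, solver.history[-1])
```

Keeping the loop inside one object means swapping a model or optimizer only changes how the solver is constructed, not the training code itself.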

audiocraft.train

Entry point for Dora to launch solvers that run training loops. For more information on using Dora, see https://github.com/facebookresearch/dora

audiocraft.utils

Utilities.