In fairseq2, “assets” are the components that make up a sequence or language modeling task, such as datasets, models, and tokenizers. These assets are essential for training, evaluating, and deploying models.
fairseq2.assets provides an API for loading these assets, such as models, from “model cards” held in one or more “stores”.
To organize assets, fairseq2 uses “cards”: YAML files that describe each asset and its relationships to other assets.
For example, you can find all the cards that ship with fairseq2 here.
Cards provide a flexible way to define and manage the components of an NLP task, making it easier to reuse, share, and combine different assets.
A store is a location that holds asset cards. In fairseq2, a store is accessed via
fairseq2.assets.AssetStore. By default, fairseq2 looks in the following locations
for asset cards:
System: Cards that are shared by all users. By default, the system store is /etc/fairseq2/assets,
but this can be changed via the environment variable FAIRSEQ2_ASSET_DIR.
User: Cards that are only available to the current user. Such cards can be given a name with the @user suffix (e.g. llama3_2_1b@user).
By default, the user store is ~/.config/fairseq2/assets, but this can be changed via the environment variable FAIRSEQ2_USER_ASSET_DIR.
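The lookup rules above can be sketched in a few lines of Python. This is a minimal illustration, not fairseq2's implementation: the helper name resolve_asset_dirs is hypothetical, and it only models the two default paths and the two environment-variable overrides described in this section.

```python
import os
from pathlib import Path
from typing import Mapping

# Default locations as described above; the helper below is hypothetical,
# not part of fairseq2's API.
DEFAULT_SYSTEM_DIR = Path("/etc/fairseq2/assets")
DEFAULT_USER_DIR = Path("~/.config/fairseq2/assets")


def resolve_asset_dirs(env: Mapping[str, str] = os.environ) -> dict[str, Path]:
    """Return the system and user card directories, honoring env overrides."""
    # An environment variable, when set, replaces the corresponding default.
    system_dir = Path(env.get("FAIRSEQ2_ASSET_DIR", DEFAULT_SYSTEM_DIR))
    user_dir = Path(env.get("FAIRSEQ2_USER_ASSET_DIR", DEFAULT_USER_DIR))
    # expanduser() turns a leading "~" into the current user's home directory.
    return {"system": system_dir.expanduser(), "user": user_dir.expanduser()}


if __name__ == "__main__":
    for store, path in resolve_asset_dirs().items():
        print(f"{store}: {path}")
```

Passing the environment as a parameter, rather than reading os.environ inside the function, keeps the sketch easy to test with a plain dictionary.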
Here is an example of how to register a new directory with the asset store:
An asset card (also called a model card) is a YAML file that contains information
about an asset such as a model, dataset, or tokenizer. Every asset card must have
the mandatory attribute name, which identifies the asset and must be unique across
all asset cards. fairseq2 provides example cards for different assets in
fairseq2.assets.cards.
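The naming rule can be illustrated with a small Python sketch. It assumes each card has already been parsed from YAML into a dictionary (a card file containing only name: llama3_2_1b becomes {"name": "llama3_2_1b"}). The function index_cards is hypothetical and not part of fairseq2; it only demonstrates the two constraints stated above: every card needs a name, and no two cards may share one, even when they live in different stores.

```python
from typing import Any, Iterable, Mapping

# A parsed card: the mapping a YAML loader would produce from one card file.
Card = Mapping[str, Any]


def index_cards(stores: Mapping[str, Iterable[Card]]) -> dict[str, tuple[str, Card]]:
    """Map each card name to (store name, card), enforcing the `name` rules."""
    index: dict[str, tuple[str, Card]] = {}
    for store_name, cards in stores.items():
        for card in cards:
            name = card.get("name")
            # Rule 1: `name` is mandatory and must be a non-empty string.
            if not isinstance(name, str) or not name:
                raise ValueError(f"card in {store_name!r} store is missing `name`")
            # Rule 2: names must be unique across all cards in all stores.
            if name in index:
                other_store = index[name][0]
                raise ValueError(
                    f"duplicate card name {name!r} in {store_name!r} and {other_store!r}"
                )
            index[name] = (store_name, card)
    return index


if __name__ == "__main__":
    stores = {
        "system": [{"name": "llama3_2_1b"}],
        "user": [{"name": "llama3_2_1b@user"}],
    }
    print(sorted(index_cards(stores)))  # ['llama3_2_1b', 'llama3_2_1b@user']
```

Because the @user suffix makes llama3_2_1b@user a distinct name, the two cards in the usage example do not collide.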