Getting Started

Dataset

To download the dataset, follow the download link at https://github.com/facebookresearch/fairseq/tree/main/examples/audio_nlp/nlu.

The low-resource splits can be downloaded directly from http://dl.fbaipublicfiles.com/stop/low_resource_splits.tar.gz
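For example, assuming a Unix-like shell with wget and tar available, the archive can be fetched and unpacked into ~/stop (a hypothetical target directory):

mkdir -p ~/stop
wget http://dl.fbaipublicfiles.com/stop/low_resource_splits.tar.gz
tar -xzf low_resource_splits.tar.gz -C ~/stop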

The downloaded data contains both natural speech (stop) and synthesized speech (stop_tts). Each contains full-resource and low-resource train and validation splits, as well as test splits. Each utterance in the validation and test sets has two recordings by different speakers (eval_0 and eval_1, test_0 and test_1). For each dataset split, the manifests reference audio files across these directories. Each manifest consists of three files: a .tsv file referencing the audio files, a .ltr file with the utterance text, and a .parse file with the corresponding semantic parses (labels) for the utterances.
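As a quick sanity check, the three files of a manifest can be lined up record by record: line i of the .ltr and .parse files describes the audio entry on line i+1 of the .tsv file (in fairseq's usual wav2vec-style manifests, the first .tsv line holds the audio root directory). A minimal sketch, run from the manifest directory and assuming the train split files are named train.tsv, train.ltr, and train.parse:

head -n 2 train.tsv    # audio root directory, then the first audio path with its frame count
head -n 1 train.ltr    # text of the first utterance
head -n 1 train.parse  # semantic parse (label) of the first utterance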

In addition to the manifests, fairseq needs dictionaries for both the source and target text. Generate them by running the following command from the root of a fairseq checkout:

./examples/audio_nlp/nlu/create_dict_stop.sh $FAIRSEQ_DATASET_OUTPUT

Here, $FAIRSEQ_DATASET_OUTPUT should point to the directory containing the .ltr and .parse files.
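A minimal end-to-end sketch, assuming the manifests were unpacked to ~/stop/manifests (a hypothetical location) and that the command is run from the root of a fairseq checkout:

export FAIRSEQ_DATASET_OUTPUT=~/stop/manifests   # hypothetical manifest location
./examples/audio_nlp/nlu/create_dict_stop.sh $FAIRSEQ_DATASET_OUTPUT

The script should write fairseq-style dictionary files (one "<symbol> <count>" pair per line) for the source and target vocabularies alongside the manifests.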

Baselines

The GitHub page above links to several pretrained models, along with their performance on the test set, as a starting point for evaluation.

Running Experiments

To reproduce our results, we provide an example fine-tuning configuration: https://github.com/facebookresearch/fairseq/blob/main/examples/audio_nlp/nlu/configs/nlu_finetuning.yaml

Example command:

python fairseq_cli/hydra_train.py --config-dir examples/audio_nlp/nlu/configs/ --config-name nlu_finetuning task.data=$FAIRSEQ_DATASET_OUTPUT model.w2v_path=$PRETRAINED_MODEL_PATH
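Hydra lets you override any value in the YAML configuration from the command line. A sketch of the same run with two extra overrides, assuming the standard fairseq optimization and distributed_training config groups (the values are illustrative, not recommended settings):

python fairseq_cli/hydra_train.py \
  --config-dir examples/audio_nlp/nlu/configs/ \
  --config-name nlu_finetuning \
  task.data=$FAIRSEQ_DATASET_OUTPUT \
  model.w2v_path=$PRETRAINED_MODEL_PATH \
  distributed_training.distributed_world_size=1 \
  optimization.max_update=20000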