📄️ Expressive parallel alignments
We extend the parallel speech alignment method described in Seamless Communication et al. (2023) to align pairs not only in terms of meaning, but also in terms of expressivity. Specifically, the pipeline now allows additional processing of the k-nearest neighbors and introduces an option to add auxiliary_embeddings, which are expected to provide prosodic information that complements the traditional semantic inputs (e.g. SONAR speech embeddings). The prosodic similarity scores from the auxiliary embeddings are then "blended" with the traditional margin-based scores using the formula:
📄️ Speech Mining Pipeline
With the Seamless Communication project, FAIR has introduced a new mechanism for speech mining. In stopesV1, you could mine large text datasets to create aligned text across languages, which was useful for training machine translation models. From stopesV2 onwards, we introduce a mechanism that lets you mine speech and text together across languages to create aligned multimodal datasets for training and evaluating speech tasks. This mining is based on the SONAR multimodal/multilingual embedding space.
📄️ Global Mining Pipeline
You can launch the mining for a pair of languages with the following command:
📄️ NLLB Monolingual Pipeline
This is the monolingual "cleaning" pipeline. It does a few things:
📄️ NLLB Distillation Pipeline
Welcome to stopes, and thanks for checking out our sequence-level knowledge distillation pipeline. This quick start guide walks through how to run the pipeline yourself and what outputs to expect from each step. At a high level, the pipeline works as follows: