Prebuilt Pipelines

Pre-built pipelines.

📄️ Expressive parallel alignments

We extend the parallel speech alignment method described in Seamless Communication et al. (2023) to align pairs not only in terms of meaning but also in terms of expressivity. Specifically, the pipeline allows additional processing of the k-nearest neighbors and introduces an option to add auxiliary_embeddings, which are expected to be a complementary source of prosodic input to the traditional semantic inputs, e.g. SONAR speech embeddings. The prosodic similarity scores from the auxiliary embeddings are then "blended" with the traditional margin-based scores using the formula:

📄️ Speech Mining Pipeline

With the Seamless Communication project, FAIR introduced a new mechanism for speech mining. In stopesV1, you could mine large text datasets to create aligned text across languages, which was useful for training machine translation models. From stopesV2 onwards, we introduce a mechanism that lets you mine speech and text together across languages to create aligned multimodal datasets for training and evaluating speech tasks. This mining is based on the SONAR multimodal and multilingual embedding space.