VizSeq is a Python toolkit for visual analysis of text generation tasks such as machine translation, summarization, image captioning, speech translation, and video description. It takes multi-modal sources, text references, and text predictions as inputs, and analyzes them visually in Jupyter Notebook or in a built-in Web App (the former has Fairseq integration). VizSeq also provides a collection of multi-process scorers that can be used as a normal Python package.
VizSeq accepts various source types, including text, image, audio, video or any combination of them. This covers a wide range of text generation tasks, examples of which are listed below:
| Source type | Example tasks |
| :--- | :--- |
| Text | Machine translation, text summarization, dialog generation, grammatical error correction, open-domain question answering |
| Image | Image captioning, image question answering, optical character recognition |
| Audio | Speech recognition, speech translation |
| Multimodal | Multimodal machine translation |
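Whatever the task, the inputs reduce to index-aligned streams of sources, references, and model predictions. The sketch below (plain Python, not VizSeq's actual API; all names are illustrative) shows this shared layout, assuming one example per index:

```python
# A minimal sketch of the parallel data layout the tasks above share:
# one or more sources, one or more reference sets, and one hypothesis
# set per model, all aligned by example index. Names are hypothetical.
sources = {
    "en_src": ["Hello world .", "Good morning ."],  # a text source
    # An image source would typically be a list of file paths instead.
}
references = {
    "ref_0": ["Hallo Welt .", "Guten Morgen ."],
}
hypotheses = {
    "model_a": ["Hallo Welt .", "Guter Morgen ."],
}

# Every stream must have the same length: example i pairs the i-th
# entry of each source, reference, and hypothesis list.
n = len(next(iter(sources.values())))
assert all(len(v) == n for v in sources.values())
assert all(len(v) == n for v in references.values())
assert all(len(v) == n for v in hypotheses.values())
```

Keeping all streams index-aligned is what lets a single example view show every source modality, reference, and system output side by side.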
VizSeq provides a collection of scorers, accelerated with multi-processing/multi-threading:
| Type | Metrics |
| :--- | :--- |
| N-gram-based | BLEU (Papineni et al., 2002), NIST (Doddington, 2002), METEOR (Banerjee and Lavie, 2005), TER (Snover et al., 2006), RIBES (Isozaki et al., 2010), chrF (Popović, 2015), GLEU (Wu et al., 2016), ROUGE (Lin, 2004), CIDEr (Vedantam et al., 2015), WER |
| Embedding-based | LASER (Artetxe and Schwenk, 2018), BERTScore (Zhang et al., 2019) |
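To make the parallelized scoring concrete, here is a from-scratch sketch of one of the listed metrics, WER (word-level edit distance over reference length), with sentence-level scores computed in a thread pool. This is an illustration of the idea, not VizSeq's scorer implementation:

```python
from concurrent.futures import ThreadPoolExecutor


def wer(hypothesis: str, reference: str) -> float:
    """Word error rate: word-level edit distance / reference length.

    A from-scratch sketch of the WER metric, not VizSeq's scorer.
    """
    hyp, ref = hypothesis.split(), reference.split()
    # Standard dynamic-programming edit distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def score_corpus(hypotheses, references, n_workers=4):
    # Sentence-level scores are independent, so they parallelize
    # trivially; the corpus score here is a simple average.
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        sent_scores = list(ex.map(wer, hypotheses, references))
    return sum(sent_scores) / len(sent_scores), sent_scores
```

VizSeq's scorers return both corpus-level and sentence-level scores in the same spirit, which is what drives its per-example and grouped views.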
VizSeq is licensed under MIT.