fairseq2.data.text

This module contains text tokenizers and text-specific data pipeline operators.

classDiagram
  ABC <|-- TextTokenDecoder
  ABC <|-- TextTokenEncoder
  ABC <|-- TextTokenizer
  ABC <|-- TextTokenizerHandler
  AbstractTextTokenizer <|-- SentencePieceTokenizer
  AbstractTextTokenizer <|-- TiktokenTokenizer
  Enum <|-- LineEnding
  Generic <|-- Protocol
  Protocol <|-- TextTokenizerLoader
  SentencePieceTokenizer <|-- BasicSentencePieceTokenizer
  SentencePieceTokenizer <|-- NllbTokenizer
  SentencePieceTokenizer <|-- RawSentencePieceTokenizer
  SentencePieceTokenizer <|-- S2TTransformerTokenizer
  TextTokenDecoder <|-- SentencePieceDecoder
  TextTokenDecoder <|-- TiktokenDecoder
  TextTokenEncoder <|-- SentencePieceEncoder
  TextTokenEncoder <|-- TiktokenEncoder
  TextTokenizer <|-- AbstractTextTokenizer
  TextTokenizerHandler <|-- StandardTextTokenizerHandler
  TiktokenTokenizer <|-- LLaMA3Tokenizer
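
The diagram splits tokenization into three abstract roles: a tokenizer, an encoder, and a decoder, each with concrete subclasses. The sketch below illustrates that split with a toy whitespace tokenizer. It does not import fairseq2: the three base classes are local stand-ins that only borrow names from the diagram, and the method names (`create_encoder`, `create_decoder`, `__call__`), their signatures, and the `Whitespace*` classes are assumptions made for illustration, so the real interfaces may differ.

```python
from abc import ABC, abstractmethod


# Local stand-ins named after the diagram's abstract classes.
# They are NOT the fairseq2 classes; the method signatures are assumed.
class TextTokenEncoder(ABC):
    @abstractmethod
    def __call__(self, text: str) -> list[int]:
        """Convert text to a list of token indices."""


class TextTokenDecoder(ABC):
    @abstractmethod
    def __call__(self, token_indices: list[int]) -> str:
        """Convert token indices back to text."""


class TextTokenizer(ABC):
    @abstractmethod
    def create_encoder(self) -> TextTokenEncoder:
        """Return an encoder bound to this tokenizer's vocabulary."""

    @abstractmethod
    def create_decoder(self) -> TextTokenDecoder:
        """Return a decoder bound to this tokenizer's vocabulary."""


# Hypothetical concrete classes: a toy whitespace tokenizer.
class WhitespaceEncoder(TextTokenEncoder):
    def __init__(self, vocab: dict[str, int], unk_idx: int) -> None:
        self._vocab = vocab
        self._unk_idx = unk_idx

    def __call__(self, text: str) -> list[int]:
        # Words missing from the vocabulary map to the unknown index.
        return [self._vocab.get(word, self._unk_idx) for word in text.split()]


class WhitespaceDecoder(TextTokenDecoder):
    def __init__(self, vocab: dict[str, int]) -> None:
        self._words = {idx: word for word, idx in vocab.items()}

    def __call__(self, token_indices: list[int]) -> str:
        return " ".join(self._words.get(idx, "<unk>") for idx in token_indices)


class WhitespaceTokenizer(TextTokenizer):
    def __init__(self, words: list[str]) -> None:
        # Index 0 is reserved for unknown words.
        self._vocab = {"<unk>": 0}
        for word in words:
            self._vocab.setdefault(word, len(self._vocab))

    def create_encoder(self) -> TextTokenEncoder:
        return WhitespaceEncoder(self._vocab, unk_idx=0)

    def create_decoder(self) -> TextTokenDecoder:
        return WhitespaceDecoder(self._vocab)


tokenizer = WhitespaceTokenizer(["hello", "world"])
encoder = tokenizer.create_encoder()
decoder = tokenizer.create_decoder()

indices = encoder("hello brave world")
roundtrip = decoder(indices)
```

Keeping the encoder and decoder as separate objects created by the tokenizer mirrors the parallel `TextTokenEncoder` and `TextTokenDecoder` branches in the diagram, where each backend (SentencePiece, Tiktoken) supplies its own pair.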