fairseq2.data.text
This module contains text tokenizers and text-specific data pipeline operators.
classDiagram
  ABC <|-- TextTokenDecoder
  ABC <|-- TextTokenEncoder
  ABC <|-- TextTokenizer
  ABC <|-- TextTokenizerHandler
  AbstractTextTokenizer <|-- SentencePieceTokenizer
  AbstractTextTokenizer <|-- TiktokenTokenizer
  Enum <|-- LineEnding
  Generic <|-- Protocol
  Protocol <|-- TextTokenizerLoader
  SentencePieceTokenizer <|-- BasicSentencePieceTokenizer
  SentencePieceTokenizer <|-- NllbTokenizer
  SentencePieceTokenizer <|-- RawSentencePieceTokenizer
  SentencePieceTokenizer <|-- S2TTransformerTokenizer
  TextTokenDecoder <|-- SentencePieceDecoder
  TextTokenDecoder <|-- TiktokenDecoder
  TextTokenEncoder <|-- SentencePieceEncoder
  TextTokenEncoder <|-- TiktokenEncoder
  TextTokenizer <|-- AbstractTextTokenizer
  TextTokenizerHandler <|-- StandardTextTokenizerHandler
  TiktokenTokenizer <|-- LLaMA3Tokenizer
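The class diagram suggests three separate roles: a tokenizer, and the encoder and decoder types that sit in their own branches of the hierarchy. The following is a minimal, standard-library-only sketch of that separation. It does not use fairseq2. Every class name (`ToyTokenizer`, `WhitespaceEncoder`, and so on) is hypothetical, and the factory methods `create_encoder` and `create_decoder` are assumptions made for illustration, not a statement of the library's API.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ToyTokenEncoder(ABC):
    """Hypothetical stand-in for the encoder role: text -> token indices."""

    @abstractmethod
    def __call__(self, text: str) -> list[int]: ...


class ToyTokenDecoder(ABC):
    """Hypothetical stand-in for the decoder role: token indices -> text."""

    @abstractmethod
    def __call__(self, token_indices: list[int]) -> str: ...


class ToyTokenizer(ABC):
    """Hypothetical stand-in for the tokenizer role: builds encoders and decoders."""

    @abstractmethod
    def create_encoder(self) -> ToyTokenEncoder: ...

    @abstractmethod
    def create_decoder(self) -> ToyTokenDecoder: ...


class WhitespaceEncoder(ToyTokenEncoder):
    def __init__(self, vocab: dict[str, int], unk_idx: int) -> None:
        self._vocab = vocab
        self._unk_idx = unk_idx

    def __call__(self, text: str) -> list[int]:
        # Words missing from the vocabulary map to the unknown index.
        return [self._vocab.get(word, self._unk_idx) for word in text.split()]


class WhitespaceDecoder(ToyTokenDecoder):
    def __init__(self, words: list[str]) -> None:
        self._words = words

    def __call__(self, token_indices: list[int]) -> str:
        return " ".join(self._words[i] for i in token_indices)


class WhitespaceTokenizer(ToyTokenizer):
    def __init__(self, words: list[str]) -> None:
        # Index 0 is reserved for the unknown symbol.
        self._words = ["<unk>", *words]
        self._vocab = {word: idx for idx, word in enumerate(self._words)}

    def create_encoder(self) -> ToyTokenEncoder:
        return WhitespaceEncoder(self._vocab, unk_idx=0)

    def create_decoder(self) -> ToyTokenDecoder:
        return WhitespaceDecoder(self._words)


tokenizer = WhitespaceTokenizer(["hello", "world"])
encoder = tokenizer.create_encoder()
decoder = tokenizer.create_decoder()

indices = encoder("hello big world")
text = decoder(indices)
```

Keeping the encoder and decoder as objects separate from the tokenizer lets each be configured and passed around independently, for example as a step in a data pipeline, which is one plausible reason for the split shown in the diagram.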