VocabularyInfo
-
class fairseq2.data.VocabularyInfo(size, unk_idx, bos_idx, eos_idx, pad_idx)
Bases: object
Describes the vocabulary used by a tokenizer.
-
bos_idx: int | None
The index of the symbol that represents the beginning of a sequence (BOS).
-
eos_idx: int | None
The index of the symbol that represents the end of a sequence (EOS).
-
pad_idx: int | None
The index of the symbol that is used to pad a sequence (PAD).
-
size: int
The size of the vocabulary.
-
unk_idx: int | None
The index of the symbol that represents an unknown element (UNK).
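The following is a minimal sketch of how these fields might be consumed when preparing token sequences. It does not import fairseq2: `VocabularyInfoSketch` is a hypothetical stand-in dataclass that mirrors only the documented fields, and `wrap` and `pad_batch` are illustrative helpers, not part of the library. It assumes that an index of `None` means the vocabulary defines no such symbol, which the `int | None` annotations suggest but this page does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VocabularyInfoSketch:
    """Hypothetical stand-in mirroring the documented fields (not the fairseq2 class)."""

    size: int
    unk_idx: Optional[int]
    bos_idx: Optional[int]
    eos_idx: Optional[int]
    pad_idx: Optional[int]


def wrap(seq: List[int], info: VocabularyInfoSketch) -> List[int]:
    """Add BOS and EOS around a sequence, skipping any symbol that is None."""
    out = list(seq)
    if info.bos_idx is not None:
        out.insert(0, info.bos_idx)
    if info.eos_idx is not None:
        out.append(info.eos_idx)
    return out


def pad_batch(seqs: List[List[int]], info: VocabularyInfoSketch) -> List[List[int]]:
    """Right-pad every sequence with PAD to the length of the longest one."""
    if info.pad_idx is None:
        # Assumption: None means the vocabulary has no PAD symbol.
        raise ValueError("vocabulary defines no PAD symbol")
    max_len = max((len(s) for s in seqs), default=0)
    return [s + [info.pad_idx] * (max_len - len(s)) for s in seqs]


# Example values are made up for illustration.
info = VocabularyInfoSketch(size=32000, unk_idx=0, bos_idx=1, eos_idx=2, pad_idx=3)
batch = pad_batch([wrap([10, 11, 12], info), wrap([20], info)], info)
print(batch)  # [[1, 10, 11, 12, 2], [1, 20, 2, 3, 3]]
```

Checking each optional index against `None` before use keeps the helpers usable with vocabularies that omit one of the special symbols.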