TextTokenEncoder

class fairseq2.data.text.TextTokenEncoder

Bases: ABC

Encodes text into tokens or token indices.

abstract __call__(text)
Parameters:

text (str | CString) – The text to encode.

Return type:

Tensor

abstract encode_as_tokens(text)
Parameters:

text (str | CString) – The text to encode.

Return type:

List[str | CString]
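The two methods above differ only in what they return: `__call__` yields token indices, while `encode_as_tokens` yields the tokens themselves. The following is a minimal sketch of that relationship, not fairseq2 code. `ToyTokenEncoder`, `WhitespaceEncoder`, the vocabulary, and the `unk_idx` parameter are all hypothetical, and plain Python lists stand in for the `Tensor` return type so the example runs without any dependencies.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ToyTokenEncoder(ABC):
    """Hypothetical stand-in that mirrors the two abstract methods above."""

    @abstractmethod
    def __call__(self, text: str) -> List[int]:
        """Encode text into token indices (a Tensor in the real class)."""

    @abstractmethod
    def encode_as_tokens(self, text: str) -> List[str]:
        """Encode text into string tokens."""


class WhitespaceEncoder(ToyTokenEncoder):
    """Toy encoder that splits on whitespace; assumed for illustration only."""

    def __init__(self, vocab: List[str], unk_idx: int = 0) -> None:
        # Map each token to its position in the vocabulary.
        self._index: Dict[str, int] = {tok: i for i, tok in enumerate(vocab)}
        self._unk_idx = unk_idx

    def encode_as_tokens(self, text: str) -> List[str]:
        return text.split()

    def __call__(self, text: str) -> List[int]:
        # Unknown tokens fall back to the (assumed) unknown index.
        return [self._index.get(t, self._unk_idx) for t in self.encode_as_tokens(text)]


encoder = WhitespaceEncoder(vocab=["<unk>", "hello", "world"])
tokens = encoder.encode_as_tokens("hello world")
indices = encoder("hello there world")
```

Because the base class is abstract, instantiating `ToyTokenEncoder` directly raises `TypeError`; only a subclass that implements both methods can be used.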

abstract property prefix_indices: Tensor | None

Get the indices of the prefix tokens. Shape: \((S)\), where \(S\) is the number of indices.

abstract property suffix_indices: Tensor | None

Get the indices of the suffix tokens. Shape: \((S)\), where \(S\) is the number of indices.
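As a rough illustration of the two properties, the sketch below assumes that prefix and suffix indices are control-symbol indices placed before and after the encoded text, and that a property is `None` when no such symbols are configured. Both points are assumptions, not statements from this reference. `WrappingEncoder`, its constructor parameters, and the vocabulary are hypothetical, and plain lists stand in for 1-D tensors of shape (S).

```python
from typing import Dict, List, Optional, Sequence


class WrappingEncoder:
    """Hypothetical encoder; lists stand in for the Tensor type of the real API."""

    def __init__(
        self,
        vocab: Sequence[str],
        prefix_tokens: Optional[Sequence[str]] = None,
        suffix_tokens: Optional[Sequence[str]] = None,
    ) -> None:
        self._index: Dict[str, int] = {tok: i for i, tok in enumerate(vocab)}
        self._prefix = self._lookup(prefix_tokens)
        self._suffix = self._lookup(suffix_tokens)

    def _lookup(self, tokens: Optional[Sequence[str]]) -> Optional[List[int]]:
        # No configured tokens maps to None, matching the `Tensor | None` type.
        if not tokens:
            return None
        return [self._index[t] for t in tokens]

    @property
    def prefix_indices(self) -> Optional[List[int]]:
        return self._prefix

    @property
    def suffix_indices(self) -> Optional[List[int]]:
        return self._suffix

    def __call__(self, text: str) -> List[int]:
        # Assumed behavior: wrap the text indices with the prefix and suffix.
        body = [self._index[t] for t in text.split()]
        return (self._prefix or []) + body + (self._suffix or [])


VOCAB = ["<s>", "</s>", "hello", "world"]
wrapped = WrappingEncoder(VOCAB, prefix_tokens=["<s>"], suffix_tokens=["</s>"])
bare = WrappingEncoder(VOCAB)

wrapped_out = wrapped("hello world")
bare_out = bare("hello world")
```

Here \(S\) is 1 for both properties of `wrapped`, since each holds a single index, while `bare` reports `None` for both.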