TextTokenEncoder
-
class fairseq2.data.text.TextTokenEncoder
Bases: ABC
Encodes text into tokens or token indices.
-
abstract __call__(text)
- Parameters:
text (str | CString) – The text to encode.
- Return type:
Tensor
-
abstract encode_as_tokens(text)
- Parameters:
text (str | CString) – The text to encode.
- Return type:
List[str | CString]
-
abstract property prefix_indices: Tensor | None
Get the indices of the prefix tokens. Shape: \((S)\), where
\(S\) is the number of indices.
-
abstract property suffix_indices: Tensor | None
Get the indices of the suffix tokens. Shape: \((S)\), where
\(S\) is the number of indices.
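Example (illustrative only). The following is a minimal sketch of how the four abstract members fit together, not fairseq2's own implementation. To stay runnable without extra packages it does not import fairseq2 or torch: ToyTokenEncoder is a hypothetical stand-in that mirrors the members documented above, and plain lists of ints take the place of Tensor. WhitespaceEncoder, its vocabulary, and the reserved indices are likewise made up for illustration. The sketch also assumes that the result of __call__ is wrapped with the prefix and suffix indices, which the reference above does not state.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class ToyTokenEncoder(ABC):
    """Hypothetical stand-in mirroring the documented abstract members.

    Lists of ints are used where the real interface returns a Tensor.
    """

    @abstractmethod
    def __call__(self, text: str) -> List[int]: ...

    @abstractmethod
    def encode_as_tokens(self, text: str) -> List[str]: ...

    @property
    @abstractmethod
    def prefix_indices(self) -> Optional[List[int]]: ...

    @property
    @abstractmethod
    def suffix_indices(self) -> Optional[List[int]]: ...


class WhitespaceEncoder(ToyTokenEncoder):
    """Toy encoder: splits on whitespace and looks tokens up in a fixed vocabulary."""

    def __init__(self, vocab: List[str]) -> None:
        # Assumption of this sketch: indices 0, 1, 2 are reserved
        # for unknown, beginning-of-sequence, and end-of-sequence.
        self._unk, self._bos, self._eos = 0, 1, 2
        self._index = {tok: i + 3 for i, tok in enumerate(vocab)}

    def encode_as_tokens(self, text: str) -> List[str]:
        # Text in, string tokens out.
        return text.split()

    def __call__(self, text: str) -> List[int]:
        # Text in, token indices out; unknown tokens map to the <unk> index.
        body = [self._index.get(tok, self._unk) for tok in self.encode_as_tokens(text)]
        # Assumed behavior: wrap the body with the prefix and suffix indices.
        return (self.prefix_indices or []) + body + (self.suffix_indices or [])

    @property
    def prefix_indices(self) -> Optional[List[int]]:
        return [self._bos]

    @property
    def suffix_indices(self) -> Optional[List[int]]:
        return [self._eos]


encoder = WhitespaceEncoder(["hello", "world"])
tokens = encoder.encode_as_tokens("hello world !")
indices = encoder("hello world !")
print(tokens)   # ['hello', 'world', '!']
print(indices)  # [1, 3, 4, 0, 2]
```

In this sketch an encoder with no prefix or suffix tokens would return None from the corresponding property, and the `or []` fallback in __call__ would then leave the encoded body unwrapped.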