SentencePieceEncoder

final class fairseq2.data.text.SentencePieceEncoder(model, prefix_tokens=None, suffix_tokens=None, reverse=False, enable_sampling=False, nbest_size=-1, alpha=0.1, device=None, pin_memory=False)

Bases: TextTokenEncoder

__call__(text)
Parameters:

text (str | CString) – The text to encode.

Return type:

Tensor

encode_as_tokens(text)
Parameters:

text (str | CString) – The text to encode.

Return type:

List[str | CString]

property prefix_indices: Tensor | None

Get the indices of the prefix tokens. Shape: \((S)\), where \(S\) is the number of indices.

property suffix_indices: Tensor | None

Get the indices of the suffix tokens. Shape: \((S)\), where \(S\) is the number of indices.
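
The sketch below illustrates what prefix and suffix indices could mean for an encoded sequence. It is a pure-Python stand-in, not the library's implementation: the helper name wrap_with_affixes and the vocabulary indices are hypothetical, and it assumes that the prefix indices are prepended, the suffix indices are appended, and that reverse affects only the encoded text between them.

```python
from typing import List, Optional, Sequence


def wrap_with_affixes(
    token_indices: Sequence[int],
    prefix_indices: Optional[Sequence[int]] = None,
    suffix_indices: Optional[Sequence[int]] = None,
    reverse: bool = False,
) -> List[int]:
    """Hypothetical helper: frame encoded token indices with prefix/suffix indices."""
    body = list(token_indices)
    if reverse:
        # Assumption: only the encoded text is reversed, not the affixes.
        body.reverse()
    # A missing prefix or suffix (None) contributes no indices.
    return list(prefix_indices or []) + body + list(suffix_indices or [])


# Made-up vocabulary indices, for illustration only.
encoded = wrap_with_affixes([17, 42, 8], prefix_indices=[1], suffix_indices=[2])
print(encoded)  # [1, 17, 42, 8, 2]
```

Here the prefix and suffix each hold one index, so each would correspond to a tensor of shape \((1)\) in the notation used above.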