StrSplitter

class fairseq2.data.text.StrSplitter(sep='\t', names=None, indices=None, exclude=False)

Bases: object

Split a string on a given character and optionally select or name the resulting columns.

Parameters:
  • sep (str) – The character to split on (defaults to a tab).

  • names (Sequence[str] | None) – Names of the corresponding columns of the input TSV file. If given, a dictionary with one entry per column is returned instead of a list.

  • indices (Sequence[int] | None) – The indices of the columns to keep.
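The parameter semantics can be pictured with a small pure-Python sketch. It is not fairseq2's implementation (whose fields may be CString objects) and it rests on assumptions: each entry of names labels one kept column, matching the last example below; kept columns stay in their input order; and exclude, which this page does not describe, is ignored. split_line is a hypothetical name.

```python
from typing import Dict, List, Optional, Sequence, Union


def split_line(
    s: str,
    sep: str = "\t",
    names: Optional[Sequence[str]] = None,
    indices: Optional[Sequence[int]] = None,
) -> Union[List[str], Dict[str, str]]:
    """Hypothetical pure-Python stand-in for StrSplitter.__call__ (not the fairseq2 code)."""
    fields = s.split(sep)
    if indices is not None:
        # Assumption: kept columns stay in their input order.
        keep = set(indices)
        fields = [f for i, f in enumerate(fields) if i in keep]
    if names is None:
        return fields
    # Assumption: each name labels one kept column, so the counts must match.
    if len(names) != len(fields):
        raise ValueError(f"expected {len(names)} fields, got {len(fields)}")
    return dict(zip(names, fields))


line = "Go.\tVa !\tCC-BY 2.0 (France)"
print(split_line(line))                                      # ['Go.', 'Va !', 'CC-BY 2.0 (France)']
print(split_line(line, indices=[1]))                         # ['Va !']
print(split_line(line, names=["en", "fr"], indices=[0, 1]))  # {'en': 'Go.', 'fr': 'Va !'}
```

Raising on a count mismatch keeps a malformed line from silently producing a dictionary with missing keys.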

Example usage:

from fairseq2.data.text import StrSplitter, read_text

# read all columns: ["Go.", "Va !", "CC-BY 2.0 (France)"]
dataloader = read_text("tatoeba.tsv").map(StrSplitter()).and_return()
# keep only the second column and unwrap it to a string: "Va !"
dataloader = read_text("tatoeba.tsv").map(StrSplitter(indices=[1])).map(lambda x: x[0]).and_return()
# keep only the first and second columns and convert them to a dict: {"en": "Go.", "fr": "Va !"}
dataloader = read_text("tatoeba.tsv").map(StrSplitter(names=["en", "fr"], indices=[0, 1])).and_return()
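The chained calls above apply each map step to every line read from the file. As a rough analogy only, the second example can be mimicked with Python's built-in map over an in-memory file. Here io.StringIO stands in for tatoeba.tsv, the two sample lines are illustrative, and keep_second_column is a hypothetical simplification of StrSplitter(indices=[1]), not the fairseq2 class.

```python
import io
from typing import List

# In-memory stand-in for tatoeba.tsv (two illustrative tab-separated lines).
tsv = io.StringIO("Go.\tVa !\tCC-BY 2.0 (France)\nHi.\tSalut !\tCC-BY 2.0 (France)\n")


def keep_second_column(line: str) -> List[str]:
    # Hypothetical simplification of StrSplitter(indices=[1]): split on tabs, keep column 1.
    return [line.split("\t")[1]]


# Analogue of read_text(...).map(StrSplitter(indices=[1])).map(lambda x: x[0]):
# generators and map() process one line at a time.
lines = (raw.rstrip("\n") for raw in tsv)
pipeline = map(lambda x: x[0], map(keep_second_column, lines))

result = list(pipeline)
print(result)  # ['Va !', 'Salut !']
```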
__call__(s)

Split s on sep and return the selected columns as a list or, if names was given, as a dictionary keyed by those names.

Return type:

List[str | CString] | Dict[str, str | CString]
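Because the result is a list when names is not given and a dictionary when it is, downstream code may need to handle both shapes. A minimal sketch, assuming plain str fields (CString values are not covered) and using a hypothetical helper as_record that keys list output by column position:

```python
from typing import Dict, List, Union

Row = Union[List[str], Dict[str, str]]


def as_record(row: Row) -> Dict[str, str]:
    """Hypothetical helper: normalize either output shape to a dict."""
    if isinstance(row, dict):
        # names= was given: the row is already keyed by column name.
        return dict(row)
    # No names: key each field by its position as a string ("0", "1", ...).
    return {str(i): field for i, field in enumerate(row)}


print(as_record(["Go.", "Va !"]))              # {'0': 'Go.', '1': 'Va !'}
print(as_record({"en": "Go.", "fr": "Va !"}))  # {'en': 'Go.', 'fr': 'Va !'}
```

Passing names up front avoids this branching entirely when every line has the same columns.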