Collater

class fairseq2.data.Collater(pad_value=None, pad_to_multiple=1, overrides=None)

Bases: object

Concatenate a list of inputs into a single input.

Used to create batches. If all tensors in the input example have the same last dimension, Collater returns the concatenated tensors.

Otherwise pad_value is required, and the last dimension of the batch is made long enough to fit the longest tensor, rounded up to a multiple of pad_to_multiple. The returned batch is then a dictionary with the following keys:

{
    "is_ragged": True/False,  # True if padding was needed
    "seqs": [[1, 4, 5, 0], [1, 2, 3, 4]],  # (Tensor) concatenated and padded tensors from the input
    "seq_lens": [3, 4],  # (Tensor) the original length of each input tensor
}
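
The padding rule can be illustrated with a minimal pure-Python sketch. It is not fairseq2's implementation: it works on plain lists instead of tensors, the helper name pad_batch is hypothetical, and it always returns the dictionary form, even when no padding is needed.

```python
from typing import Dict, List, Optional, Union


def pad_batch(
    seqs: List[List[int]],
    pad_value: Optional[int] = None,
    pad_to_multiple: int = 1,
) -> Dict[str, Union[bool, List[int], List[List[int]]]]:
    """Pad variable-length sequences to a shared length (hypothetical helper)."""
    seq_lens = [len(s) for s in seqs]
    longest = max(seq_lens)
    # Round the longest length up to the next multiple of pad_to_multiple.
    target = ((longest + pad_to_multiple - 1) // pad_to_multiple) * pad_to_multiple
    # Assumption, following the listing's comment: ragged means padding was needed.
    is_ragged = any(n < target for n in seq_lens)
    if is_ragged and pad_value is None:
        raise ValueError("pad_value is required when sequences need padding")
    # Append pad_value to each sequence until it reaches the target length.
    padded = [s + [pad_value] * (target - len(s)) for s in seqs]
    return {"is_ragged": is_ragged, "seqs": padded, "seq_lens": seq_lens}


# Same values as the example dictionary: pad with 0, no length rounding.
batch = pad_batch([[1, 4, 5], [1, 2, 3, 4]], pad_value=0)
print(batch)
```

With pad_to_multiple=8, the same call would pad both sequences to length 8 instead of 4.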

Collater preserves the shape of the original data. For a tuple of lists, it returns a tuple of batches. For a dict of lists, it returns a dict of lists.
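
One way to read "preserves the shape" is that the container type of each example carries over to the output. The sketch below assumes the input is a list of examples that all share the same structure. The name collate_nested is hypothetical, and leaf values are only gathered into lists rather than concatenated into tensors, so this illustrates the idea and not the library's behavior.

```python
from typing import Any, List


def collate_nested(examples: List[Any]) -> Any:
    """Gather leaves across examples while keeping dict/tuple structure (hypothetical)."""
    first = examples[0]
    if isinstance(first, dict):
        # A dict example yields a dict with one gathered entry per key.
        return {key: collate_nested([ex[key] for ex in examples]) for key in first}
    if isinstance(first, tuple):
        # A tuple example yields a tuple with one gathered entry per position.
        return tuple(
            collate_nested([ex[i] for ex in examples]) for i in range(len(first))
        )
    # Leaf values: collect them in order (a real collater would batch tensors here).
    return list(examples)


examples = [{"a": 1, "b": (2, 3)}, {"a": 4, "b": (5, 6)}]
print(collate_nested(examples))
```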

Parameters:
  • pad_value (Optional[int]) – When concatenating tensors of different lengths, the value used to pad the shorter tensors.

  • pad_to_multiple (int) – Always pad the last dimension to a length that is a multiple of this value.

  • overrides (Optional[Sequence[CollateOptionsOverride]]) – A list of CollateOptionsOverride objects that override pad_value and pad_to_multiple for specific columns.
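
The idea of per-column overrides can be sketched as a lookup that falls back to the defaults. Everything here is illustrative: ColumnOverride is a hypothetical stand-in and not fairseq2's CollateOptionsOverride, and matching a column by exact name is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ColumnOverride:
    """Hypothetical per-column padding options; not the fairseq2 class."""

    column: str
    pad_value: Optional[int] = None
    pad_to_multiple: int = 1


def resolve_options(
    column: str,
    default_pad_value: Optional[int],
    default_pad_to_multiple: int,
    overrides: Sequence[ColumnOverride] = (),
) -> Tuple[Optional[int], int]:
    """Return (pad_value, pad_to_multiple) for a column, preferring a matching override."""
    for override in overrides:
        if override.column == column:  # assumption: exact name match
            return override.pad_value, override.pad_to_multiple
    # No override matched, so the collater-wide defaults apply.
    return default_pad_value, default_pad_to_multiple


overrides = [ColumnOverride("labels", pad_value=-100, pad_to_multiple=8)]
print(resolve_options("labels", 0, 1, overrides))
print(resolve_options("tokens", 0, 1, overrides))
```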

__call__(data)

Concatenate the input tensors.

Return type:

Any