Collater
- class fairseq2.data.Collater(pad_value=None, pad_to_multiple=1, overrides=None)
Bases: object
Concatenate a list of inputs into a single input.
Used to create batches. If all tensors in the input example have the same last dimension, Collater returns the concatenated tensors. Otherwise, pad_value is required, and the last dimension of the batch will be made long enough to fit the longest tensor, rounded up to a multiple of pad_to_multiple. The returned batch is then a dictionary with the following keys:

    {
        "is_ragged": True/False                # True if padding was needed
        "seqs": [[1, 4, 5, 0], [1, 2, 3, 4]]   # (Tensor) concatenated and padded tensors from the input
        "seq_lens": [3, 4]                     # A tensor describing the original length of each input tensor
    }

Collater preserves the shape of the original data. For a tuple of lists, it returns a tuple of batches. For a dict of lists, it returns a dict of lists.
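
The padding rule described above can be illustrated with a small plain-Python sketch. This is not fairseq2's implementation: `pad_batch` is a hypothetical helper that works on lists instead of tensors, assumes a non-empty input, and assumes that `is_ragged` reflects whether the original lengths differed.

```python
from typing import Optional, Sequence


def pad_batch(
    seqs: Sequence[Sequence[int]],
    pad_value: Optional[int] = None,
    pad_to_multiple: int = 1,
):
    """Hypothetical helper mimicking the documented padding rule on lists."""
    seq_lens = [len(s) for s in seqs]
    longest = max(seq_lens)
    # Round the longest length up to the next multiple of pad_to_multiple.
    target = -(-longest // pad_to_multiple) * pad_to_multiple

    # Nothing to pad: return the stacked rows unchanged.
    if all(n == target for n in seq_lens):
        return [list(s) for s in seqs]

    if pad_value is None:
        raise ValueError("pad_value is required when padding is needed")

    padded = [list(s) + [pad_value] * (target - len(s)) for s in seqs]
    return {
        "is_ragged": len(set(seq_lens)) > 1,  # assumption: True if lengths differed
        "seqs": padded,
        "seq_lens": seq_lens,
    }


batch = pad_batch([[1, 4, 5], [1, 2, 3, 4]], pad_value=0)
print(batch["seqs"])      # [[1, 4, 5, 0], [1, 2, 3, 4]]
print(batch["seq_lens"])  # [3, 4]
```

With `pad_to_multiple=4`, a single sequence of length 3 would be padded to length 4 under the same rule.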
- Parameters:
pad_value (Optional[int]) – When concatenating tensors of different lengths, the value used to pad the shorter tensors.
pad_to_multiple (int) – Always pad to a length that is a multiple of this value.
overrides (Optional[Sequence[CollateOptionsOverride]]) – List of CollateOptionsOverride overrides. Allows overriding pad_value and pad_to_multiple for specific columns.
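
To make the idea of per-column overrides concrete, here is a rough plain-Python sketch. It does not use fairseq2: `ColumnOverride`, `collate_columns`, and `_pad` are hypothetical names, the field layout is not the real `CollateOptionsOverride` signature, and the sketch assumes an override replaces both default settings for its column.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class ColumnOverride:
    """Hypothetical stand-in for a per-column override (not the fairseq2 class)."""
    column: str
    pad_value: Optional[int] = None
    pad_to_multiple: int = 1


def _pad(seqs: List[List[int]], pad_value: Optional[int], multiple: int) -> List[List[int]]:
    longest = max(len(s) for s in seqs)
    # Round the longest length up to the next multiple.
    target = -(-longest // multiple) * multiple
    if pad_value is None and any(len(s) != target for s in seqs):
        raise ValueError("pad_value is required when padding is needed")
    return [list(s) + [pad_value] * (target - len(s)) for s in seqs]


def collate_columns(
    examples: Sequence[Dict[str, List[int]]],
    pad_value: Optional[int] = None,
    pad_to_multiple: int = 1,
    overrides: Sequence[ColumnOverride] = (),
) -> Dict[str, List[List[int]]]:
    by_column = {o.column: o for o in overrides}
    batch = {}
    for column in examples[0]:
        o = by_column.get(column)
        # Use the override's settings for this column if one exists.
        value = o.pad_value if o else pad_value
        multiple = o.pad_to_multiple if o else pad_to_multiple
        batch[column] = _pad([ex[column] for ex in examples], value, multiple)
    return batch


examples = [
    {"source": [1, 2, 3], "target": [7]},
    {"source": [4, 5], "target": [8, 9]},
]
batch = collate_columns(
    examples,
    pad_value=0,
    overrides=[ColumnOverride("target", pad_value=-100, pad_to_multiple=4)],
)
print(batch["source"])  # [[1, 2, 3], [4, 5, 0]]
print(batch["target"])  # [[7, -100, -100, -100], [8, 9, -100, -100]]
```

Here the `source` column uses the default settings, while the `target` column is padded with a different value and to a multiple of 4.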