FileMapper

class fairseq2.data.FileMapper(root_dir=None, cached_fd_count=None)

Bases: object

For a given file name, returns the file content as bytes.

The file name can also specify a slice of the file in bytes: calling a FileMapper instance with "big_file.txt:1024:48" reads 48 bytes starting at offset 1024.
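The slice syntax can be illustrated with a small standard-library sketch. This is not the fairseq2 implementation: parse_spec and read_file_slice are hypothetical helpers, and the handling of a missing offset or length (defaulting to the start of the file and to the end of the file) is an assumption made for the example.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def parse_spec(spec: str) -> Tuple[str, int, Optional[int]]:
    """Split "path[:offset[:length]]" into its parts (hypothetical helper)."""
    parts = spec.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many ':' separators in {spec!r}")
    path = parts[0]
    # Assumption: a missing offset means "start of file",
    # a missing length means "read to the end".
    offset = int(parts[1]) if len(parts) > 1 else 0
    length = int(parts[2]) if len(parts) > 2 else None
    return path, offset, length


def read_file_slice(spec: str, root_dir: Optional[Path] = None) -> bytes:
    """Return the bytes selected by spec, resolved against root_dir."""
    path, offset, length = parse_spec(spec)
    # Joining with an absolute path discards root_dir, so root_dir
    # does not restrict which files can be read.
    full_path = Path(root_dir) / path if root_dir is not None else Path(path)
    with open(full_path, "rb") as f:
        f.seek(offset)
        return f.read() if length is None else f.read(length)


# Demo on a throwaway 2048-byte file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "big_file.txt").write_bytes(bytes(range(256)) * 8)
    chunk = read_file_slice("big_file.txt:1024:48", root_dir=Path(tmp))

print(len(chunk))  # 48
```

The sketch reads with seek and read for simplicity; it only mirrors the "name:offset:length" convention described above.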

Parameters:
  • root_dir (Optional[PathLike]) – Root directory for looking up relative file names. Warning: this is not enforced; FileMapper will happily read any file on the system.

  • cached_fd_count (Optional[int]) – Enables an LRU cache on the last cached_fd_count files read. FileMapper memory-maps all cached files, so this is especially useful for reading several slices of the same file.
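To see why caching memory maps helps when several slices come from the same file, here is a minimal standard-library sketch of an LRU cache of mapped files. MmapCache and its max_files argument are hypothetical names chosen for the illustration; the sketch does not reproduce how FileMapper manages its cache internally.

```python
import mmap
import tempfile
from collections import OrderedDict
from pathlib import Path


class MmapCache:
    """Hypothetical LRU cache holding at most max_files memory maps."""

    def __init__(self, max_files: int) -> None:
        self.max_files = max_files
        self._maps: "OrderedDict[str, mmap.mmap]" = OrderedDict()

    def get(self, path: Path) -> mmap.mmap:
        key = str(path)
        if key in self._maps:
            # Cache hit: mark as most recently used and reuse the map.
            self._maps.move_to_end(key)
            return self._maps[key]
        with open(key, "rb") as f:
            # The map keeps its own handle, so the file object can be closed.
            mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        self._maps[key] = mapped
        if len(self._maps) > self.max_files:
            # Evict and close the least recently used map.
            _, oldest = self._maps.popitem(last=False)
            oldest.close()
        return mapped

    def cached_names(self) -> list:
        return [Path(key).name for key in self._maps]

    def close(self) -> None:
        for mapped in self._maps.values():
            mapped.close()
        self._maps.clear()


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.bin", "b.bin", "c.bin"):
        (Path(tmp) / name).write_bytes(name.encode() * 100)

    cache = MmapCache(max_files=2)
    a = cache.get(Path(tmp) / "a.bin")
    first = bytes(a[0:5])    # two slices served by one mapping
    second = bytes(a[5:10])
    same_map = cache.get(Path(tmp) / "a.bin") is a

    cache.get(Path(tmp) / "b.bin")
    cache.get(Path(tmp) / "c.bin")  # exceeds max_files, evicts a.bin
    names = cache.cached_names()
    a_evicted = a.closed
    cache.close()
```

Repeated slices of a cached file reuse the existing mapping instead of reopening the file, which is the access pattern the cached_fd_count description targets.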

__call__(filename)

Parses the file name and returns the file bytes.

Returns:

A dict with the following keys:

{
    "path": "the/path.txt",  # the relative path of the file
    "data": MemoryBlock,  # a memory block with the content of the file; pass it to `bytes` to get a regular Python bytes object
}

Return type:

FileMapperOutput