FileMapper
- class fairseq2.data.FileMapper(root_dir=None, cached_fd_count=None)
Bases: object
For a given file name, returns the file content as bytes.
The file name can also specify a slice of the file in bytes, in the form path:offset:length. For example, calling a FileMapper instance on "big_file.txt:1024:48" will read 48 bytes at offset 1024.
- Parameters:
root_dir (Optional[PathLike]) – Root directory for looking up relative file names. Warning: this is not enforced; FileMapper will happily read any file on the system.
cached_fd_count (Optional[int]) – Enables an LRU cache on the last cached_fd_count files read. FileMapper will memory map all the cached files, so this is especially useful for reading several slices of the same file.
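The caching behaviour described for cached_fd_count can be illustrated with a stdlib-only sketch. This is not fairseq2's implementation: the class LruMmapCache and its methods are hypothetical names, and the sketch assumes an OrderedDict-based LRU policy over read-only mmap objects, so that repeated slices of one file reuse a single mapping.

```python
import mmap
from collections import OrderedDict


class LruMmapCache:
    """Hypothetical LRU cache that keeps at most `capacity` files memory-mapped."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Insertion order doubles as recency order: least recently used first.
        self._maps: "OrderedDict[str, mmap.mmap]" = OrderedDict()

    def _get(self, path: str) -> mmap.mmap:
        if path in self._maps:
            # Cache hit: mark the file as most recently used.
            self._maps.move_to_end(path)
            return self._maps[path]
        # Cache miss: map the whole file read-only. The mapping stays
        # valid after the file object itself is closed.
        with open(path, "rb") as f:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        self._maps[path] = mm
        if len(self._maps) > self.capacity:
            # Evict and unmap the least recently used file.
            _, oldest = self._maps.popitem(last=False)
            oldest.close()
        return mm

    def read(self, path: str, offset: int, length: int) -> bytes:
        # Several slices of the same file share a single mapping.
        return self._get(path)[offset : offset + length]

    def close(self) -> None:
        # Unmap every cached file and empty the cache.
        for mm in self._maps.values():
            mm.close()
        self._maps.clear()
```

Slicing an mmap returns a fresh bytes copy. A zero-copy variant could return memoryview(mm)[offset:offset + length] instead, at the cost of having to release those views before the mapping can be closed.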
- __call__(filename)
Parses the file name and returns the file bytes.
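The slice syntax accepted in the file name can be sketched as a small parser. The function parse_file_spec below is hypothetical and only covers the plain path and path:offset:length forms documented here; fairseq2's own parsing rules (for example, for malformed specs) may differ.

```python
from typing import Optional, Tuple


def parse_file_spec(spec: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Hypothetical parser for "path" and "path:offset:length" specs.

    Returns (path, offset, length); offset and length are None when the
    spec does not carry a byte slice.
    """
    # Split from the right so that colons earlier in the path are kept.
    parts = spec.rsplit(":", 2)
    if len(parts) == 3 and parts[1].isdecimal() and parts[2].isdecimal():
        return parts[0], int(parts[1]), int(parts[2])
    # No valid slice suffix: treat the whole string as the path.
    return spec, None, None


print(parse_file_spec("big_file.txt:1024:48"))  # ('big_file.txt', 1024, 48)
print(parse_file_spec("big_file.txt"))  # ('big_file.txt', None, None)
```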
- Returns:
A dict with the following keys:
    {
        "path": "the/path.txt",  # the relative path of the file
        "data": MemoryBlock,     # a memory block with the content of the file; use `bytes` to get a regular Python object
    }
- Return type:
FileMapperOutput
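To show how the returned dict is typically consumed, the sketch below mirrors its two documented keys with a hypothetical TypedDict. Since MemoryBlock is a fairseq2 type, a standard memoryview stands in for it here; the names FileMapperOutputSketch and make_output are illustrative only and are not part of fairseq2.

```python
from typing import TypedDict


class FileMapperOutputSketch(TypedDict):
    # Hypothetical stand-in mirroring the documented keys.
    path: str         # the relative path of the file
    data: memoryview  # stand-in for fairseq2's MemoryBlock


def make_output(path: str, content: bytes) -> FileMapperOutputSketch:
    # Wrap the bytes in a buffer view without copying them.
    return {"path": path, "data": memoryview(content)}


out = make_output("the/path.txt", b"hello world")
# Materialize a regular Python bytes object from the "data" entry.
payload = bytes(out["data"])
print(out["path"], payload)  # the/path.txt b'hello world'
```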