get_files in v2 stores the files as lists of Path objects. While this may be convenient later on, would it also be possible to just store the full path as a string? That would reduce the memory footprint by around 2/3, which is very relevant when using large datasets and multiple workers in the PyTorch dataloaders (due to the multiprocess forking and copy-on-access problems). My analysis from a while ago:
The core problem came from this thread, which many fastaiers had trouble with:
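For anyone who wants to check the Path vs. string footprint on their own machine, here is a rough sketch using only the standard library. It is not the analysis referred to above and it does not call get_files. The helper `traced_bytes`, the file names, and the list size are made up for illustration, and the exact numbers will depend on the Python version and path lengths.

```python
import tracemalloc
from pathlib import Path


def traced_bytes(build):
    """Return the bytes still allocated by the object that build() returns."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    obj = build()  # keep the result alive while we take the second reading
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del obj
    return after - before


# Hypothetical file names; a real dataset would come from scanning a directory.
N = 20_000
names = [f"/data/train/class_{i % 10}/img_{i:07d}.jpg" for i in range(N)]

# Same paths stored once as Path objects and once as plain strings.
path_bytes = traced_bytes(lambda: [Path(n) for n in names])
str_bytes = traced_bytes(lambda: [str(Path(n)) for n in names])

print(f"list of Path: {path_bytes / N:.0f} bytes per entry")
print(f"list of str:  {str_bytes / N:.0f} bytes per entry")
print(f"str / Path:   {str_bytes / path_bytes:.2f}")
```

This only measures the lists in a single process. With several dataloader workers the cost of whatever gets copied into each worker is paid per worker, so the per-entry difference matters more as the dataset and worker count grow.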