How to load image files whose paths are in text files

Tamori · October 27, 2020, 10:09am

I have the full paths of images listed in two separate text files (each line contains a path) representing my training and validation sets. How do I load these images using ImageDataLoaders or DataBlock?

tcapelle · October 27, 2020, 11:26am

I would use the DataBlock API. You could first read the text files into a pandas DataFrame, and then pass the appropriate getters to the block.
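Here is a minimal sketch of that suggestion. It assumes each text file holds one image path per line and that pandas is installed. `paths_to_df` and the column names `path` and `is_valid` are hypothetical choices, not something from the thread.

```python
import pandas as pd
from pathlib import Path

def paths_to_df(train_file, valid_file):
    """Build a DataFrame with one row per image path and a boolean is_valid column.

    Hypothetical helper: blank lines and surrounding whitespace are ignored.
    """
    def read(fname):
        return [l.strip() for l in Path(fname).read_text().splitlines() if l.strip()]

    train = read(train_file)
    valid = read(valid_file)
    return pd.DataFrame({
        'path': train + valid,
        # False for training rows, True for validation rows
        'is_valid': [False] * len(train) + [True] * len(valid),
    })
```

The frame could then go to `dblock.dataloaders(df)`, with the DataBlock built from fastai's `get_x=ColReader('path')` and `splitter=ColSplitter('is_valid')`. Note that `get_y` also receives the DataFrame row rather than the path, so a path-based `label_func` would need wrapping, e.g. `get_y=lambda r: label_func(r['path'])`.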

Tamori · October 28, 2020, 8:35am

So I’m doing this:

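```python
from fastai.vision.all import *

# trainFile, validFile and label_func are defined elsewhere in my notebook

def readFiles(filename):
    # one path per line; strip whitespace and skip blank lines
    with open(filename) as f:
        return L([line.strip() for line in f if line.strip()])

def getAllImages(ignore):
    trainImageFiles = readFiles(trainFile)
    validImageFiles = readFiles(validFile)
    return trainImageFiles + validImageFiles

# a set makes the membership test in the splitter fast
validFileList = set(readFiles(validFile))

dblock = DataBlock(blocks = (ImageBlock, CategoryBlock),
                   get_items = getAllImages,
                   get_y = label_func,
                   splitter = FuncSplitter(lambda x: x in validFileList))

dsets = dblock.datasets("")  # source is ignored by getAllImages
```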

which is a somewhat roundabout way of doing this, since I already have the exact image paths for each set in two files. Ideally I’d like to pass them as ‘train’ and ‘valid’ parameters, as with ImageDataLoaders.from_folder. Instead, it’s more complicated: the two files have to be combined, and then a membership check has to be implemented in the splitter.
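Since the training paths are concatenated first, the split could also be done by position instead of by a membership test. This sketch assumes fastai's convention that a splitter is a callable taking the item list and returning (training indices, validation indices); `read_paths` and `items_and_splitter` are hypothetical helpers, not fastai functions.

```python
from pathlib import Path

def read_paths(filename):
    """Read one path per line, ignoring blank lines and surrounding whitespace."""
    return [l.strip() for l in Path(filename).read_text().splitlines() if l.strip()]

def items_and_splitter(train_file, valid_file):
    """Return the combined item list and a splitter that splits it by position.

    Hypothetical helper: relies on the training paths coming first in the list,
    so no per-item membership test is needed.
    """
    train = read_paths(train_file)
    valid = read_paths(valid_file)
    items = train + valid
    n_train = len(train)

    def splitter(o):
        # splitter-style callable: items in, (train_idxs, valid_idxs) out
        return list(range(n_train)), list(range(n_train, len(o)))

    return items, splitter
```

With fastai, the items could then be passed in directly as the source, e.g. `DataBlock(blocks=(ImageBlock, CategoryBlock), get_y=label_func, splitter=splitter).datasets(items)`, since a DataBlock without `get_items` uses the source itself as the item list. fastai's `IndexSplitter`, given the validation indices, should express the same split.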

Also, I wanted to use FileSplitter and simply pass my ‘validFile’ as the parameter, but, bizarrely, FileSplitter seems to expect each item to have a ‘name’ attribute (which it compares against the lines of the text file), as if the items were Path objects and the file listed bare file names rather than full paths.
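As far as I can tell from the fastai source, FileSplitter reads the file, splits it on newlines, and checks `x.name in valid` for each item, which is why string items with full paths don't work. A splitter that matches full paths instead might look like the sketch below; `FullPathFileSplitter` is a hypothetical name, not a fastai function, and it assumes the file lists paths written exactly as they appear in the items.

```python
from pathlib import Path

def FullPathFileSplitter(fname):
    """Hypothetical variant of FileSplitter that matches full paths, not file names.

    `fname` lists one validation path per line. Items may be str or Path;
    both are converted with str() before comparison.
    """
    valid = {l.strip() for l in Path(fname).read_text().splitlines() if l.strip()}

    def _inner(items):
        train_idx, valid_idx = [], []
        for i, item in enumerate(items):
            # items whose full path is listed in the file go to validation
            (valid_idx if str(item) in valid else train_idx).append(i)
        return train_idx, valid_idx

    return _inner
```

It would slot in as `splitter=FullPathFileSplitter(validFile)` in place of the FuncSplitter lambda.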

Maybe I’m missing something.