I have the full paths of images listed in two separate text files (each line contains a path) representing my training and validation sets. How do I load these images using ImageDataLoaders or DataBlock?
I would use the data block. You could first read the csv into a pandas df, and then pass the appropriate getters to the block.
So I’m doing this:
from fastai.vision.all import *

def readFiles(filename):
    # One image path per line
    with open(filename) as f:
        return L([line.rstrip('\n') for line in f])

def getAllImages(ignore):
    # The source argument is unused; the paths come from the two list files
    trainImageFiles = readFiles(trainFile)
    validImageFiles = readFiles(validFile)
    return trainImageFiles + validImageFiles

validFileList = Path(validFile).read_text().split('\n')

dblock = DataBlock(blocks = (ImageBlock, CategoryBlock),
                   get_items = getAllImages,
                   get_y = label_func,
                   # An item goes to the validation set if its path is listed in validFile
                   splitter = FuncSplitter(lambda x: x in validFileList))
dsets = dblock.datasets("")
This is a somewhat weird way of doing it, since I already have the exact paths for the images of each set in two files. Ideally I’d like to pass them in ‘train’ and ‘valid’ parameters like in ImageDataLoaders.from_folder, but instead it’s slightly more complicated: the files have to be combined, and then a check has to be implemented in the splitter.
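One way to avoid the membership check is to rely on the order of the combined list. As far as I understand it, a DataBlock splitter is just a callable that receives the items and returns two lists of indices (train, valid), so if the training paths come first, the split is a matter of counting. A minimal sketch in plain Python follows; read_paths and make_items_and_splitter are hypothetical helper names, not part of fastai:

```python
def read_paths(list_file):
    """Return the non-empty lines of a text file (one image path per line)."""
    with open(list_file) as f:
        return [line.strip() for line in f if line.strip()]

def make_items_and_splitter(train_file, valid_file):
    """Build a get_items function and a matching index-based splitter.

    Hypothetical helper: training paths are placed first, so the split
    is defined purely by position in the combined list.
    """
    train_items = read_paths(train_file)
    valid_items = read_paths(valid_file)
    n_train = len(train_items)
    items = train_items + valid_items

    def get_items(_source):
        # The source argument is ignored; the paths come from the list files
        return items

    def splitter(all_items):
        # First n_train indices are training, the rest are validation
        return list(range(n_train)), list(range(n_train, len(all_items)))

    return get_items, splitter
```

Assuming the DataBlock keeps the items in the order get_items returns them, these would be passed as get_items=get_items and splitter=splitter, and dblock.datasets("") would be called as before.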
Also, I wanted to use the FileSplitter and simply specify my ‘validFile’ as parameter, but bizarrely the FileSplitter expects each line, i.e. string, of the text file to have a name element, as if the string lines were Path objects.
Maybe I’m missing something.
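From what I can tell, FileSplitter matches on the name of each item (the file name without its directory), which is why plain strings fail and why a file of full paths would not match anyway. If the text file holds full paths, a small splitter that compares whole paths may be closer to what is wanted. The sketch below is an assumption-laden stand-in, not fastai code; full_path_splitter is a hypothetical name, and it accepts items that are either strings or Path objects:

```python
from pathlib import Path

def full_path_splitter(valid_file):
    """Return a splitter that sends an item to the validation set
    when its full path appears as a line in valid_file.

    Hypothetical helper: compares str(item) against the lines of the file.
    """
    lines = Path(valid_file).read_text().splitlines()
    valid = {line.strip() for line in lines if line.strip()}  # set for fast lookup

    def _inner(items):
        train_idx, valid_idx = [], []
        for i, item in enumerate(items):
            # str() lets the same check work for both str and Path items
            (valid_idx if str(item) in valid else train_idx).append(i)
        return train_idx, valid_idx

    return _inner
```

It would be used as splitter=full_path_splitter(validFile) in the DataBlock above, in place of the FuncSplitter with the lambda.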