How to define dataloader from train/valid folders and name_func of file?


I’m trying to train a model on image data. The data is in train, valid, and test folders. Each folder contains a bunch of image files, and the label is in the name of the file.

I succeeded in making a dataloader from one of the folders using ImageDataLoaders.from_name_func, but want to have the validation set come from the valid folder instead of being some pct of the train set.

Does anyone know how to approach this?
I need something similar to GrandparentSplitter I guess, but for the parent folder, not the grandparent.


I want to add that I tried hacking GrandparentSplitter for this, but didn’t get very far. Here is the code for GPSplitter:

def GrandparentSplitter(train_name='train', valid_name='valid'):
    "Split `items` from the grand parent folder names (`train_name` and `valid_name`)."
    def _inner(o, **kwargs):
        return _grandparent_idxs(o, train_name),_grandparent_idxs(o, valid_name)
    return _inner

and I can’t understand what it does, or where the _grandparent_idxs object is defined…

Apparently, Jupyter notebooks don’t show that function, but I found it in the source code using PyCharm. Here is the missing piece:

def _grandparent_idxs(items, name):
    def _inner(items, name): return mask2idxs(Path(o).parent.parent.name == name for o in items)
    return [i for n in L(name) for i in _inner(items,n)]

So now I can probably make my ParentSplitter by removing one ‘parent’ call in the function above.
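The idea can be sketched as follows. This is a minimal, fastai-free version that assumes a splitter is just a callable taking the list of items and returning a tuple of (train indices, valid indices), which is what GrandparentSplitter returns. The names ParentSplitter and _parent_idxs and the example file paths are hypothetical, not part of fastai.

```python
from pathlib import Path

def _parent_idxs(items, name):
    "Indices of `items` whose immediate parent folder is called `name`."
    # Same idea as _grandparent_idxs, but with a single `.parent` call.
    return [i for i, o in enumerate(items) if Path(o).parent.name == name]

def ParentSplitter(train_name='train', valid_name='valid'):
    "Split `items` from the parent folder names (`train_name` and `valid_name`)."
    def _inner(o, **kwargs):
        return _parent_idxs(o, train_name), _parent_idxs(o, valid_name)
    return _inner

# Hypothetical file list, only to illustrate the split.
files = [
    Path('data/train/cat_1.jpg'),
    Path('data/valid/dog_2.jpg'),
    Path('data/train/dog_3.jpg'),
]
train_idxs, valid_idxs = ParentSplitter()(files)
```

Files in any other folder (for example test) end up in neither list, so they are simply left out of both splits.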

I don’t usually use ImageDataLoaders. I find the DataBlock API very easy and extremely versatile. In this case, you could do it with the DataBlock API. See muellerzr’s vision course for examples of how to use the ImageDataLoaders, DataBlock, and Datasets APIs.
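A rough sketch of what the DataBlock route might look like is below. It uses fastai's FuncSplitter, which sends an item to the validation set when the given function returns True, instead of a custom splitter. The file naming scheme in label_func ("<label>_<id>.<ext>"), the helper names label_func, is_valid, and build_dls, and the Resize(224) and bs=64 settings are all assumptions for illustration; adapt them to your own data.

```python
from functools import partial
from pathlib import Path

def label_func(fname):
    # ASSUMPTION: hypothetical naming scheme "<label>_<id>.<ext>", e.g. "cat_001.jpg".
    return Path(fname).stem.split('_')[0]

def is_valid(fname):
    # True when the file sits directly inside a folder named "valid".
    return Path(fname).parent.name == 'valid'

def build_dls(path, bs=64):
    # fastai is imported here so the helpers above can be tried without it installed.
    from fastai.vision.all import (
        DataBlock, ImageBlock, CategoryBlock, FuncSplitter, Resize, get_image_files,
    )
    dblock = DataBlock(
        blocks=(ImageBlock, CategoryBlock),
        # Only collect files from train/ and valid/, so test/ is not mixed in.
        get_items=partial(get_image_files, folders=['train', 'valid']),
        splitter=FuncSplitter(is_valid),
        get_y=label_func,
        item_tfms=Resize(224),
    )
    return dblock.dataloaders(path, bs=bs)
```

Calling build_dls(path), with path pointing at the folder that contains train and valid, should then give dataloaders whose validation set comes only from the valid folder.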