How to define dataloader from train/valid folders and name_func of file?

yonatan365 · April 19, 2020, 10:01am

Hi,

I’m trying to train a model on image data. The data is in train, valid, and test folders. Each folder contains a bunch of image files, and the label is in the name of each file.

I succeeded in making a dataloader from one of the folders using ImageDataLoaders.from_name_func, but I want the validation set to come from the valid folder instead of being some percentage of the train set.

Does anyone know how to approach this?
I need something similar to GrandparentSplitter I guess, but for the parent folder, not the grandparent.

Thanks!

yonatan365 · April 19, 2020, 10:19am

I want to add that I tried hacking GrandparentSplitter for this, but didn’t get very far. Here is the code for GPSplitter:

def GrandparentSplitter(train_name='train', valid_name='valid'):
    "Split `items` from the grand parent folder names (`train_name` and `valid_name`)."
    def _inner(o, **kwargs):
        return _grandparent_idxs(o, train_name),_grandparent_idxs(o, valid_name)
    return _inner

and I can’t understand what it does, or where the _grandparent_idxs object is defined…

yonatan365 · April 19, 2020, 1:28pm

Apparently, Jupyter notebooks don’t show that function, but I found it in the source code using PyCharm. Here is the missing piece:

def _grandparent_idxs(items, name):
    def _inner(items, name): return mask2idxs(Path(o).parent.parent.name == name for o in items)
    return [i for n in L(name) for i in _inner(items,n)]

So now I can probably make my ParentSplitter by removing one ‘parent’ call in the function above.
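
A minimal sketch of that idea, written in plain Python so it runs without fastai. The name parent_splitter and the sample paths are made up for illustration, and it returns ordinary lists of indices where the fastai version goes through mask2idxs and L:

```python
from pathlib import Path

def parent_splitter(train_name="train", valid_name="valid"):
    """Return a function that splits items by the name of their immediate parent folder."""
    def _parent_idxs(items, name):
        # Indices of the items whose parent folder is called `name`
        # (one `.parent` instead of `.parent.parent`).
        return [i for i, o in enumerate(items) if Path(o).parent.name == name]

    def _inner(items, **kwargs):
        # Same return shape as GrandparentSplitter: (train indices, valid indices).
        return _parent_idxs(items, train_name), _parent_idxs(items, valid_name)
    return _inner

# Hypothetical file paths, only to show the behaviour.
files = [
    "data/train/cat_1.jpg",
    "data/valid/dog_7.jpg",
    "data/train/dog_2.jpg",
    "data/test/cat_9.jpg",
]
train_idxs, valid_idxs = parent_splitter()(files)
print(train_idxs, valid_idxs)
```

For these sample paths the train indices are [0, 2] and the valid indices are [1]; the file under test ends up in neither split.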

vferrer · April 19, 2020, 4:17pm

I don’t usually use ImageDataLoaders. I find the DataBlock API to be very easy and extremely versatile. In this case, you could do it with the DataBlock API. See muellerzr’s vision course for examples of how to use the ImageDataLoaders, DataBlock, and Datasets APIs.

DLegor · April 22, 2022, 9:02am

Hey @yonatan365, I hope you solved your problem. One option is to use the DataBlock API with FuncSplitter, something like this:

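from fastai.vision.all import *

# Build the DataBlock first; it only describes how to assemble the data.
your_dblock = DataBlock(blocks = (ImageBlock, CategoryBlock),
                   get_items = partial(get_image_files,folders=['Train','Validation']),
                   get_y     = Your_Func_Here,  # your function that reads the label from the file name
                   splitter  = FuncSplitter(lambda s: Path(s).parent.name == 'Validation'),
                   item_tfms=Resize(256),
                   batch_tfms=aug_transforms())
# Then create the DataLoaders from the folder that contains 'Train' and 'Validation'.
your_dls = your_dblock.dataloaders(your_folder_path)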

To check that the split works before creating your dataloaders, you can run something like this:

Data_Test=get_image_files(your_folder_path,folders=['Train','Validation'])
fun_test=FuncSplitter(lambda s: Path(s).parent.name == 'Validation')
fun_test(Data_Test)
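
For reference, the splitter call returns two collections of indices, training first and validation second. The sketch below mimics that behaviour in plain Python so the expected shape of the result is easy to see; func_splitter_sketch and the sample paths are hypothetical stand-ins, not fastai code:

```python
from pathlib import Path

def func_splitter_sketch(func):
    """Stand-in for the idea behind FuncSplitter: `func` returns True for validation items."""
    def _inner(items):
        valid_idxs = [i for i, o in enumerate(items) if func(o)]
        train_idxs = [i for i, o in enumerate(items) if not func(o)]
        # Training indices come first, validation indices second.
        return train_idxs, valid_idxs
    return _inner

# Hypothetical paths, only to illustrate the check.
sample = [
    "root/Train/a_0.png",
    "root/Validation/b_1.png",
    "root/Train/b_2.png",
]
split = func_splitter_sketch(lambda s: Path(s).parent.name == "Validation")
train_idxs, valid_idxs = split(sample)
print(train_idxs, valid_idxs)
```

If the second collection comes back empty on your real data, the folder name in the lambda probably does not match the folder on disk (the comparison is case-sensitive).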