Explicitly specify train or valid with from_df() instead of valid_pct

source99 · September 13, 2019, 10:50pm

Typically, when I create a databunch using from_df(), I use valid_pct to specify what percentage of the data goes to valid. Unfortunately I can’t do that here because I don’t want a random split. I have groups of similar images and I don’t want a group split between train and valid; each group should end up entirely in train or entirely in valid.

I would also prefer to use a df because I don’t want to move the files into separate directories.

I imagine I could subclass and override from_df() to do this, but I wanted to check whether anyone has already done something like it.

muellerzr · September 13, 2019, 11:14pm

Create two image lists (from df), one for train and one for valid, then create the databunch and set its valid dataloader to your valid image list.

source99 · September 13, 2019, 11:46pm

Researching now, but it looks like this may already exist:
https://docs.fast.ai/data_block.html#ItemList.split_from_df
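
For anyone with the same grouping problem, here is a rough sketch of how the split column could be built so that whole groups land on one side. The helper name `group_valid_flags`, the `group` and `is_valid` column names, and the sample group ids are all made up for illustration; the fastai v1 calls in the comments are how I understand the data block API, so treat them as an assumption and check them against the docs linked above.

```python
import random

def group_valid_flags(groups, valid_pct=0.2, seed=42):
    """Return one boolean per row; True means the row goes to validation.

    Whole groups are assigned together, so no group is split across sets.
    valid_pct is the fraction of *groups* (not rows) sent to validation.
    """
    unique = sorted(set(groups))      # fixed order so the seed gives a repeatable split
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_valid = max(1, round(len(unique) * valid_pct))  # always keep at least one valid group
    valid_groups = set(unique[:n_valid])
    return [g in valid_groups for g in groups]

# Hypothetical group ids, one per image row
groups = ["a", "a", "b", "b", "b", "c", "d", "d", "e", "e"]
is_valid = group_valid_flags(groups, valid_pct=0.4)

# With a DataFrame `df` that has a (hypothetical) `group` column:
#   df["is_valid"] = group_valid_flags(df["group"].tolist())
# and then, assuming the fastai v1 data block API:
#   data = (ImageList.from_df(df, path)
#           .split_from_df(col="is_valid")
#           .label_from_df(cols=1)
#           .databunch())
```

Note that valid_pct here counts groups, not images, so if the groups differ a lot in size the actual share of images in valid can drift from that number.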

muellerzr · September 13, 2019, 11:55pm

Good find, I had forgotten about that.