Explicitly specify train or valid with from_df() instead of valid_pct

Typically, when I create a DataBunch using from_df(), I use valid_pct to specify what percentage of the data goes to train vs. valid. Unfortunately, I can't do that here because I don't want a random split. I have groups of similar images, and I don't want any group split between train and valid: each group should stay entirely in train or entirely in valid.

I would also prefer to use a df because I don’t want to move the files into separate directories.

I imagine I could subclass and override from_df() to do this, but wanted to check whether anyone has already done something like it.

Create two ImageLists (each from its own df), then build the DataBunch and set the validation dataloader to your validation ImageList :slight_smile:
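
For the two-list approach, the only group-aware part is cutting the DataFrame into two frames so no group straddles the split. A minimal pandas sketch, assuming a hypothetical `group` column that identifies each set of similar images; `split_by_group` and the example data are illustrative, not part of fastai:

```python
from typing import Iterable, Tuple

import pandas as pd


def split_by_group(df: pd.DataFrame, group_col: str,
                   valid_groups: Iterable) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Return (train_df, valid_df) with every group entirely on one side."""
    in_valid = df[group_col].isin(list(valid_groups))
    # ~in_valid keeps every row whose group was not chosen for validation.
    train_df = df[~in_valid].reset_index(drop=True)
    valid_df = df[in_valid].reset_index(drop=True)
    return train_df, valid_df


# Hypothetical example: "group" identifies sets of similar images.
df = pd.DataFrame({
    "name":  ["a1.jpg", "a2.jpg", "b1.jpg", "b2.jpg", "c1.jpg"],
    "label": ["cat", "cat", "dog", "dog", "cat"],
    "group": ["A", "A", "B", "B", "C"],
})
train_df, valid_df = split_by_group(df, "group", valid_groups=["B"])
print(train_df["name"].tolist())  # ['a1.jpg', 'a2.jpg', 'c1.jpg']
print(valid_df["name"].tolist())  # ['b1.jpg', 'b2.jpg']
```

Each frame could then be passed to its own `ImageList.from_df(...)` call, with the validation list used as the validation set as described above.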

Researching now, but it looks like this may already exist:
https://docs.fast.ai/data_block.html#ItemList.split_from_df
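
If I read the linked docs correctly, `split_from_df` puts every row whose value in the given column is True into the validation set, so the group constraint can be handled entirely in pandas beforehand. A minimal sketch, assuming a hypothetical `group` column: the hypothetical helper `add_group_valid_column` samples whole groups with a fixed seed and writes a boolean `is_valid` column. Note that `valid_frac` is a fraction of groups, not of images, so the image-level split will drift if groups differ in size.

```python
import numpy as np
import pandas as pd


def add_group_valid_column(df: pd.DataFrame, group_col: str = "group",
                           valid_frac: float = 0.2, seed: int = 42,
                           out_col: str = "is_valid") -> pd.DataFrame:
    """Return a copy of df with a boolean column that is True for validation rows.

    Whole groups are sampled, so a group never ends up in both train and valid.
    """
    groups = df[group_col].unique()
    rng = np.random.default_rng(seed)
    # valid_frac is a fraction of *groups*, not of rows; keep at least one group.
    n_valid = max(1, int(round(len(groups) * valid_frac)))
    valid_groups = rng.choice(groups, size=n_valid, replace=False)
    out = df.copy()
    out[out_col] = out[group_col].isin(valid_groups)
    return out


# Hypothetical example: 10 groups of 2 similar images each.
df = pd.DataFrame({
    "name":  [f"img{i}.jpg" for i in range(20)],
    "label": ["cat", "dog"] * 10,
    "group": [f"g{i // 2}" for i in range(20)],
})
marked = add_group_valid_column(df, valid_frac=0.2, seed=0)
print(marked["is_valid"].sum())  # 4 rows: 2 groups x 2 images
```

Assuming fastai v1 and the column names above, the marked frame could then go through something like `ImageList.from_df(marked, path, cols='name').split_from_df(col='is_valid')` followed by labelling. `col` defaults to 2 (the third column), so passing the name explicitly avoids depending on column order; check the linked docs for your version.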

Good find :slight_smile: I had forgotten about that