Explicitly specify train or valid with from_df() instead of valid_pct

Typically, when I create a DataBunch using from_df(), I use valid_pct to specify what percentage of the data goes into the validation set. Unfortunately, I can’t do that here because I don’t want a random split: I have groups of similar images, and I don’t want any group split between train and valid. Each group should stay entirely in train or entirely in valid.
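A minimal sketch of a group-aware split, assuming each row carries a group label (the `group` key and the `group_split` helper below are hypothetical names). It shuffles whole groups rather than individual images, so no group is ever divided between train and valid. To keep the sketch dependency-free, a plain list of dicts stands in for the DataFrame. With a real df you could store the result as an `is_valid` column and, assuming your fastai version's data block API provides `split_from_df`, split on that column instead of using valid_pct.

```python
import random


def group_split(rows, group_key, valid_frac=0.2, seed=42):
    """Hypothetical helper: return one bool per row (True = validation).

    Whole groups are assigned to train or valid, so every image in a
    group ends up on the same side of the split.
    """
    # Sort first so the shuffle is reproducible regardless of set ordering.
    groups = sorted({row[group_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    # valid_frac is a fraction of *groups*, not of images.
    n_valid = max(1, round(len(groups) * valid_frac))
    valid_groups = set(groups[:n_valid])
    return [row[group_key] in valid_groups for row in rows]


if __name__ == "__main__":
    # Toy data: three groups of similar images (assumed file names).
    rows = [
        {"name": "a1.jpg", "group": "a"},
        {"name": "a2.jpg", "group": "a"},
        {"name": "b1.jpg", "group": "b"},
        {"name": "c1.jpg", "group": "c"},
        {"name": "c2.jpg", "group": "c"},
    ]
    for row, is_valid in zip(rows, group_split(rows, "group", valid_frac=0.34)):
        print(row["name"], "valid" if is_valid else "train")
```

Because the fraction is taken over groups, the share of *images* in valid can differ from valid_frac when group sizes vary a lot.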

I would also prefer to use a df because I don’t want to move the files into separate directories.

I imagine I could subclass and override from_df() to do this, but I wanted to check whether anyone has already done something like this.

Create two image lists (each from its own df), build a DataBunch, and set the valid dataloader to use your valid ImageList :slight_smile:

Researching now, but it looks like this may already exist:

Good find :slight_smile: I had forgotten about that