Typically, when I create a DataBunch using from_df(), I use valid_pct to specify what percentage of the data goes to validation rather than training. Unfortunately I can’t do that here because I don’t want a random split. I have groups of similar images and I don’t want a group split between train and valid. Each group should end up entirely in train or entirely in valid.
I would also prefer to use a df because I don’t want to move the files into separate directories.
I imagine I could override from_df() in a subclass and figure out a way to do this, but I wanted to check whether anyone has done something like this already.
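
To make the kind of split I mean concrete, here is a rough sketch in plain Python. It is not fastai code. The function name `group_valid_mask` and the example group labels are made up for illustration, and `valid_pct` here is the fraction of groups (not rows) sent to validation.

```python
import random

def group_valid_mask(groups, valid_pct=0.2, seed=42):
    """Return one bool per row: True if that row's whole group goes to validation.

    `groups` is a list with one group label per image (hypothetical input).
    """
    unique = sorted(set(groups))       # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Pick at least one group for validation; the percentage applies to groups.
    n_valid = max(1, round(len(unique) * valid_pct))
    valid_groups = set(unique[:n_valid])
    return [g in valid_groups for g in groups]

# Ten images in five groups; 40% of the groups (two groups) go to validation.
groups = ["a", "a", "b", "b", "b", "c", "d", "d", "e", "e"]
is_valid = group_valid_mask(groups, valid_pct=0.4)
```

With a df, the result could be stored as a boolean column, e.g. `df['is_valid'] = group_valid_mask(df['group'].tolist())`, where `'group'` is an assumed column name. I believe the fastai v1 data block API can then split on such a column with `split_from_df(col='is_valid')`, but I have not confirmed this for my version.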