You should be able to. I’ve been doing this for tabular problems relating to time-series. You make three imagelist’s, one having 70, one having 20, and one having 10%. Make the largest training, second largest as a validation, and smallest as 10%. I’ll show you my code for doing it with tabular, let me know if you need help merging it to images. I ran this on each CSV document I brought in from pandas:
class CombineData2:
def __init__(self, df1, df2):
self.train = df1.train.append([df2.train])
self.valid = df1.valid.append([df2.valid])
self.test = df1.test.append([df2.test])
This takes in two pandas dataframes and merges their training valid and test sets. To get those split I used the following function:
class PrepData:
def __init__(self, dataframe, activity):
self.dataframe = dataframe
dataframe['Activity'] = activity
self.lenTrain = int(len(dataframe)/100*70)
self.lenValid = self.lenTrain + int(len(dataframe)/100*20)
self.lenTest = self.lenValid + int(len(dataframe)/100*10)
self.train = dataframe.iloc[:self.lenTrain]
self.valid = dataframe.iloc[self.lenTrain:self.lenValid]
self.test = dataframe.iloc[self.lenValid:]
I passed in a dataframe and a string for activity as my data was split by files instead of having the class listed but the idea should still be the same. Then generation of the databunch was as follows:
training = TabularList.from_df(initialClassificationData.train, path = '', cat_names = cat_vars, cont_names = var, procs=procs).split_none().label_from_df(cols=dep_var, label_cls = CategoryList)
valid = TabularList.from_df(initialClassificationData.valid, path='', cat_names = cat_vars, cont_names = var, procs=procs).split_none().label_from_df(cols=dep_var, label_cls = CategoryList)
test = TabularList.from_df(initialClassificationData.test, path='', cat_names = cat_vars, cont_names = var, procs=procs).split_none().label_from_df(cols=dep_var, label_cls = CategoryList)
training.valid = valid.train
training.test = test.train
initialClassificationDatabunch = training.databunch()
You should be able to repeat this for images. If you have issues tell me and I can try to work something out after my classes today. The key here is split_none() so we just get the first 70, next 20, and next 10% of data stored within it