Better way to do a custom validation dataframe?

muellerzr · June 1, 2019, 12:25am

I want to oversample the training data to help deal with some class imbalance, so I want separate train and validation dataframes (the validation set should keep the real class distribution). As far as I know, there is no `TabularDataBunch.from_dfs`?
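For context, a minimal sketch of random oversampling with plain pandas, applied to the training dataframe only. The helper name `oversample` and the column names are hypothetical. This is one simple approach, not a fastai API: it keeps every original row and tops up each smaller class with rows drawn with replacement.

```python
import pandas as pd


def oversample(df: pd.DataFrame, label_col: str, seed: int = 42) -> pd.DataFrame:
    """Hypothetical helper: randomly oversample each class up to the size
    of the largest class. Apply to the training frame only."""
    target = df[label_col].value_counts().max()
    parts = []
    for _, group in df.groupby(label_col):
        extra = target - len(group)
        parts.append(group)  # keep every original row
        if extra > 0:
            # top up the smaller class with rows sampled with replacement
            parts.append(group.sample(n=extra, replace=True, random_state=seed))
    # shuffle so classes are not stored in contiguous blocks
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy, imbalanced training frame: 8 rows of "a", 2 rows of "b"
train_df = pd.DataFrame({"x": range(10), "y": ["a"] * 8 + ["b"] * 2})
balanced = oversample(train_df, "y")
print(balanced["y"].value_counts().to_dict())
```

The validation dataframe is left untouched, which is why the two frames need to be kept separate.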

Is there a more efficient way than:

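from fastai.tabular import *  # fastai v1

# Keyword arguments are required here: positional args after path='' is a SyntaxError
train_data = (TabularList.from_df(train_df, path='', cat_names=cat_var, cont_names=cont_var, procs=procs)
             .split_none()
             .label_from_df(dep_var)
             .databunch())

# Reuse the processor fitted on the training data so categories/normalization match
valid_data = (TabularList.from_df(valid_df, path='', cat_names=cat_var, cont_names=cont_var, procs=procs,
                                  processor=train_data.processor)
             .split_none()
             .label_from_df(dep_var)
             .databunch())

# Use the second set's loader as validation (note: a train_dl shuffles and drops the last partial batch)
train_data.valid_dl = valid_data.train_dl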

Thanks!

muellerzr · June 1, 2019, 12:45am

I know I could instead append the valid df to the train df and use split_by_idx, as long as I keep track of where one ends and the other begins.
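Keeping track of the boundary is mostly bookkeeping. A minimal sketch with plain pandas, assuming both frames have the same columns; the helper name `stack_train_valid` is hypothetical:

```python
import pandas as pd


def stack_train_valid(train_df: pd.DataFrame, valid_df: pd.DataFrame):
    """Hypothetical helper: concatenate train and valid frames and return the
    combined frame plus the positional indices of the validation rows."""
    combined = pd.concat([train_df, valid_df], ignore_index=True)
    # With ignore_index=True the validation rows occupy the last len(valid_df) positions
    valid_idx = list(range(len(train_df), len(combined)))
    return combined, valid_idx


train_df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "a", "b"]})
valid_df = pd.DataFrame({"x": [4, 5], "y": ["b", "a"]})
combined, valid_idx = stack_train_valid(train_df, valid_df)
print(valid_idx)  # [3, 4]
```

In fastai v1, `combined` would then go to `TabularList.from_df` and `valid_idx` to `.split_by_idx(valid_idx)` in place of `.split_none()`, giving a single DataBunch without swapping dataloaders. If oversampling is involved, do it on `train_df` before concatenating so the indices stay correct.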