Is there a way in v3 to form a TabularList using multiple dfs, specifically a training df and a validation df? Of course, I can concatenate them and supply the correct idxs to split_by_idx, but I was wondering if TabularList can handle this on its own. I tried submitting the two dfs as a list to split_by_list, but that didn’t work.
I don’t think so. Based on the wording of the original docs at https://docs.fast.ai/tabular.data.html#TabularDataBunch, I think this was originally implemented but later deprecated:
The best way to quickly get your data in a DataBunch suitable for tabular data is to organize it in two (or three) dataframes. One for training, one for validation, and if you have it, one for testing. Here we are interested in a subsample of the adult dataset.
But it seems like it went back to taking a single df and splitting it automatically based on your valid_idx. If you wanted to pass separate train and valid dataframes without concatenating them, you would have to redefine the TabularDataBunch class, as well as TabularList and the split_by_idx function that TabularDataBunch calls:
src = (TabularList.from_df(df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs).split_by_idx(valid_idx))
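In the meantime, the concatenation workaround from the question is probably the simplest route. Here is a rough sketch of it using only pandas; the helper name concat_with_valid_idx and the toy columns are made up for illustration and are not part of fastai. It assumes split_by_idx expects row positions in the combined df, which is why the index is reset.

```python
import pandas as pd


def concat_with_valid_idx(train_df, valid_df):
    """Stack train and valid frames (hypothetical helper, not a fastai function).

    Returns the combined frame and the row positions of the validation rows.
    """
    # ignore_index=True renumbers rows 0..n-1 so labels match positions
    df = pd.concat([train_df, valid_df], ignore_index=True)
    # Validation rows sit right after the training rows
    valid_idx = list(range(len(train_df), len(df)))
    return df, valid_idx


# Toy data with made-up columns, just to show the shapes involved
train_df = pd.DataFrame({"age": [25, 38, 52],
                         "workclass": ["Private", "State-gov", "Private"],
                         "salary": [0, 1, 1]})
valid_df = pd.DataFrame({"age": [41, 29],
                         "workclass": ["Private", "Self-emp"],
                         "salary": [1, 0]})

df, valid_idx = concat_with_valid_idx(train_df, valid_df)
print(len(df), valid_idx)
```

The resulting df and valid_idx can then go into the TabularList.from_df(df, ...).split_by_idx(valid_idx) call shown above.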
Yeah, that’s what I thought, but wasn’t sure. Thanks very much!