TabularDataBunch.from_df doesn't add / acknowledge test set

I’m receiving an error here. Is it deprecated? What’s the new way?

data = (TabularList.from_df(df, cat_names=cat_vars, cont_names=cont_vars, procs=procs)
                   .split_by_idx(valid_idx)
                   .add_test(TabularList.from_df(df_test, cat_names=cat_vars, cont_names=cont_vars, procs=procs))
                   .label_from_df(cols=dep_var, label_cls=CategoryList)
                   .databunch())


AttributeError: 'TabularList' object has no attribute 'add_test'

Hello,

Has anyone been able to get predictions on the test set to work on the latest version? (1.0.42)

With:

data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=test_index, test_df=test_df,
                                procs=procs, cat_names=cat_names,
                                cont_names=cont_names)

If I try learn.get_preds(data.test_ds), I get predictions in the shape of the training set.

If I call learn.get_preds() with no arguments, they are in the shape of the validation set.

If I use learn.predict(data.test_ds), I get a KeyError for one of the columns.

Any help would be much appreciated!

It’s learn.get_preds(ds_type=DatasetType.Test) (as can be seen in the docs).
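
For example (a minimal sketch, assuming learn was built from the DataBunch above and that DatasetType is in scope via the usual from fastai.tabular import *):

preds, _ = learn.get_preds(ds_type=DatasetType.Test)  # labels returned for the test set are dummies
pred_classes = preds.argmax(dim=1)                    # class indices for a classification dep_var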


add_test should go after label_from_df; that will resolve your error:
AttributeError: 'TabularList' object has no attribute 'add_test'
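
For instance, the same pipeline with add_test moved after the labelling step (a sketch reusing the df, df_test, cat_vars, cont_vars, procs and valid_idx from the post above):

data = (TabularList.from_df(df, cat_names=cat_vars, cont_names=cont_vars, procs=procs)
                   .split_by_idx(valid_idx)
                   .label_from_df(cols=dep_var, label_cls=CategoryList)
                   .add_test(TabularList.from_df(df_test, cat_names=cat_vars, cont_names=cont_vars, procs=procs))
                   .databunch())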

Hi :)
I’m working on a time series forecasting problem using the fastai tabular dataset, as instructed in the Rossmann challenge.
When using the same data and model parameters in two different training runs, I get completely different results for a given point.

Thinking this may be due to the randomized batch selection during training, my idea is to first fix the batch selection across the different training runs.

The following lines of code give different results when run twice; how can I fix this?
Is there a parameter for this? I saw fix_dl=None but don’t know how to use it properly.

data = (TabularList.from_df(df, path=path, cat_names=cat_vars, cont_names=cont_vars, procs=procs)
                   .split_by_idx(valid_idx)
                   .label_from_df(cols=dep_var, log=True)
                   .add_test(TabularList.from_df(test, path=path, cat_names=cat_vars, cont_names=cont_vars))
                   .databunch())
data.show_batch(10)

Thank you

When generating the training data loader, it’s shuffled and the last batch (if incomplete) is dropped. If you show a batch from the validation set instead, it should always be the same.
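
If you want the two training runs themselves to be repeatable, one option (a sketch, not fastai-specific) is to seed every random number generator before building the DataBunch and the learner:

import random
import numpy as np
import torch

def seed_everything(seed=42):
    # fix the Python, NumPy and PyTorch RNGs so batch shuffling and weight init repeat
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(42)
# rebuild the DataBunch and the learner after seeding so both runs see the same batches

As far as I understand, the fix_dl you saw is just the training set served without shuffling (DatasetType.Fix); it’s handy for inspecting batches but doesn’t by itself make training deterministic.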


Thank you for your reply!

So sgugger, couldn’t the FillMissing proc fill the NaNs in the data?

If the optional parameter test_df=df_test is given to TextLMDataBunch and TextClasDataBunch, how can we load a pretrained model and make the saved learner’s get_preds() use the df_test that was already given?
I get a NoneType error when I use get_preds(ds_type=Dataset.Type.test).

However, when I add the df_test via load_learner or add_test, it works.
What is the point of being able to specify test_df in the DataBunch if it can’t be used for prediction unless it is added again via load_learner or add_test?

It’s ds_type=DatasetType.Test (without the first dot).
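
With that fixed, attaching the test set at load time and predicting on it could look like this (a sketch assuming an exported learner saved under path and a df_test whose text lives in a 'text' column; adjust the names to your setup):

learn = load_learner(path, test=TextList.from_df(df_test, cols='text'))  # attach the test dataframe when loading
preds, _ = learn.get_preds(ds_type=DatasetType.Test)                     # predictions for that attached test set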