TabularDataBunch.from_df doesn't add / acknowledge test set

I’m receiving an error here. Is it deprecated? What’s the new way?

data = (TabularList.from_df(df, cat_names=cat_vars, cont_names=cont_vars, procs=procs)
                   .split_by_idx(valid_idx)
                   .add_test(TabularList.from_df(df_test, cat_names=cat_vars, cont_names=cont_vars, procs=procs))
                   .label_from_df(cols=dep_var, label_cls=CategoryList)
                   .databunch())


AttributeError: 'TabularList' object has no attribute 'add_test'

Hello,

Has anyone been able to get predictions on the test set to work on the latest version? (1.0.42)

With:

data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=test_index, test_df=test_df,
                                procs=procs, cat_names=cat_names,
                                cont_names=cont_names)

If I try learn.get_preds(data.test_ds), I get predictions in the shape of the training set.

If I call learn.get_preds() with no arguments, they are in the shape of the validation set.

If I use learn.predict(data.test_ds), I get a KeyError for one of the columns.

Any help would be much appreciated!

It’s learn.get_preds(ds_type=DatasetType.Test) (as can be seen in the docs).
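
For example (a minimal sketch, assuming learn was built from the DataBunch above and that DatasetType is in scope via the usual from fastai.tabular import *):

preds, _ = learn.get_preds(ds_type=DatasetType.Test)  # labels returned for the test set are dummies
pred_classes = preds.argmax(dim=1)                    # class indices for a classification dep_var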


add_test should go after label_from_df; that will resolve your error:
AttributeError: 'TabularList' object has no attribute 'add_test'
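
For instance, the same pipeline with add_test moved after the labelling step (a sketch reusing the df, df_test, cat_vars, cont_vars, procs and valid_idx from the post above):

data = (TabularList.from_df(df, cat_names=cat_vars, cont_names=cont_vars, procs=procs)
                   .split_by_idx(valid_idx)
                   .label_from_df(cols=dep_var, label_cls=CategoryList)
                   .add_test(TabularList.from_df(df_test, cat_names=cat_vars, cont_names=cont_vars, procs=procs))
                   .databunch())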

Hi :)
I’m working on a time series forecasting problem using the fastai tabular dataset, as instructed in the Rossmann challenge.
When using the same data and model parameters in two different training runs, I get completely different results for a given point.

Thinking this may be due to the randomized batch selection during training, my idea is to first fix the batch selection across the different training runs.

The following lines of code give different results when run twice; how can I fix this?
Is there a parameter for this? I saw fix_dl=None but don’t know how to use it properly.

data = (TabularList.from_df(df, path=path, cat_names=cat_vars, cont_names=cont_vars, procs=procs)
                   .split_by_idx(valid_idx)
                   .label_from_df(cols=dep_var, log=True)
                   .add_test(TabularList.from_df(test, path=path, cat_names=cat_vars, cont_names=cont_vars))
                   .databunch())
data.show_batch(10)

Thank you

When generating the training data loader, it’s shuffled and the last batch (if incomplete) is dropped. If you show a batch from the validation set instead, it should always be the same.
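
If you want the two training runs themselves to be repeatable, one option (a sketch, not fastai-specific) is to seed every random number generator before building the DataBunch and the learner:

import random
import numpy as np
import torch

def seed_everything(seed=42):
    # fix the Python, NumPy and PyTorch RNGs so batch shuffling and weight init repeat
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(42)
# rebuild the DataBunch and the learner after seeding so both runs see the same batches

As far as I understand, the fix_dl you saw is just the training set served without shuffling (DatasetType.Fix); it’s handy for inspecting batches but doesn’t by itself make training deterministic.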


Thank you for your reply!

So sgugger, couldn’t the FillMissing proc fill the NaNs in the data?

If the optional parameter test_df=df_test is given to TextLMDataBunch and TextClasDataBunch, how can we load a pretrained model and make the saved learner’s get_preds() use the df_test that was already given?
I get a NoneType error when I use get_preds(ds_type=Dataset.Type.test).

However, when I add the df_test via load_learner or add_test, it works.
What is the point of being able to specify test_df in the DataBunch if it can’t be used for prediction unless it is added again via load_learner or add_test?

It’s ds_type=DatasetType.Test (without the first dot).
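
With that fixed, attaching the test set at load time and predicting on it could look like this (a sketch assuming an exported learner saved under path and a df_test whose text lives in a 'text' column; adjust the names to your setup):

learn = load_learner(path, test=TextList.from_df(df_test, cols='text'))  # attach the test dataframe when loading
preds, _ = learn.get_preds(ds_type=DatasetType.Test)                     # predictions for that attached test set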