Tabular data. How to predict on full test set

I have test data
test1 = TabularList.from_df(test, cat_names=cat_names, cont_names=cont_names)
and a DataBunch with train and test data:
data1 = (TabularList.from_df(train.reset_index().drop('index', axis=1).iloc[0:1000],
                             cat_names=cat_names, cont_names=cont_names, procs=procs)
         .random_split_by_pct(0.33)
         .label_from_df(cols='age')
         .add_test(test1, label='age')
         .databunch())

I made a learner like in lesson 4 and can't figure out how to run inference on the whole test set with the neural net.
learn.get_preds() returns predictions for the validation data.
learn.pred_batch() returns predictions for one batch of the validation data.
learn.predict(test.iloc[0]) returns a prediction for only one row, and throws an error when I pass it a slice.

learn.get_preds(test1) surprisingly returns predictions for the validation data again.

Of course I can predict row by row in a loop (and this is very slow!), but surely there is a faster and better way?
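
Roughly, the row-by-row version I mean looks like this (just a sketch; test is the raw dataframe from above):

# very slow: one preprocessing pass and one forward pass per row
row_preds = [learn.predict(test.iloc[i]) for i in range(len(test))]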


It seems that the correct code is
learn.get_preds(DatasetType.Test)
but now I get an error:
TypeError: batch must contain tensors, numbers, dicts or lists; found <class 'NoneType'>

data1.show_batch(rows=5, ds_type=DatasetType.Test) gives a reasonable result, so why is learn.get_preds(DatasetType.Test) not working? What can I change?


Replying to myself:
get_preds didn't work on my data because I have a multiclass classification problem, not a binary one.
It seems to me that, as it stands, fast.ai isn't suitable for multiclass data problems.

Hi,

Any updates on the subject?

I use a loop over each row of my new test dataset, following the example here. It takes a long time and is probably not very efficient.

Is there a better way to run predictions on an entire dataframe? I mean a new dataframe that was not part of the original train/validation/test sets?

See my example here:

Even if it's not labeled, this is how to bring outside datasets into fastai v1. It's much easier in v2.
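
For readers who can't open the link, the core idea is roughly this (a sketch rather than the linked notebook; it assumes a new dataframe new_df with the same columns as the training data, and reuses the same cat_names, cont_names and procs):

# Sketch: wrap the new dataframe in a TabularList so it goes through the same
# kind of preprocessing (FillMissing, Categorify, Normalize) as the training data.
# Note: passing procs here refits them on new_df; for strictly identical category
# mappings and normalization stats you would reuse the processor fitted on the
# training set.
data_new = (TabularList.from_df(new_df, cat_names=cat_names, cont_names=cont_names, procs=procs)
            .split_none()                # keep every row, no validation split
            .label_from_df(cols='age'))  # or .label_empty() if there are no labels
data_new.valid = data_new.train          # fastai expects a validation set to exist
data_new = data_new.databunch(bs=64)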


Hi,
correct me if I’m wrong, but in your CalculateAccuracy function you also loop over every row of the test df and get a prediction per row.

That was just to show a comparison

So, looping over each row of the dataframe is the faster way?

No, creating a separate DataLoader and then overriding learn.data.valid_dl is the faster way.
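
In case it helps anyone finding this later, the overload pattern is roughly this (a sketch; it assumes data_new is a DataBunch built from the new dataframe, as in the earlier snippet):

# Point the learner's validation DataLoader at the new data and run one
# batched pass over all of it.
learn.data.valid_dl = data_new.valid_dl
preds, _ = learn.get_preds(ds_type=DatasetType.Valid)

# For a classification model, the predicted class index for each row
# (indices follow the class order of the training DataBunch):
pred_idx = preds.argmax(dim=1)

If you still need the original validation metrics afterwards, keep a reference to the old learn.data.valid_dl and restore it when you are done.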

oh… I saw 1.24 vs 3… I thought it was 1.24 minutes vs 3 minutes… but it was 1.24 minutes vs 3 seconds…
I’ll try it later.
Thanks