Tabular data. How to predict on full test set

I have test data
test1 = TabularList.from_df(test, cat_names=cat_names, cont_names=cont_names)
and a DataBunch with train and test data:
data1 = (TabularList.from_df(train.reset_index().drop('index', axis=1).iloc[0:1000],
                             cat_names=cat_names, cont_names=cont_names, procs=procs)
         .random_split_by_pct(0.33)
         .label_from_df(cols='age')
         .add_test(test1, label='age')
         .databunch())

I made a learner like in lesson 4 and can't figure out how to run inference on the whole test set with the neural net.
learn.get_preds() returns predictions for the validation data.
learn.pred_batch() returns predictions for one batch of the validation data.
learn.predict(test.iloc[0]) returns a prediction for only one row, and throws an error when I pass it a slice.

learn.get_preds(test1) surprisingly returns predictions for the validation data again.

Of course I can predict row by row in a loop (and this is very slow!), but surely there is a faster and better way?
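
Roughly, the row-by-row version I mean looks like this (just a sketch; test is the raw dataframe from above):

# very slow: one preprocessing pass and one forward pass per row
row_preds = [learn.predict(test.iloc[i]) for i in range(len(test))]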


It seems that the correct code is
learn.get_preds(DatasetType.Test)
but now I get an error:
TypeError: batch must contain tensors, numbers, dicts or lists; found <class 'NoneType'>

data1.show_batch(rows=5, ds_type=DatasetType.Test) gives a reasonable result, so why is learn.get_preds(DatasetType.Test) not working? What can I change?


Replying to myself:
get_preds didn't work on my data because I have a multiclass classification problem, not a binary one.
It seems to me that, as it stands, fast.ai isn't suitable for multiclass data problems.

Hi,

Any updates on the subject?

I use a loop over each row of my new test dataset, following the example here. It takes a long time and is probably not very efficient.

Is there a better way to run predictions on an entire dataframe? I mean a new dataframe that was not part of the original train/validation/test sets?

See my example here:

Even if it's not labeled, this is how to bring outside datasets into fastai v1. It's much easier in v2.
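
For readers who can't open the link, the core idea is roughly this (a sketch rather than the linked notebook; it assumes a new dataframe new_df with the same columns as the training data, and reuses the same cat_names, cont_names and procs):

# Sketch: wrap the new dataframe in a TabularList so it goes through the same
# kind of preprocessing (FillMissing, Categorify, Normalize) as the training data.
# Note: passing procs here refits them on new_df; for strictly identical category
# mappings and normalization stats you would reuse the processor fitted on the
# training set.
data_new = (TabularList.from_df(new_df, cat_names=cat_names, cont_names=cont_names, procs=procs)
            .split_none()                # keep every row, no validation split
            .label_from_df(cols='age'))  # or .label_empty() if there are no labels
data_new.valid = data_new.train          # fastai expects a validation set to exist
data_new = data_new.databunch(bs=64)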


Hi,
correct me if I’m wrong, but in your CalculateAccuracy function you also loop over every row of the test df and get a prediction per row.

That was just to show a comparison

So, looping over each row of the dataframe is the faster way?

No, creating a separate DataLoader and then overriding learn.data.valid_dl is the faster way.
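
In case it helps anyone finding this later, the overload pattern is roughly this (a sketch; it assumes data_new is a DataBunch built from the new dataframe, as in the earlier snippet):

# Point the learner's validation DataLoader at the new data and run one
# batched pass over all of it.
learn.data.valid_dl = data_new.valid_dl
preds, _ = learn.get_preds(ds_type=DatasetType.Valid)

# For a classification model, the predicted class index for each row
# (indices follow the class order of the training DataBunch):
pred_idx = preds.argmax(dim=1)

If you still need the original validation metrics afterwards, keep a reference to the old learn.data.valid_dl and restore it when you are done.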

oh… I saw 1.24 vs 3… I thought it was 1.24 minutes vs 3 minutes… but it was 1.24 minutes vs 3 seconds…
I’ll try it later.
Thanks