Single Prediction on new data from Tabular Data Learner

yep, got exactly the same bug.

I think this line
113 cont_names = ifnone(cont_names, list(set(df)-set(cat_names)-{dep_var}))

was meant to mean this:
113 cont_names = ifnone(cont_names, list(set(df.columns.values)-set(cat_names)-{dep_var}))

Will do a pull request.

Update: https://github.com/fastai/fastai/pull/1175 - feel free to work it in yourself @sgugger, don’t feel like signing another CLA today :wink:

I tried changing it accordingly in the fastai library inside my virtual environment, but with no success: same error.

I’m on fastai==1.0.26, reading in data (all continuous variables) from df through:
data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=valid_idx)
and training a learner (following the documentation) using
learn = tabular_learner(data, layers=[200,100], metrics=accuracy); learn.fit_one_cycle(1, 1e-2)

This gives very good metrics.
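
For completeness, a minimal sketch of that setup (path, df and valid_idx stand in for your own data; the target column name 'Class' is an assumption based on the traceback below):

from fastai.tabular import *  # fastai v1.0.x

dep_var = 'Class'                            # assumed target column (see df.drop(columns=['Class']) below)
valid_idx = range(len(df) - 2000, len(df))   # hypothetical validation split

data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=valid_idx)
learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)
learn.fit_one_cycle(1, 1e-2)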

However, inference does not work.

learn.predict(df1.iloc[1]) leads to:

    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-2e21ea0a4137> in <module>
      1 df1 = df.drop(columns=['Class'])
      2 cat_names = None
----> 3 learn.predict(df1.iloc[1])

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/basic_train.py in predict(self, img, pbar)
    223         "Return prect class, label and probabilities for `img`."
    224         ds = self.data.single_dl.dataset
--> 225         ds.set_item(img)
    226         res = self.pred_batch(ds_type=DatasetType.Single, pbar=pbar)
    227         ds.clear_item()

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/data_block.py in set_item(self, item)
    359 
    360     def __len__(self)->int: return len(self.x) if self.item is None else 1
--> 361     def set_item(self,item): self.item = self.x.process_one(item)
    362     def clear_item(self): self.item = None
    363     def __repr__(self)->str: return f'{self.__class__.__name__}\ny: {self.y}\nx: {self.x}'

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/data_block.py in process_one(self, item, processor)
     52         if processor is not None: self.processor = processor
     53         self.processor = listify(self.processor)
---> 54         for p in self.processor: item = p.process_one(item)
     55         return item
     56 

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/tabular/data.py in process_one(self, item)
     68         for proc in self.procs: proc(df, test=True)
     69         if self.cat_names is not None:
---> 70             codes = np.stack([c.cat.codes.values for n,c in df[self.cat_names].items()], 1).astype(np.int64) + 1
     71         else: codes = [[]]
     72         if self.cont_names is not None:

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/numpy/core/shape_base.py in stack(arrays, axis, out)
    347     arrays = [asanyarray(arr) for arr in arrays]
    348     if not arrays:
--> 349         raise ValueError('need at least one array to stack')
    350 
    351     shapes = set(arr.shape for arr in arrays)

ValueError: need at least one array to stack

The error occurs here:

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/tabular/data.py in process_one(self, item)
 68         for proc in self.procs: proc(df, test=True)
 69         if self.cat_names is not None:
---> 70             codes = np.stack([c.cat.codes.values for n,c in df[self.cat_names].items()], 1).astype(np.int64) + 1

which is weird, as self.cat_names should be None (only continuous variables in my dataset).

Any hints?

Thanks!

I haven’t tried predictions where there are no conts or cats. Will check tomorrow, there might be a bug!

Work-around:
My dataset didn’t have any continuous values and I was getting the same error. I had to create a dummy column filled with 0 and pass it as a continuous variable. This hack made learn.predict(df.iloc[0]) work (rough sketch below).
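
A minimal sketch of that hack, assuming an all-categorical DataFrame df (the column name dummy_cont is made up; path, dep_var, valid_idx and cat_names are your own):

df['dummy_cont'] = 0.0   # all-zero dummy column so the processor has at least one continuous variable
data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=valid_idx,
                                cat_names=cat_names, cont_names=['dummy_cont'])
learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)
learn.predict(df.iloc[0])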

My problem appears to be the other way round: all my columns are continuous, yet the class somehow executes code that should only be run when there are categorical columns (which I don’t have). Would your solution also work in reverse, i.e. adding a dummy column and using it as the categorical column?


I haven’t tried it yet, but I feel it would work. It is a bug and I believe it will be fixed soon.

It should be fixed on master now.

Thank you. Can confirm learn.predict(df1.iloc[1]) now works.
I’ve tried to look through your recent commits on master (for learning purposes), but couldn’t find one that would have fixed this. Would you be able to link to the commit that fixed this? Thanks again!


It’s mainly in this commit, though I made a copy-paste mistake that was corrected in the next one.

The problem was that I was testing whether self.cats or self.conts were None, whereas they are now set to an empty list when they would have been None.
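
Roughly this kind of change, in other words (not the exact diff, just an illustration of the idea using the code from the traceback above):

# Before: assumed cat_names is None when there are no categorical columns
if self.cat_names is not None:
    codes = np.stack([c.cat.codes.values for n,c in df[self.cat_names].items()], 1).astype(np.int64) + 1
else: codes = [[]]

# After: cat_names can be an empty list instead, so test its length
if len(self.cat_names) != 0:
    codes = np.stack([c.cat.codes.values for n,c in df[self.cat_names].items()], 1).astype(np.int64) + 1
else: codes = [[]]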

Can’t we use learn.get_preds(ds_type = DatasetType.Test)?


That works too. Learner.predict is for when you have only one thing to predict.
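
A quick sketch of the two options (the batch version assumes you attached a test set when building the DataBunch; variable names are placeholders):

# Single row: Learner.predict takes one item
pred_class, pred_idx, probs = learn.predict(df1.iloc[1])

# Whole test set: attach it to the data, then get predictions in batches
data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=valid_idx, test_df=test_df)
learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)
preds, _ = learn.get_preds(ds_type=DatasetType.Test)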

Hi guys,

I have one issue regarding the result from learn.predict:

When I run the following code a couple of times …

… I get one of the two following results …

… where the first result (1) is correct (0, 1 or 2 would be possible).

But the thing is, if I run this code 5 times, 4 times I get the wrong result
and 1 time the correct one.

Does anyone have an idea what’s going on here?
Also, why does the prediction change when everything (code, input data, weights)
stays the same?

MODEL training:
learn.fit_one_cycle(1, 1e-2)   # 1   0.236339   0.134506   0.942000

@oxyd33 - did you make any progress with your error? I’m also still getting a TypeError: ‘DataFrame’ objects are mutable, thus they cannot be hashed, despite being on the newest version (1.0.36).

EDIT:
It seems that I had simply not realized that the function signature of TabularDataBunch.from_df silently changed after 1.0.22, and that the get_tabular_learner method was renamed. Now on the latest version and after adjusting my code, I’m getting this error when trying to fit:

data_bunch = TabularDataBunch.from_df('data/', full_df, dep_var, valid_idx, tfms=[FillMissing, Categorify],
                                      cat_names=cat_vars, cont_names=contin_vars)


learn = tabular_learner(data_bunch, layers=[200,100,50], emb_szs={'provider': 3}, metrics=accuracy)

learn.fit(15, 2e-3)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-a5c1880537cc> in <module>
----> 1 learn.fit(15, 2e-3)

/anaconda3/lib/python3.6/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    164         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    165         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 166             callbacks=self.callbacks+callbacks)
    167 
    168     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/anaconda3/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     92     except Exception as e:
     93         exception = e
---> 94         raise e
     95     finally: cb_handler.on_train_end(exception)
     96 

/anaconda3/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     80             cb_handler.on_epoch_begin()
     81 
---> 82             for xb,yb in progress_bar(data.train_dl, parent=pbar):
     83                 xb, yb = cb_handler.on_batch_begin(xb, yb)
     84                 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)

/anaconda3/lib/python3.6/site-packages/fastprogress/fastprogress.py in __iter__(self)
     63         self.update(0)
     64         try:
---> 65             for i,o in enumerate(self._gen):
     66                 yield o
     67                 if self.auto_update: self.update(i+1)

/anaconda3/lib/python3.6/site-packages/fastai/basic_data.py in __iter__(self)
     70         for b in self.dl:
     71             y = b[1][0] if is_listy(b[1]) else b[1]
---> 72             yield self.proc_batch(b)
     73 
     74     @classmethod

/anaconda3/lib/python3.6/site-packages/fastai/basic_data.py in proc_batch(self, b)
     63         "Proces batch `b` of `TensorImage`."
     64         b = to_device(b, self.device)
---> 65         for f in listify(self.tfms): b = f(b)
     66         return b
     67 

TypeError: __init__() missing 1 required positional argument: 'cont_names'

While I really appreciate the dedication to making a great product and API, changes of this magnitude, especially within minor releases after a v1, honestly surprise me. It’s hard to put much trust into the platform.

EDIT2: Changing tfms to procs did the trick. Looks like it’s working now.
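
For reference, the working call ends up looking roughly like this (same arguments as above, only tfms replaced by procs):

data_bunch = TabularDataBunch.from_df('data/', full_df, dep_var, valid_idx,
                                      procs=[FillMissing, Categorify],
                                      cat_names=cat_vars, cont_names=contin_vars)
learn = tabular_learner(data_bunch, layers=[200,100,50], emb_szs={'provider': 3}, metrics=accuracy)
learn.fit(15, 2e-3)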

What do you actually pass here? For example, if I’m working on the Rossmann notebook, what would I pass after training the model? Is it test_df? Or do I need to create the learner with the test_df arg set?


Hi @dom.raute,

sorry for my late reply. Because of my job I haven’t had much time for my private
projects lately, and therefore haven’t been on the forum either.

To answer your question: I updated fastai to version 1.0.36 just today, modified my
code accordingly, and now it works perfectly! :grinning:

If I run my code now, it (almost) always makes the right predictions, in a way that
is consistent with the training accuracy.

Any luck with the tabular data examples? I wish you could give examples from start to finish.

Did you succeed in your quest?

I have tabular data I’d love to run predictions on, but sadly the class didn’t go as far as I had hoped!

@hammao see the updated lesson 4 notebook. There is an example for inference:

As well as at the end of the Rossmann notebook: