Single Prediction on new data from Tabular Data Learner

yep, got exactly the same bug.

I think this line
113 cont_names = ifnone(cont_names, list(set(df)-set(cat_names)-{dep_var}))

was meant to mean this:
113 cont_names = ifnone(cont_names, list(set(df.columns.values)-set(cat_names)-{dep_var}))

Will do a pull request.

Update: https://github.com/fastai/fastai/pull/1175 - feel free to work it in yourself @sgugger, don’t feel like signing another CLA today :wink:

I tried changing it accordingly in the fastai library inside my virtual environment, but with no success: same error.

I’m on fastai==1.0.26, reading in data (all continuous variables) from df through:
data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=valid_idx)
and training a learner (following the documentation) using
learn = tabular_learner(data, layers=[200,100], metrics=accuracy); learn.fit_one_cycle(1, 1e-2)

This gives very good metrics.
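
For completeness, a minimal sketch of that setup (path, df and valid_idx stand in for your own data; the target column name 'Class' is an assumption based on the traceback below):

from fastai.tabular import *  # fastai v1.0.x

dep_var = 'Class'                            # assumed target column (see df.drop(columns=['Class']) below)
valid_idx = range(len(df) - 2000, len(df))   # hypothetical validation split

data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=valid_idx)
learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)
learn.fit_one_cycle(1, 1e-2)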

However, inference does not work.

learn.predict(df1.iloc[1]) leads to:

    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-2e21ea0a4137> in <module>
      1 df1 = df.drop(columns=['Class'])
      2 cat_names = None
----> 3 learn.predict(df1.iloc[1])

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/basic_train.py in predict(self, img, pbar)
    223         "Return prect class, label and probabilities for `img`."
    224         ds = self.data.single_dl.dataset
--> 225         ds.set_item(img)
    226         res = self.pred_batch(ds_type=DatasetType.Single, pbar=pbar)
    227         ds.clear_item()

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/data_block.py in set_item(self, item)
    359 
    360     def __len__(self)->int: return len(self.x) if self.item is None else 1
--> 361     def set_item(self,item): self.item = self.x.process_one(item)
    362     def clear_item(self): self.item = None
    363     def __repr__(self)->str: return f'{self.__class__.__name__}\ny: {self.y}\nx: {self.x}'

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/data_block.py in process_one(self, item, processor)
     52         if processor is not None: self.processor = processor
     53         self.processor = listify(self.processor)
---> 54         for p in self.processor: item = p.process_one(item)
     55         return item
     56 

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/tabular/data.py in process_one(self, item)
     68         for proc in self.procs: proc(df, test=True)
     69         if self.cat_names is not None:
---> 70             codes = np.stack([c.cat.codes.values for n,c in df[self.cat_names].items()], 1).astype(np.int64) + 1
     71         else: codes = [[]]
     72         if self.cont_names is not None:

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/numpy/core/shape_base.py in stack(arrays, axis, out)
    347     arrays = [asanyarray(arr) for arr in arrays]
    348     if not arrays:
--> 349         raise ValueError('need at least one array to stack')
    350 
    351     shapes = set(arr.shape for arr in arrays)

ValueError: need at least one array to stack

The error occurs here:

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/tabular/data.py in process_one(self, item)
 68         for proc in self.procs: proc(df, test=True)
 69         if self.cat_names is not None:
---> 70             codes = np.stack([c.cat.codes.values for n,c in df[self.cat_names].items()], 1).astype(np.int64) + 1

which is weird, as self.cat_names should be None (only continuous variables in my dataset).

Any hints?

Thanks!

I haven’t tried predictions where there are no conts or cats. Will check tomorrow, there might be a bug!

Work-around:
My dataset didn’t have any continuous values and I was getting the same error. I had to create a dummy column filled with 0 and pass it as a continuous variable. This hack made learn.predict(df.iloc[0]) work (rough sketch below).
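
A minimal sketch of that hack, assuming an all-categorical DataFrame df (the column name dummy_cont is made up; path, dep_var, valid_idx and cat_names are your own):

df['dummy_cont'] = 0.0   # all-zero dummy column so the processor has at least one continuous variable
data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=valid_idx,
                                cat_names=cat_names, cont_names=['dummy_cont'])
learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)
learn.predict(df.iloc[0])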

My problem appears to be the other way round: all my columns are continuous, yet the class somehow executes code that should only be run when there are categorical columns (which I don’t have). Would your solution also work in reverse, i.e. adding a dummy column and using it as the categorical column?


I haven’t tried it yet, but I feel it would work. It is a bug and I believe it will be fixed soon.

It should be fixed on master now.

Thank you. Can confirm learn.predict(df1.iloc[1]) now works.
I’ve tried to look through your recent commits on master (for learning purposes), but couldn’t find one that would have fixed this. Would you be able to link to the commit that fixed this? Thanks again!


It’s mainly in this commit, though I made a copy-paste mistake that was corrected in the next one.

The problem was that I was testing whether self.cats or self.conts were None, whereas they are now set to an empty list when they would have been None.
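
Roughly this kind of change, in other words (not the exact diff, just an illustration of the idea using the code from the traceback above):

# Before: assumed cat_names is None when there are no categorical columns
if self.cat_names is not None:
    codes = np.stack([c.cat.codes.values for n,c in df[self.cat_names].items()], 1).astype(np.int64) + 1
else: codes = [[]]

# After: cat_names can be an empty list instead, so test its length
if len(self.cat_names) != 0:
    codes = np.stack([c.cat.codes.values for n,c in df[self.cat_names].items()], 1).astype(np.int64) + 1
else: codes = [[]]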

Can’t we use learn.get_preds(ds_type = DatasetType.Test)?


That works too. Learner.predict is for when you have only one thing to predict.
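
A quick sketch of the two options (the batch version assumes you attached a test set when building the DataBunch; variable names are placeholders):

# Single row: Learner.predict takes one item
pred_class, pred_idx, probs = learn.predict(df1.iloc[1])

# Whole test set: attach it to the data, then get predictions in batches
data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=valid_idx, test_df=test_df)
learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)
preds, _ = learn.get_preds(ds_type=DatasetType.Test)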

Hi guys,

I have one issue regarding the result from learn.predict:

When I run the following code a couple of times …

… I get one of the two following results …

… where the first result (1) is correct (0, 1 or 2 would be possible).

But the thing is, if I run this code 5 times, 4 times I get the wrong result
and 1 time the correct one.

Does anyone have an idea what’s going on here?
Also, why does the prediction change when everything (code, input data, weights)
stays the same?

MODEL training:
learn.fit_one_cycle(1, 1e-2)   # 1   0.236339   0.134506   0.942000

@oxyd33 - did you make any progress with your error? I’m also still getting a TypeError: ‘DataFrame’ objects are mutable, thus they cannot be hashed, despite being on the newest version (1.0.36).

EDIT:
It seems that I had simply not realized that the function signature of TabularDataBunch.from_df silently changed after 1.0.22, and that the get_tabular_learner method was renamed. Now on the latest version and after adjusting my code, I’m getting this error when trying to fit:

data_bunch = TabularDataBunch.from_df('data/', full_df, dep_var, valid_idx, tfms=[FillMissing, Categorify],
                                      cat_names=cat_vars, cont_names=contin_vars)


learn = tabular_learner(data_bunch, layers=[200,100,50], emb_szs={'provider': 3}, metrics=accuracy)

learn.fit(15, 2e-3)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-a5c1880537cc> in <module>
----> 1 learn.fit(15, 2e-3)

/anaconda3/lib/python3.6/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    164         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    165         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 166             callbacks=self.callbacks+callbacks)
    167 
    168     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/anaconda3/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     92     except Exception as e:
     93         exception = e
---> 94         raise e
     95     finally: cb_handler.on_train_end(exception)
     96 

/anaconda3/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     80             cb_handler.on_epoch_begin()
     81 
---> 82             for xb,yb in progress_bar(data.train_dl, parent=pbar):
     83                 xb, yb = cb_handler.on_batch_begin(xb, yb)
     84                 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)

/anaconda3/lib/python3.6/site-packages/fastprogress/fastprogress.py in __iter__(self)
     63         self.update(0)
     64         try:
---> 65             for i,o in enumerate(self._gen):
     66                 yield o
     67                 if self.auto_update: self.update(i+1)

/anaconda3/lib/python3.6/site-packages/fastai/basic_data.py in __iter__(self)
     70         for b in self.dl:
     71             y = b[1][0] if is_listy(b[1]) else b[1]
---> 72             yield self.proc_batch(b)
     73 
     74     @classmethod

/anaconda3/lib/python3.6/site-packages/fastai/basic_data.py in proc_batch(self, b)
     63         "Proces batch `b` of `TensorImage`."
     64         b = to_device(b, self.device)
---> 65         for f in listify(self.tfms): b = f(b)
     66         return b
     67 

TypeError: __init__() missing 1 required positional argument: 'cont_names'

While I really appreciate the dedication to making a great product and API, changes of this magnitude, especially within minor releases after a v1, honestly surprise me. It’s hard to put much trust into the platform.

EDIT2: Changing tfms to procs did the trick. Looks like it’s working now.
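
For reference, the working call ends up looking roughly like this (same arguments as above, only tfms replaced by procs):

data_bunch = TabularDataBunch.from_df('data/', full_df, dep_var, valid_idx,
                                      procs=[FillMissing, Categorify],
                                      cat_names=cat_vars, cont_names=contin_vars)
learn = tabular_learner(data_bunch, layers=[200,100,50], emb_szs={'provider': 3}, metrics=accuracy)
learn.fit(15, 2e-3)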

What do you actually pass here? For example, if I’m working on the Rossmann notebook, what would I pass after training the model? Is it test_df? Or do I need to create the learner with the test_df arg set?


Hi @dom.raute,

sorry for my late reply. Because of my job I haven’t had much time for my private
projects lately, and therefore haven’t been on the forum either.

To answer your question: I updated fastai to version 1.0.36 just today, modified my
code accordingly, and now it works perfectly! :grinning:

If I run my code now, it (almost) always makes the right predictions, in a way that
is consistent with the training accuracy.

Any luck with the tabular data examples? I wish you could give examples from start to finish.

Did you succeed in your quest?

I have tabular data I’d love to run predictions on, but sadly the class didn’t go as far as I had hoped!

@hammao see the updated lesson 4 notebook. There is an example for inference:

As well as at the end of the Rossmann notebook: