Fastai v2 tabular

Yes. I just noticed that and adjusted

@sgugger would it be possible to maintain types when generating our databunch? This came up in a discussion on my Kaggle kernel here, but essentially we're noticing memory errors (too much memory used) because some columns we want to keep as int8 and int16 are converted to int64 (for cat) and float64 (for cont). I noticed this in the source code too. Are there any plans to adjust this? It is a big memory user.

A comparison was made with memory usage:

Before: 330128 [pandas dataframe]
After: 1550000 [TabularPandas]

Let me know your thoughts :slight_smile:

Edit: Checked the _40 nb and it seems you got rid of that hard-coded dtype, if I'm not mistaken? (and so it maintains type)
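
To illustrate the kind of memory growth described above, here is a minimal sketch in plain pandas (it does not use TabularPandas). The column names `cat` and `cont` and the row count are made up for the example; it only shows how upcasting int8 and float32 columns to int64 and float64 changes the byte count.

```python
import numpy as np
import pandas as pd

n = 1000  # arbitrary row count for the illustration

# A frame that stores its columns in compact dtypes.
small = pd.DataFrame({
    "cat": (np.arange(n) % 100).astype("int8"),       # 1 byte per value
    "cont": np.linspace(0, 1, n).astype("float32"),   # 4 bytes per value
})

# The same data after being upcast, as happens when dtypes are hard-coded.
wide = small.astype({"cat": "int64", "cont": "float64"})  # 8 bytes per value each

small_bytes = int(small.memory_usage(index=False).sum())  # 1000*1 + 1000*4 = 5000
wide_bytes = int(wide.memory_usage(index=False).sum())    # 1000*8 + 1000*8 = 16000

print(small_bytes, wide_bytes)
```

Comparing `df.dtypes` before and after building the tabular object is a quick way to check whether the types were preserved.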

"NameError: name 'TabularPandas' is not defined"

How to deal with this?

Does this imply that we can pass in a "test dataset" to get predictions, like in a competition?
If so, I would be obliged if you could demonstrate this on any prediction competition.
It would be immensely helpful for me.

If it's labeled you can use the labels. Otherwise it operates like the normal test set did back in 1.0 (where there were no labels, as in Kaggle competitions).

For the import, what libraries are you importing before your call to TabularPandas?

I went through your Kaggle kernel, i.e. https://www.kaggle.com/muellerzr/fastai-v2-starter-code , cloned the GitHub repo, and imported the modules.
I think TabularPandas now gets recognized.

[This is where I'm stuck](https://colab.research.google.com/drive/1ZVz9fg6g0lTzeqG-lSJDBdTwaOy3DiWU)

I'm going through a Kaggle competition, but I'm stuck here.

In the beginning it looks like you're still using fastai 1.0, not 2.0. (You're using TabularList.)

How to fix it?
I shall be obliged if you edit it.

Look at my notebook on Kaggle or the adults notebook (notebook 40-41) on the fastai dev repo to see how the new API is done.

Roger that.
Your kernel goes up to training.
It would be immensely helpful if you could extend your kernel with a small example of predicting on a test set.

See the post I made earlier today on test sets: A Brief Guide to Test Sets in v2 (you can do labelled now too!)

Roger that

Here, "learn" is the model which has been trained, and the test set won't contain the target-label column. Right?
One more question: do I have to specify cat_names and cont_names again, even though I already specified them for training, i.e. for the "to" data-bunch?

It says "AttributeError: 'Learner' object has no attribute 'fit_one_cycle'"

@sgugger how do we use IndexSplitter? I'm trying to walk through Rossmann at the moment. I'm attempting:

splits = IndexSplitter(valid_idx)

But in creating the TabularPandas it will throw an error saying function object is not iterable. So then I tried IndexSplitter(valid_idx)(valid_idx) but that also did not work. Advice?

A splitter always takes the items (or something of the same length), so you have to pass

splits = IndexSplitter(valid_idx)(items)

If your items are in a dataframe, you can also just pass a range of the same size

splits = IndexSplitter(valid_idx)(range_of(df))
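
The two-step call above can be easier to follow with a toy version. The sketch below is a simplified stand-in written in plain Python, not fastai's actual implementation; `index_splitter` is a hypothetical name. It shows why calling the factory once only gives you a function (hence the "function object is not iterable" error), and why a range of the right length works as well as the items themselves.

```python
def index_splitter(valid_idx):
    """Hypothetical stand-in for a splitter factory: returns a function of the items."""
    valid = list(valid_idx)

    def _inner(items):
        # Everything that is not a validation index becomes a training index.
        valid_set = set(valid)
        train = [i for i in range(len(items)) if i not in valid_set]
        return train, valid

    return _inner


items = ["a", "b", "c", "d", "e"]

splitter = index_splitter([3, 4])           # a function, not the splits themselves
splits = splitter(items)                    # calling it with the items gives (train, valid)
same_splits = splitter(range(len(items)))   # only the length of the argument matters

print(splits)
```

In this sketch the result is a pair of index lists, so passing `splitter` itself where splits are expected fails, while passing `splitter(items)` works.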

Thanks! @sgugger That makes sense. I get a value error now:

ValueError: operands could not be broadcast together with shapes (0,) (802943,) 

Or I guess the better question is: should I be getting my valid_idx a different way than

cut = train_df['Date'][(train_df['Date'] == train_df['Date'][len(test_df)])].index.max()
valid_idx = range(cut)

I can't help without seeing the full stack trace.

Sure, sorry!

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-27-1ed41788e5c8> in <module>()
      1 to = TabularPandas(train_df, procs=procs, cat_names=cat_vars, cont_names=cont_vars,
----> 2                    y_names=dep_var, is_y_cat=False, splits=splits)

/usr/local/lib/python3.6/dist-packages/fastai2/tabular/core.py in __init__(self, df, procs, cat_names, cont_names, y_names, is_y_cat, splits, do_setup)
     31     def __init__(self, df, procs=None, cat_names=None, cont_names=None, y_names=None, is_y_cat=True, splits=None, do_setup=True):
     32         if splits is None: splits=[range_of(df)]
---> 33         df = df.iloc[sum(splits, [])].copy()
     34         super().__init__(df)
     35 

ValueError: operands could not be broadcast together with shapes (0,) (802943,) 

The notebook is here
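
The failing line in the traceback is `sum(splits, [])`. The thread does not confirm the cause, but a message of this shape can be reproduced when the elements of `splits` are NumPy arrays (or similar array-likes) rather than Python lists: `[] + array` is then treated as an element-wise addition instead of a concatenation. The sketch below assumes exactly that and uses small made-up index arrays.

```python
import numpy as np

# With plain lists, sum(..., []) concatenates the pieces.
list_splits = [[0, 1, 2], [3, 4]]
flat = sum(list_splits, [])  # [0, 1, 2, 3, 4]

# With arrays, the first step is [] + ndarray, which NumPy handles as
# element-wise addition of shapes (0,) and (3,), and that cannot broadcast.
array_splits = [np.arange(3), np.arange(3, 5)]
try:
    sum(array_splits, [])
    message = None
except ValueError as err:
    message = str(err)

# Converting each piece to a list first restores concatenation.
fixed = sum([s.tolist() for s in array_splits], [])

print(message)
print(fixed)
```

If this is what is happening, checking `type(splits[0])` before constructing TabularPandas would show whether the pieces are lists or arrays.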