Yes. I just noticed that and adjusted
@sgugger would it be possible to maintain types when generating our databunch? This came up in a discussion on my Kaggle kernel here, but essentially we're running into memory errors (using too much memory) because some columns we want to keep as int8 and int16 are converted to int64 (for cat) and float64 (for cont). I noticed this in the source code too. Are there any plans to adjust this? It is a big memory user.
A comparison was made with memory usage:
Before: 330128 [pandas dataframe]
After: 1550000 [TabularPandas]
Let me know your thoughts
Edit: I checked the _40 notebook and it seems you got rid of that hard-coded dtype, if I'm not mistaken? (and so it maintains the type)
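To make the dtype point concrete, here is a minimal sketch of how much memory upcasting can cost. The column names and row count are made up for illustration, and this uses plain pandas only, not TabularPandas itself:

```python
import numpy as np
import pandas as pd

n_rows = 1_000

# Hypothetical columns: a small-range categorical code and a continuous value,
# stored with the narrow dtypes we would like to keep.
df_small = pd.DataFrame({
    "cat_code": (np.arange(n_rows) % 10).astype(np.int8),
    "cont_val": np.linspace(0.0, 1.0, n_rows).astype(np.float32),
})

# The same data after being upcast to int64 / float64.
df_wide = df_small.astype({"cat_code": "int64", "cont_val": "float64"})

# Compare the bytes used by the columns only (index excluded).
small_bytes = int(df_small.memory_usage(index=False).sum())
wide_bytes = int(df_wide.memory_usage(index=False).sum())
print(small_bytes, wide_bytes)
```

With these two columns the upcast copy takes a bit more than three times the memory, which is roughly in line with the before/after numbers reported above.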
"NameError: name 'TabularPandas' is not defined"
How do I deal with this?
Does this imply that we can pass in a "test dataset" to get predictions, like in a competition?
If yes, I would be obliged if you could demonstrate this on any prediction competition.
It would be immensely helpful for me.
If it's labeled, you can use the labels. Otherwise it operates like the normal test set did back in 1.0 (where there were no labels, as in Kaggle competitions).
For the import, what libraries are you importing before your call to TabularPandas?
I went through your Kaggle kernel, i.e. https://www.kaggle.com/muellerzr/fastai-v2-starter-code , cloned the GitHub repo, and imported the modules.
I think TabularPandas now gets recognized.
[This is where I'm stuck](https://colab.research.google.com/drive/1ZVz9fg6g0lTzeqG-lSJDBdTwaOy3DiWU)
I'm going through a Kaggle competition, but I'm stuck here.
At the beginning it looks like you're still using fastai 1.0, not 2.0 (you're using TabularList).
How do I fix it?
I would be obliged if you could edit it.
Look at my notebook on Kaggle or the adults notebook (notebook 40-41) on the fastai dev repo to see how the new API is done.
Roger that.
Your kernel only goes up to training.
It would be immensely helpful if you could give a short example of predicting on a test set by extending your kernel.
See the post I made earlier today on test sets: A Brief Guide to Test Sets in v2 (you can do labelled now too!)
Roger that
Here, "learn" refers to the model that has been trained, and the test set won't contain the target label column. Right?
One more question: do I have to specify cat_names and cont_names again, even though I already specified them for training, i.e. for the "to" data bunch?
It says: AttributeError: 'Learner' object has no attribute 'fit_one_cycle'
@sgugger how do we use IndexSplitter? I'm trying to walk through Rossmann at the moment. I'm attempting:
splits = IndexSplitter(valid_idx)
But in creating the TabularPandas it will throw an error saying function object is not iterable. So then I tried IndexSplitter(valid_idx)(valid_idx) but that also did not work. Advice?
A splitter always takes the items (or something of the same length), so you have to pass:
splits = IndexSplitter(valid_idx)(items)
If your items are in a dataframe, you can also just pass a range of the same size:
splits = IndexSplitter(valid_idx)(range_of(df))
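The two-step call pattern can be sketched with a plain-Python stand-in. Note that index_splitter below is a hypothetical helper written for illustration, not fastai's actual IndexSplitter, and the items are made up:

```python
def index_splitter(valid_idx):
    """Hypothetical stand-in: returns a function that splits item indices."""
    valid = list(valid_idx)
    valid_set = set(valid)

    def _inner(items):
        # Every position in items that is not a validation index goes to train.
        train = [i for i in range(len(items)) if i not in valid_set]
        return train, valid

    return _inner


items = ["a", "b", "c", "d", "e"]

splitter = index_splitter([1, 3])   # step 1: configure; this is still a function
splits = splitter(items)            # step 2: call it on the items to get indices

# Anything of the same length as the items works, e.g. a range.
splits_from_range = index_splitter([1, 3])(range(len(items)))
print(splits, splits_from_range)
```

Passing only the result of step 1 hands over a function rather than index lists, which would be consistent with the "function object is not iterable" error mentioned above.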
Thanks @sgugger! That makes sense. I get a ValueError now:
ValueError: operands could not be broadcast together with shapes (0,) (802943,)
Or I guess the better question is: should I be getting my valid_idx a different way than
cut = train_df['Date'][(train_df['Date'] == train_df['Date'][len(test_df)])].index.max()
valid_idx = range(cut)
I can't help without seeing the full stack trace.
Sure, sorry!
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-27-1ed41788e5c8> in <module>()
      1 to = TabularPandas(train_df, procs=procs, cat_names=cat_vars, cont_names=cont_vars,
----> 2                    y_names=dep_var, is_y_cat=False, splits=splits)

/usr/local/lib/python3.6/dist-packages/fastai2/tabular/core.py in __init__(self, df, procs, cat_names, cont_names, y_names, is_y_cat, splits, do_setup)
     31     def __init__(self, df, procs=None, cat_names=None, cont_names=None, y_names=None, is_y_cat=True, splits=None, do_setup=True):
     32         if splits is None: splits=[range_of(df)]
---> 33         df = df.iloc[sum(splits, [])].copy()
     34         super().__init__(df)
     35

ValueError: operands could not be broadcast together with shapes (0,) (802943,)
The notebook is here
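One possible reading of the traceback, not confirmed in the thread: if the splits are NumPy arrays rather than plain lists, then sum(splits, []) starts by evaluating [] + array, which NumPy treats as element-wise addition instead of list concatenation. A minimal sketch with made-up index values, assuming NumPy is installed:

```python
import numpy as np

# Made-up splits stored as NumPy arrays (assumption: this mirrors the failing case).
splits = (np.array([2, 3, 4]), np.array([0, 1]))

try:
    # sum starts with [] + splits[0]; NumPy tries to add element-wise
    # and cannot broadcast an empty operand against a length-3 array.
    sum(splits, [])
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)

# Converting each split to a plain list of ints lets sum concatenate as intended.
flat = sum([[int(i) for i in s] for s in splits], [])
print(flat)
```

If that is what is happening here, converting the splits to plain lists before passing them in might avoid the error, though that is only a guess without the notebook.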