I think I’ve found the issue and it involves the fact that CategoryMap uses sort=True. Let me verify real quick (this is just me scanning source code)
Okay, I’ve solved the issue @vrodriguezf. Let me go put a PR in, but here’s what I had to do. Categorize now became:
class myCategorize(Transform):
    "Reversible transform of category string to `vocab` id"
    loss_func,order=CrossEntropyLossFlat(),1
    def __init__(self, vocab=None, add_na=False, sort=True):
        self.add_na = add_na
        self.sort = sort  # stored so setups can read it later
        self.vocab = None if vocab is None else CategoryMap(vocab, sort=sort, add_na=add_na)

    def setups(self, dsets):
        if self.vocab is None and dsets is not None: self.vocab = CategoryMap(dsets, sort=self.sort, add_na=self.add_na)
        self.c = len(self.vocab)

    def encodes(self, o): return TensorCategory(self.vocab.o2i[o])
    def decodes(self, o): return Category(self.vocab[o])
Specifically, we pass in a sort value that controls whether the values are re-sorted or kept in the order they were passed in. Then we adjusted CategoryBlock:
def myCategoryBlock(vocab=None, sort=True, add_na=False):
    return TransformBlock(type_tfms=myCategorize(vocab=vocab, sort=sort, add_na=add_na))
This makes sure the sort option gets passed through. Finally, we had to adjust the Categorize setups like so:
@myCategorize
def setups(self, to:Tabular):
    if len(to.y_names) > 0:
        if self.vocab is None:
            self.vocab = CategoryMap(getattr(to, 'train', to).iloc[:,to.y_names[0]].items)
        else:
            self.vocab = CategoryMap(self.vocab, sort=False, add_na=self.add_na)
        self.c = len(self.vocab)
    return self(to)
By default it was sorting every time; this is the behavior that needed to be changed.
Then when you set y_block, make sure to use myCategoryBlock with your parameters.
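To see why the sort flag matters, here is a minimal pure-Python sketch. build_vocab is a hypothetical stand-in for the relevant part of CategoryMap, not fastai's implementation; it only shows how sorting changes which integer id each label is encoded to.

```python
def build_vocab(labels, sort=True):
    """Hypothetical stand-in for CategoryMap: returns (items, o2i)."""
    # dict.fromkeys drops duplicates while keeping first-seen order
    items = list(dict.fromkeys(labels))
    if sort:
        items = sorted(items)
    # o2i maps each label to the integer id used as the encoded target
    o2i = {label: i for i, label in enumerate(items)}
    return items, o2i


custom_order = ["low", "medium", "high"]

_, sorted_ids = build_vocab(custom_order, sort=True)
_, kept_ids = build_vocab(custom_order, sort=False)

print(sorted_ids)  # {'high': 0, 'low': 1, 'medium': 2}
print(kept_ids)    # {'low': 0, 'medium': 1, 'high': 2}
```

With sort=True the ids follow alphabetical order, so a vocab passed in a deliberate order only survives when sort=False.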
Wow that was fast! I’ll use that workaround for now thank you so much, you’re awesome!
If you run into an issue, categorize should be:
# export
class Categorize(Transform):
    "Reversible transform of category string to `vocab` id"
    loss_func,order=CrossEntropyLossFlat(),1
    def __init__(self, vocab=None, sort=True, add_na=False):
        self.add_na = add_na
        self.sort = sort
        self.vocab = None if vocab is None else CategoryMap(vocab, sort=sort, add_na=add_na)

    def setups(self, dsets):
        if self.vocab is None and dsets is not None: self.vocab = CategoryMap(dsets, sort=self.sort, add_na=self.add_na)
        self.c = len(self.vocab)

    def encodes(self, o): return TensorCategory(self.vocab.o2i[o])
    def decodes(self, o): return Category(self.vocab[o])
(finding bugs as I actually code this thing)
Edit: All bugs are fixed, you should be good to go @vrodriguezf
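The reason the flag has to be kept on the instance is that setups runs later than __init__, so a constructor argument is no longer in scope by then. A tiny sketch of that pattern, using two hypothetical toy classes rather than the real transform:

```python
class BrokenCategorize:
    def __init__(self, sort=True):
        pass  # `sort` is a local name here and is gone once __init__ returns

    def setups(self):
        return sort  # NameError: no `sort` exists in this scope


class FixedCategorize:
    def __init__(self, sort=True):
        self.sort = sort  # keep the flag on the instance

    def setups(self):
        return self.sort  # still available whenever setups is called


try:
    BrokenCategorize(sort=False).setups()
    broken_result = "no error"
except NameError as err:
    broken_result = type(err).__name__

fixed_result = FixedCategorize(sort=False).setups()
```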
FYI this fix is now in the main library. Install fastcore and fastai2 with the dev installs to use it right away.
Hi, there is a weird behaviour (in my opinion) when calling learn.show_results() with a TabularLearner.
If you check type(learn.dl) after fitting the learner, you get fastai2.tabular.core.TabDataLoader. However, after calling learn.show_results() it gives list. Does it make sense that a call to show_results modifies the type of an attribute of the learner?
I noticed this because fastshap was raising an error, since it expects learn.dl to be a TabDataLoader.
Thanks!
That’s an interesting behavior, because this is all show_results is:
@typedispatch
def show_results(x:Tabular, y:Tabular, samples, outs, ctxs=None, max_n=10, **kwargs):
    df = x.all_cols[:max_n]
    for n in x.y_names: df[n+'_pred'] = y[n][:max_n].values
    display_df(df)
Notice we don’t actually modify anything, nor make it a list. I’m wondering if we need .copy()’s here instead? (Maybe you can try that?) If that doesn’t work, I highly recommend filing an issue on GitHub with a reproducer (cc @sgugger in case you can think of why off the top of your head).
learn.dl is not a reliable attribute: it’s saved during any run of the training loop or inference to represent the dl currently in use, but outside of that it’s not a useful attribute. In this case it goes from the validation dataloader (from your previous fit) to a list containing one batch (from the get_preds launched by show_results).
In connection with #350, I’ll set it to None at the end of every training run for cleanup (along with learn.xb, learn.yb, learn.preds, learn.loss), so you won’t actually see anything in it.
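A toy sketch of why learn.dl changes. ToyLearner is hypothetical and far simpler than fastai's Learner; it only mimics the pattern described above, where every loop overwrites dl with whatever it is currently iterating over.

```python
class ToyLearner:
    """Hypothetical stand-in for a Learner, for illustration only."""

    def __init__(self, valid_dl):
        self.valid_dl = valid_dl  # stable reference, set once
        self.dl = None            # scratch attribute, rewritten by every loop

    def fit(self):
        # training ends with a validation pass, so dl is left pointing at valid_dl
        self.dl = self.valid_dl

    def show_results(self):
        # inference on a single batch: dl is replaced by a one-element list
        one_batch = self.valid_dl[0]
        self.dl = [one_batch]


learn = ToyLearner(valid_dl=(("x0", "y0"), ("x1", "y1")))
learn.fit()
type_after_fit = type(learn.dl)    # tuple (the stand-in for a dataloader)
learn.show_results()
type_after_show = type(learn.dl)   # list
```

Code that needs a stable dataloader is probably safer holding its own reference (in fastai, something like learn.dls.valid) than reading learn.dl after the fact.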
Noted! @vrodriguezf I’ll make some adjustments to fastshap by the end of the week with a fix. Thanks Sylvain
Understood! It looked like a weird attribute tbh. Thanks for the fix!!!
In your notebook, @muellerzr, you make use of
tabular_config({'emb_p':float(dp),
                'wd':float(wd)})
With this I am getting the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/anaconda3/envs/proyecto5/lib/python3.7/site-packages/bayes_opt/target_space.py in probe(self, params)
190 try:
--> 191 target = self._cache[_hashable(x)]
192 except KeyError:
KeyError: (0.21434078230426126, 158.04867401632373, 100.10293733561039, 744.1986307373115, 0.014684121522803134, 1.1846771895375956, 0.07482958046651729)
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<timed eval> in <module>
~/anaconda3/envs/proyecto5/lib/python3.7/site-packages/bayes_opt/bayesian_optimization.py in maximize(self, init_points, n_iter, acq, kappa, kappa_decay, kappa_decay_delay, xi, **gp_params)
183 iteration += 1
184
--> 185 self.probe(x_probe, lazy=False)
186
187 if self._bounds_transformer:
~/anaconda3/envs/proyecto5/lib/python3.7/site-packages/bayes_opt/bayesian_optimization.py in probe(self, params, lazy)
114 self._queue.add(params)
115 else:
--> 116 self._space.probe(params)
117 self.dispatch(Events.OPTIMIZATION_STEP)
118
~/anaconda3/envs/proyecto5/lib/python3.7/site-packages/bayes_opt/target_space.py in probe(self, params)
192 except KeyError:
193 params = dict(zip(self._keys, x))
--> 194 target = self.target_func(**params)
195 self.register(x, target)
196 return target
<ipython-input-7-c018850183eb> in fit_with(lr, wd, dp, n_layers, layer_1, layer_2, layer_3)
8 layers = [int(layer_1)]
9 config = tabular_config({"emb_p":float(dp),
---> 10 "wd":float(wd)})
11 learn = tabular_learner(dls, layers=layers, metrics=accuracy, config = config)
12
TypeError: tabular_config() takes 0 positional arguments but 1 was given
@WaterKnight there was a change in the code apparently somewhere along the line. Now the kwargs are passed in as actual parameters. IE:
Instead of
kwargs = {'embed_p':0.1}
config = tabular_config(kwargs)
You should do
config = tabular_config(embed_p=0.1)
I’ll show this adjustment in the notebook shortly
Edit: @WaterKnight that notebook has been updated. Thanks for the bug report
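If the hyperparameters already live in a dict, as in the optimisation loop above, ** unpacking should let you keep the dict and still pass keyword arguments. The tabular_config below is only a stand-in with a keyword-only signature, which is what the TypeError implies; it is not copied from the library.

```python
def tabular_config(**kwargs):
    """Stand-in: accepts keyword arguments only and returns them as a dict."""
    return kwargs


params = {"embed_p": 0.1}

try:
    tabular_config(params)          # dict passed positionally, as in the old notebook
    outcome = "accepted"
except TypeError as err:
    outcome = str(err)              # same kind of message as in the traceback

config = tabular_config(**params)   # dict unpacked into keyword arguments
```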
@muellerzr Thank you very much as always!
In addition, I would like to know what type of data the predict method accepts.
It’s a Pandas row. IE:
learn.predict(df.iloc[0])
A NumPy array will not work. Also it only works on one individual row
You are welcome. I think that I have seen this in other notebooks. So take a look at it, if you can’t I will do it for you!
Should be the only notebook that has it. If there’s any more please let me know
You are right!
I am going to look at your ensembling notebook. In a subject we have worked with LightGBM and XGBoost. I am trying to find if this learner can do better!
Most likely it will not outperform the GBM or XGBoost (it may with a ton of hyperparameter tuning, but without it it will still be close), however ensembling always helps.
Yes, I have also tried stacking with feature engineering, and this was the best solution.
However, as the fastai2 learner runs very fast, I am going to try to make an ensemble with the fastai learner too.
@muellerzr executing the following code to predict on a full dataframe is printing blank lines like hell:
with learn.no_bar() and learn.no_logging():
    res=[]
    for i in range(df_test.shape[0]):
        aux=learn.predict(df_test.iloc[i])[2].cpu().numpy()
        res.append(aux)
You should use get_preds and test_dl for anything more than one item; otherwise it’s inefficient.