A walk with fastai2 - Tabular - Study Group and Online Lectures Megathread

It’s not deprecated, you just need to pass them in config=tabular_config(). All customization of models is done this way in fastai v2, to avoid mixing the kwargs of the models with those of the Learner.

4 Likes

Hi, I’ve watched Tabular lesson-1 and I have a couple of doubts.
Zach shows how to plot using matplotlib, but the .plot() function only plots each column against its index. So how do I plot one column against another (like a scatter plot of age vs. working hours)?

Also towards the end, he creates a tabular model called ‘net’. Why can’t we do a lr_find or .fit on net like we do on a tabular_learner?

Thanks,

Short answer is it’s just a model, not a Learner instance. We need to wrap our net in a Learner to do so.

1 Like

Yeahh… Just now did that… Works great!
And how do I plot between two columns?

I’d recommend reading up here: https://seaborn.pydata.org/tutorial/categorical.html
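For a plain matplotlib scatter of two columns, something like the sketch below works. This is a hedged example with made-up toy values standing in for the Adult dataset’s age and hours columns (the column names are assumptions, not from the notebook):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs headless
import matplotlib.pyplot as plt

# Hypothetical toy data standing in for the "age" and
# "hours-per-week" columns of the Adult dataset.
ages = [25, 38, 28, 44, 18, 34, 52, 46]
hours = [40, 50, 40, 45, 30, 40, 60, 38]

fig, ax = plt.subplots()
ax.scatter(ages, hours)           # one column on x, the other on y
ax.set_xlabel("age")
ax.set_ylabel("hours-per-week")
fig.savefig("age_vs_hours.png")
```

With a pandas DataFrame you can get the same plot via `df.plot.scatter(x="age", y="hours-per-week")`.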

1 Like

Sure, I’ll look into it.
Thank you! :slight_smile:
And thank you so much for the videos, they really helped me understand things way better.

Thanks got it !

Hi @muellerzr I passed the model into a Learner and used ‘CrossEntropyLossFlat’ as the loss function, since the loss used in tabular_learner is a flattened form of nn.CrossEntropyLoss.
But when I do .lr_find I get the following error:

bool value of Tensor with more than one value is ambiguous

How do I solve this?

Thanks,

You need to use an instance of it. CrossEntropyLossFlat()
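That is, something like Learner(dls, net, loss_func=CrossEntropyLossFlat()) rather than passing the bare class. As a plain-Python analogue (not fastai code) of why the class itself fails: the Learner calls loss_func(preds, targs), so loss_func must be a callable instance, and passing the class sends those arguments to __init__ instead. ToyLoss below is a hypothetical stand-in:

```python
class ToyLoss:
    """Hypothetical loss object; stands in for CrossEntropyLossFlat."""
    def __call__(self, preds, targs):
        # sum of squared differences, just to have something callable
        return sum((p - t) ** 2 for p, t in zip(preds, targs))

loss_fn = ToyLoss()          # an *instance* -- calling it computes a loss
print(loss_fn([1.0, 2.0], [1.0, 0.0]))  # 4.0

# Passing the bare class (loss_func=ToyLoss) would instead try to
# construct ToyLoss(preds, targs), which is not what the Learner expects.
```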

1 Like

Alright everyone we’ll be live streaming today! Here’s the link: https://youtu.be/XoWX_YOrtPg

We’ll be covering two different methods of model interpretation, ClassConfusion and SHAP, along with some general guidelines and pitfalls I’ve found when doing research in this field.

  • live stream up at 4:45pm CST
2 Likes

Has anyone created a LIME library for fastdotai yet?

Thanks to everyone who joined! I’ve added the notebook about looking at research, and the important ideas you should keep in mind, to the top post. Next week we’ll look at a few new architectures for tabular, and then that will be it! We’ll move on to NLP. If there’s anything people want me to cover specifically for tabular in this last lecture, please let me know and we’ll include it if possible! Thanks! :slight_smile:

More on SHAP values: https://towardsdatascience.com/explain-your-model-with-the-shap-values-bc36aac4de3d

1 Like

falling really far behind :frowning: @muellerzr back to the Adults notebook, one thing that is still confusing is in the Normalize section: since the data is scaled between 0 and 1, why are there negative age/fnlwgt values?

1 Like

If I mentioned that in the video then I’m wrong. It’s not scaled into a fixed range at all: it’s standardized to mean 0 and standard deviation 1. We can see this in the encodes function:

def setups(self, dsets): self.means,self.stds = dsets.conts.mean(),dsets.conts.std(ddof=0)+1e-7
def encodes(self, to): to.conts = (to.conts - self.means) / self.stds   # <- HERE
def decodes(self, to): to.conts = (to.conts * self.stds) + self.means

So if we have a particular x that is less than the mean, then we can very easily get a negative value.
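A tiny numeric sketch of the same point, with hypothetical ages (the values are made up; the formula mirrors the setups/encodes pair above, including ddof=0):

```python
# Why Normalize can yield negative values: it standardizes to
# mean 0 / std 1, it does not rescale into [0, 1].
ages = [22.0, 30.0, 45.0, 63.0]                               # toy column
mean = sum(ages) / len(ages)                                   # 40.0
std = (sum((a - mean) ** 2 for a in ages) / len(ages)) ** 0.5  # ddof=0, like setups()
normed = [(a - mean) / std for a in ages]
print(normed)  # first two entries are negative: 22 and 30 sit below the mean
```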

1 Like

I’ve made this adjustment in the notebook. Thanks for pointing this out!

2 Likes

Is there a way to extract the learned embedding vectors after training a tabular_learner? I would like to use them for training models from sklearn. The demo of ensembling by @muellerzr helps greatly, but the xs extracted from the dataloaders are still just the label-encoded versions of the categorical columns…

1 Like

Thanks, understood :slight_smile:

I’ll try to do a demo of that next week. Great question!
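In the meantime, a rough sketch: fastai v2’s TabularModel keeps one embedding per categorical column in model.embeds, so the learned vectors for column i are model.embeds[i].weight. Below is a plain-Python analogue (toy weights, no fastai) of swapping label-encoded codes for their embedding rows before handing features to sklearn:

```python
# Pretend learned 2-d embedding table for one categorical column
# with 3 categories (in fastai this would be model.embeds[i].weight).
embed_weight = [
    [0.1, -0.2],
    [0.5,  0.3],
    [-0.4, 0.0],
]

codes = [2, 0, 1]   # label-encoded column, as it comes out of the dataloader

# Replace each code with its embedding row -> dense features for sklearn
dense = [embed_weight[c] for c in codes]
print(dense)  # [[-0.4, 0.0], [0.1, -0.2], [0.5, 0.3]]
```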

1 Like

from fastai2.metrics import *
to = TabularPandas(df, procs, cat_names, dep_var, y_block=RegressionBlock(),
                   splits=splits)
dls = to.dataloaders(bs=32)
learn = tabular_learner(dls, layers=[10,10], metrics=[msle],
                        loss_func=MSELossFlat(), n_out=1)

When no cont_names is passed while creating the dataloaders I get the error below (all of my columns are categorical). Also, passing msle as above results in an error.

RuntimeError: The size of tensor a (32) must match the size of tensor b (0) at non-singleton dimension 0

First try specifying the continuous variables as an empty list, i.e. cont_names = [].

Edit: I think the issue is that you need to specify the number of outputs. That was the right track.
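For reference, a plain-Python sketch of what an msle metric conventionally computes, mean squared logarithmic error (the fastai implementation may differ in details; this is just the standard formula, using log1p):

```python
import math

def msle(preds, targs):
    """Mean squared log error: mean((log1p(pred) - log1p(targ))**2)."""
    return sum((math.log1p(p) - math.log1p(t)) ** 2
               for p, t in zip(preds, targs)) / len(preds)

# e.g. msle([3.0], [1.0]) = (ln 4 - ln 2)^2 = (ln 2)^2
print(msle([3.0], [1.0]))
```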

1 Like