A walk with fastai2 - Tabular - Study Group and Online Lectures Megathread

If you use the raw weights and set up the model the exact same way, yes. But most of the time learn.save() keeps the optimizer state too, in which case the answer would be no.
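If it helps, here’s a minimal sketch of that distinction (the file name 'model' is arbitrary, and I’m assuming the with_opt flag of Learner.save here):

# Default: saves the optimizer state along with the weights.
learn.save('model')

# Weights only, no optimizer state:
learn.save('model', with_opt=False)

# Loading back into a learner that was set up the exact same way:
learn.load('model')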

@muellerzr kindly help me out when you are free.

Thanks

Just so everyone knows it’s a thing, here’s the link to part 3, Text, which will begin tonight:


Anyone know if you’re able to get the index values of a TabularPandas object? Doesn’t seem like you can do the usual .index like you would on a DataFrame.

Edit: And I think I just answered my own question. Calling .items gives you a pandas DataFrame version of our TabularPandas object, on which you can then call .index. There should probably be an index attribute attached by default so you don’t have to do this.
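For anyone who finds this later, a minimal sketch of that workaround (assuming to is your TabularPandas object):

# .items exposes the underlying pandas DataFrame, so the usual attributes work:
idx = to.items.index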


Is there a general rule of thumb for how many unique categorical values you should have per column? I have a dataset with many categorical features that range from 10 to several hundred unique values each, and I’m obviously trying to cut down the latter. I have around 19k rows in this dataset.

Some approaches I’m taking to cut down the number of unique values per column are:

  1. Creating new labels that group others together
  2. Re-labeling examples that don’t occur very often simply as “other” (see the sketch after this list)
  3. Dropping rows completely if they have multiple “rare” values for several columns

I’m not sure which, if any, of these methods are useful, though I have a feeling that having a couple hundred unique values per column certainly isn’t helping my model accuracy…
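For what it’s worth, here’s a minimal sketch of approach 2 (the DataFrame, column name, and threshold are made-up assumptions):

import pandas as pd

# Categories seen fewer than 20 times get collapsed into a single "other" label.
counts = df['my_cat_col'].value_counts()
rare = counts[counts < 20].index
df['my_cat_col'] = df['my_cat_col'].where(~df['my_cat_col'].isin(rare), 'other')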

Hello everyone, I’ve just gotten started watching the lectures. I’m trying to run the 02_Regression_and_Permutation_Importance notebook in Google Colab. I kept everything as is. However, I get the following error. Anyone know what is going on?


Yes, I need to update that notebook, as those should go into a tabular_config :slight_smile:

You can do it like this:
config = tabular_config(ps=[0.001, 0.01], embed_p=0.04)
learn = tabular_learner(dls, layers=[1000, 500], config=config, y_range=y_range, metrics=rmse, loss_func=MSELossFlat())

Hi! Thank you for this amazing study group!

Does anybody know how to define which is the positive class in a CategoryBlock? I am getting an encoding where 0 is the positive class, which goes a bit against the standard for binary classification.
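For context, here’s roughly what I was hoping would work; I haven’t confirmed it, so treat the vocab argument as an assumption (the labels are made up):

# Hypothetical: pin the class order with an explicit vocab so that
# 'neg' encodes to 0 and 'pos' encodes to 1.
y_block = CategoryBlock(vocab=['neg', 'pos'])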

Thanks!

Thanks Mueller for the amazing videos. Could someone post their work on tabular datasets? That would give us more examples and more datasets to learn from.

Is it possible to get the encoded data from a dls.test_dl? I’m trying to load and process a test dataset and then get the encoded values for all the columns so I can use that data in non-fastai models (XGBoost, RF, etc.).

When I originally create my DataLoader for training, I can call to.train.xs, since I used TabularPandas to feed my original training set into it. Is there a way I can access the transforms, apply them, and view the applied results from a dls.test_dl?
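For reference, this is the training-side pattern I mean (just a sketch; df, cat_names, cont_names, the target column, and splits are placeholders):

from fastai2.tabular.all import *
import xgboost as xgb

to = TabularPandas(df, procs=[Categorify, FillMissing, Normalize],
                   cat_names=cat_names, cont_names=cont_names,
                   y_names='target', splits=splits)

# Encoded (categorified/normalized) features and targets for non-fastai models:
X_train, y_train = to.train.xs, to.train.ys.values.ravel()
xgb_model = xgb.XGBRegressor().fit(X_train, y_train)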

Just call dl.xs I believe (since it shouldn’t have a train or valid separation)

dl.xs just shows the non-encoded, original values. On an unrelated note, it only works per index, not on the entire dataset.

After making the dl, call dl.process(). That should encode them all.

(Also dl.dataset for the dataset)

There is no .process() for a DataLoader generated from your ‘original’ dl used in training (i.e. a dl created by calling dls.test_dl). You can do that for your training dl, though. Even after calling .process() and then creating a new .test_dl using my test data, the dataset is still unencoded (which makes sense, but I just wanted to mention it).

fastai2 version: 0.0.17
fastcore version: 0.1.17

Hi!

Do any of these notebooks contain an example of regression with multiple dependent variables? I am not sure exactly how to proceed.

Thanks!

No, I’m afraid they don’t out of the box; TabularPandas doesn’t like that very much. My best recommendation would be to use a NumPy DataLoader for tabular instead and work with that. See my article here @vrodriguezf: https://muellerzr.github.io/fastblog/2020/04/22/TabularNumpy.html

Thank you, I’ll have a look! However, I am a bit confused: I just created a TabularDataLoaders with a list of two variables as y_names and y_block=RegressionBlock(), and things seem to be working with MSE as the loss function.

I did not even need to adjust the n_out argument of RegressionBlock, which I had initially thought could be used to define the number of output activations you want in the regression.
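In case it helps someone else, here’s roughly what I did (a sketch; the DataFrame, column names, and valid_idx are placeholders):

from fastai2.tabular.all import *

dls = TabularDataLoaders.from_df(
    df, procs=[Categorify, FillMissing, Normalize],
    cat_names=cat_names, cont_names=cont_names,
    y_names=['target_1', 'target_2'],   # two continuous dependent variables
    y_block=RegressionBlock(),
    valid_idx=valid_idx)

learn = tabular_learner(dls, layers=[200, 100], metrics=rmse,
                        loss_func=MSELossFlat())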

You found the one exception :wink: if it’s two regression targets it will work. However, classification combined with regression, or two classifications, will not.


Oh, I understand! How lucky I am :slight_smile:

One extra question: I am not sure if I remember correctly, but did fastai1 provide a way to automatically log-transform the dependent variables? Is that provided as a ready-to-use proc in fastai2?

EDIT: Sorry, I just saw that Jeremy does it manually in one of the fastai2 lessons.
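For completeness, the manual version is just something like this (the column name is a made-up example), and then you np.exp() the predictions to get back to the original scale:

import numpy as np

# Log-transform the dependent variable before building the TabularPandas/DataLoaders.
df['sale_price'] = np.log(df['sale_price'])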