Structured Learner

You’re getting a KeyError when ColumnarModelData tries to look up the columns in your categorical variables list. Earlier you converted those columns to one-hot variables with get_dummies, so the original categorical columns no longer exist in the dataframe. Instead of one-hot encoding with the sklearn encoder, use proc_df from the fastai library to prepare your dataframe.
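For reference, a minimal sketch of that flow (assuming the fastai 0.7 structured API, a list cat_vars of categorical column names, and a hypothetical target column 'price'):

from fastai.structured import proc_df

# mark the categorical columns so proc_df can numericalize them
for v in cat_vars: df[v] = df[v].astype('category').cat.as_ordered()

# proc_df replaces categories with their integer codes, fills missing values,
# and (with do_scale=True) normalizes the continuous columns
df_proc, y, nas, mapper = proc_df(df, y_fld='price', do_scale=True)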


Thank you for the input. I removed the dummies and tried proc_df instead, but I still seem to get the same error. Could you please elaborate on what you meant so that I can understand it better? You can find the updated version of my code at the link below, in airbnbupdate.

Hi, Josh. I am having the same issue. I tried everything and nothing worked for me. Have you managed to solve it?

Today I tried the Rossmann approach on another dataset and I’m still getting stuck at the same point (lr_find) with the same cryptic error message.

Therefore I looked closely at the data preparation. Maybe I’m missing a step that is not directly located in the data preprocessing part of the Rossmann notebook? My preprocessing is also very similar to the one from the Kaggle Home Credit competition kernel (https://www.kaggle.com/davidsalazarv95/fast-ai-pytorch-starter-version-two/notebook), so I’m really wondering where the error could be located…

Here are my steps:
1.) Set the type of the categorical variables:
for v in cat_vars: df_train[v] = df_train[v].astype('category').cat.as_ordered()
2.) Run proc_df:
df, y, nas, mapper = proc_df(df_train, y_fld='target', do_scale=True, skip_flds=['ID'])
3.) Generate the log values of y for the model data:
yl = np.log(y)

I was trying this out on my Paperspace machine with a fresh conda update --all and git pull.

I was also running the same notebook on my local machine without GPU and got a similar error:

~/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/thnn/sparse.py in forward(cls, ctx, indices, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
     55 
     56         if indices.dim() == 1:
---> 57             output = torch.index_select(weight, 0, indices)
     58         else:
     59             output = torch.index_select(weight, 0, indices.view(-1))

RuntimeError: index out of range at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:277

Interestingly, without GPU I can get additional information with the ‘args’ command in debug mode:

cls = <class 'torch.nn._functions.thnn.sparse.Embedding'>
ctx = <torch.autograd.function.EmbeddingBackward object at 0x1c291c5828>
indices = 
 1
(...in total 128x 1...)
 1
[torch.LongTensor of size 128]
weight = 
1.00000e-02 *
 -8.4851
[torch.FloatTensor of size 1x1]

padding_idx = -1
max_norm = None
norm_type = 2
scale_grad_by_freq = False
sparse = False

Could it be the negative padding_idx? Unfortunately, it seems to get set outside of my code…

Am I missing an obvious step in the data preprocessing?

Hi!

If my memory serves me well, I ran into that problem when building my kernel. The problem was that the model expected something different from what the DataLoader was giving it. In my case, I hadn’t done the exact same pre-processing for my validation set as for my training set.

In step (1), have you tried calling apply_cats(df_valid, df_train) after the for loop?

I suggest you try to ‘debug’ your DataLoader and check whether it is yielding the data that you expect.
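A quick sketch of both suggestions (apply_cats comes from fastai.structured; df_valid and df_train are placeholder names):

from fastai.structured import apply_cats

# copy the category mappings learned on the training set onto the
# validation set, so both frames encode categories with the same codes
apply_cats(df_valid, df_train)

# pull one batch from the DataLoader and eyeball shapes and value ranges
x_cat, x_cont, y = next(iter(md.trn_dl))
print(x_cat.size(), x_cont.size(), y.size())
print(x_cat.max())  # must stay below the largest embedding input size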


Hi @davidsalazarvergara

thank you for your reply (and your excellent Kaggle kernel)! :slight_smile:

I tried your approach with separate train and validation dataframes and ColumnarModelData.from_data_frames, and then switched to a single train dataframe with ColumnarModelData.from_data_frame. However, I got the error with both setups.

I looked into the dataloader data, but this looks fine incl. dimensions (output of vars(md.val_dl.dataset)):

{'cats': array([[1, 1, ..., 1, 1],
        [1, 1, ..., 1, 1],
        ..., 
        [1, 1, ..., 1, 1],
        [1, 1, ..., 1, 1]]),
 'conts': array([[-0.03765, -0.04689, ..., -0.1004 ,  0.70786],
        [-0.03765, -0.04689, ..., -0.1004 , -0.17852],
        ..., 
        [-0.03765, -0.04689, ..., -0.1004 , -0.16079],
        [-0.03765, -0.04689, ..., -0.1004 , -0.17852]], dtype=float32),
 'y': array([[ 13.79429],
        [ 11.51293],
        ..., 
        [ 10.77896],
        [ 16.1181 ]]),
 'is_reg': True,
 'is_multi': False}

I guess the 1s in the cats array are a kind of placeholder for the embedding matrix?
For inspecting the data loader I used vars(object) at several levels to get at the objects. Maybe there is a better way?

This is my code to generate the data loader and use it:


It is based on the Rossmann notebook with parts from your Kaggle kernel mentioned in the previous post. In the end it’s really not a lot of code, so I’m really wondering where the problem is located.
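Roughly, it boils down to the standard Rossmann flow (a sketch with assumed names):

md = ColumnarModelData.from_data_frame(PATH, val_idx, df, yl.astype(np.float32),
                                       cat_flds=cat_vars, bs=128)
m = md.get_learner(emb_szs, len(df.columns) - len(cat_vars),
                   0.04, 1, [1000, 500], [0.001, 0.01])
m.lr_find()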

I don’t set the df index explicitly, but that shouldn’t be a problem with a generic index (from 0 to len(df)-1), should it?

Thank you very much & best regards
Michael

I also tried changing the dtype of all columns to float32, but this ended in the same error.
I’m at my wits’ end now… any hint is highly appreciated.

(c, len(df_train[c].unique()+1)) 

Looks like a mistake: you are adding 1 to the array before taking its length, so the +1 never increases the embedding’s input size. It should be len(df_train[c].unique()) + 1.
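The corrected computation, following the Rossmann notebook’s embedding-size recipe:

# cardinality of each categorical column, +1 to leave room for an unknown/NA code
cat_sz = [(c, len(df_train[c].unique()) + 1) for c in cat_vars]

# rule of thumb from the Rossmann notebook: width = min(50, (cardinality + 1) // 2)
emb_szs = [(c, min(50, (c + 1) // 2)) for _, c in cat_sz]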


Hello @bny6613
thank you very much, this is almost embarrassing, but that was indeed the problem.
Now I can start the learning rate finder.
Thank you very much for your help! :smiley:

Using the Structured Learner for classification, is there a final resolution on how best to modify the Rossmann code to make it work? Thanks in advance for your guidance.


Hello @datasciencegeek2018,

did you find a good way?

I’m currently playing around with data that has y/target values ranging from 1 to 4.
Strangely, I can get the learner to run with out_sz=8 but not with 4 (that leads to a CUDA runtime error).
It seems that others got it working with out_sz = number of classes, see: Problem with multi-class structured data learner and loss function.

I will have to dig deeper, but I’m happy for suggestions. :slight_smile:

Sorry, my focus has been on structured data for regression. Would be curious to know the same though.

I found my error after debugging for several hours! :slight_smile:

Short story:
According to the pytorch docs for the NLL loss function, you have to input the y classes as:
(N), where each value satisfies 0 ≤ targets[i] ≤ C−1
So I recoded my classes from 1-4 to 0-3 and set out_sz=4, which then worked.
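A minimal sketch of that recoding (assuming the target column is called 'target' and holds labels 1-4):

# NLL loss expects classes 0..C-1, so shift the 1-4 labels down by one
df_train['target'] = df_train['target'] - 1

# out_sz must then match the number of distinct classes
out_sz = df_train['target'].nunique()  # 4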


Long story:
The output I got with y classes ranging from 1 - 4 and out_sz=8 was a softmax over 8 values summing up to 1.

I checked the MixedInputModel where the last (linear) layer is defined (self.outp = nn.Linear(szs[-1], out_sz)), but this looked fine.

Then I was able to narrow it down to the pytorch NLL loss function, which was getting something it could not handle properly (classes ranging from 1-4) and interpreted that as 8 classes.


Now I can train my NN, but after some time it seems to get stuck in a minimum and predicts the same class for all data points. After debugging is before debugging… :wink:

Hope this helps somebody & best regards
Michael

P.S.: I found this for getting more meaningful information when debugging CUDA errors (from http://lernapparat.de/debug-device-assert/):

import os
# force synchronous CUDA kernel launches so errors surface at the failing call
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

(Coincidentally, this article is showing a similar problem with the NLL loss.)


The fastai library now supports this: there is a parameter is_reg, which can be set to False for classification; you also need to set out_sz to the number of classes.
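A sketch of such a classification setup (names assumed; y holds integer class indices 0..n_classes-1):

md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y.astype(np.int64),
                                       cat_flds=cat_vars, bs=128,
                                       is_reg=False, is_multi=False)
m = md.get_learner(emb_szs, len(df.columns) - len(cat_vars),
                   0.04, n_classes, [1000, 500], [0.001, 0.01])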

How are you preprocessing the y values? Just converting to float? Making them categorical? One-hot encoding? I’m getting UserWarning: Using a target size (torch.Size([128])) that is different to the input size (torch.Size([128, 4])) is deprecated. Please ensure they have the same size. when I try to fit, and I’m not exactly sure how to encode the targets so that they are in the correct shape for the input.

Hello,

the classes need to be encoded as categoricals from 0 to n-1 (for n = 4 classes this gives 0 to 3) in a 1-dim np.array (at least this is how I got it working).
So you have to translate the one-hot-encoded targets into that format.
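For example (a sketch, assuming y_onehot is your one-hot target array):

import numpy as np

# collapse one-hot rows back to integer class indices 0..n-1
y = np.argmax(y_onehot, axis=1).astype(np.int64)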

I hope this works for you & best regards
Michael

Thanks! I’m not really sure what I’m doing wrong; it still says the dimensions are [128] instead of [128, 4].
My steps are:

  1. Load dataframe and do feature engineering (target labels are just 1,2,3,4 representing each class)
  2. train['Target'] = train['Target'] - 1
    train['Target'] = train['Target'].astype('category').cat.as_ordered()
  3. df, y, nas, mapper = proc_df(train, 'Target', do_scale=True)
  4. y = np.asarray(y)
  5. md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y, cat_flds=cat_vars, bs=128, is_reg=False, is_multi=True)
  6. Calculate cat embedding sizes
  7. m = md.get_learner(emb_szs, len(df.columns)-len(cat_vars), 0.04, 4, [1000,500], [0.001,0.01], y_range = None)
    Followed by a whole bunch of nothing when I try to run it (actually just errors). As with all of my misadventures in DL so far, I’m sure there’s some simple boneheaded mistake I’m making.

I’m using get_learner to make a binary classifier (0 or 1) on structured time-series data.

The option provided by MixedInputModel in column_data.py is to have 2 outputs using F.log_softmax.
In that case, the cost function is F.nll_loss (negative log likelihood).

For that, I set is_reg=False and is_multi=False in my data object md, as follows:

md = ColumnarModelData.from_data_frame(PATH, val_idx, train, y, cat_flds=cat_vars, 
                                       bs=128, is_reg=False, is_multi=False, 
                                       test_df=test)

Then my learner model m is as follows:

m = md.get_learner(emb_szs, n_cont=n_cont, 
                   emb_drop=0.04, out_sz=2, szs=[1000, 500], drops=[0.001,0.01], 
                   use_bn=True, y_range=None)

It works well, but I would like to try the classic classifier model with the following parameters:

  • 1 output using a sigmoid (F.sigmoid)
  • and the Binary Cross Entropy Loss (F.binary_cross_entropy) as cost function.

What modifications do I need to make in md and m?

@pierreguillou, I haven’t tried myself, but I believe…

  1. you just need to find the layer in the model you want to change and assign it to the function you want.
  2. m.crit = F.binary_cross_entropy.
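A sketch of what those two changes might look like (an assumption, not tested; m.model is the MixedInputModel and szs[-1] == 500 here):

import torch.nn as nn
import torch.nn.functional as F

# replace the 2-unit head with a single-output layer
m.model.outp = nn.Linear(500, 1).cuda()
# swap the loss for binary cross-entropy
m.crit = F.binary_cross_entropy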


Thanks @fredguth, but I tried it and it does not work.
One way points to the following code:

md = ColumnarModelData.from_data_frame(PATH, val_idx, train, y, cat_flds=cat_vars, 
                                       bs=128, is_reg=False, is_multi=True, 
                                       test_df=test)

m = md.get_learner(emb_szs, n_cont=n_cont, 
                   emb_drop=0.04, out_sz=1, szs=[1000, 500], drops=[0.001,0.01], 
                   use_bn=True, y_range=None)

Explanation:

  • is_multi=True turns the cost function to F.binary_cross_entropy.

In column_data.py, it does the job of
m.crit = F.binary_cross_entropy thanks to def _get_crit(self, data), as follows:

def _get_crit(self, data): return F.mse_loss if data.is_reg else F.binary_cross_entropy if data.is_multi else F.nll_loss

  • is_reg=False and is_multi=True make the model use the sigmoid function in place of the softmax one, thanks to def forward(self, x_cat, x_cont) in column_data.py:
if not self.is_reg:
    if self.is_multi:
        x = F.sigmoid(x)
    else:
        x = F.log_softmax(x)
  • out_sz=1 tells the model m that there is only one output.

With these parameters, when I run m.fit(lr, 1), I get the following error:

@kcturgutlu, @thiago, @johnri99, @ronlut, @stas: do you have any advice on how to make a binary classifier (0 or 1) on structured time-series data using get_learner, F.sigmoid() and F.binary_cross_entropy()?