Structured Learner

Hi @davidsalazarvergara

thank you for your reply (and your excellent Kaggle kernel)! :slight_smile:

I was trying your approach with separate train and val DataFrames and ‘ColumnarModelData.from_data_frames’, and then switched to a single train DataFrame with ‘ColumnarModelData.from_data_frame’. However, I got the error with both setups.

I looked into the data loader’s data, and it looks fine, including the dimensions (this is the output of vars(md.val_dl.dataset)):

{'cats': array([[1, 1, ..., 1, 1],
        [1, 1, ..., 1, 1],
        ..., 
        [1, 1, ..., 1, 1],
        [1, 1, ..., 1, 1]]),
 'conts': array([[-0.03765, -0.04689, ..., -0.1004 ,  0.70786],
        [-0.03765, -0.04689, ..., -0.1004 , -0.17852],
        ..., 
        [-0.03765, -0.04689, ..., -0.1004 , -0.16079],
        [-0.03765, -0.04689, ..., -0.1004 , -0.17852]], dtype=float32),
 'y': array([[ 13.79429],
        [ 11.51293],
        ..., 
        [ 10.77896],
        [ 16.1181 ]]),
 'is_reg': True,
 'is_multi': False}

I guess the 1s in the cats array are a kind of placeholder for the embedding matrix?
To inspect the data loader I used ‘vars(object)’ at several levels to get at the objects. Maybe there is a better way?

This is my code to generate the data loader and use it:


It is based on the Rossmann notebook with parts from your Kaggle kernel mentioned in the previous post. In the end it’s really not a lot of code, so I’m really wondering where the problem is located.
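
Roughly, the pattern is the following (just a sketch with placeholder names like df_train, cat_vars, and y_range, not the exact code from my notebook):

df, y, nas, mapper = proc_df(df_train, 'target', do_scale=True)
# emb_szs is computed from the cardinalities of the categorical columns (not shown here)
md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y.astype(np.float32),
                                       cat_flds=cat_vars, bs=128)
m = md.get_learner(emb_szs, len(df.columns)-len(cat_vars),
                   0.04, 1, [1000, 500], [0.001, 0.01], y_range=y_range)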

I don’t set the df index explicitly, but that shouldn’t be a problem with a generic index (from 0 to len(df)-1), should it?

Thank you very much & best regards
Michael

I also tried changing the dtype of all columns to float32, but that ended in the same error.
I’m now at my wits’ end… any hint is highly appreciated.

(c, len(df_train[c].unique()+1)) 

Looks like a mistake. You are adding one before taking the length.
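
i.e. it should probably be something like this (assuming the usual cat_sz list comprehension from the Rossmann notebook):

cat_sz = [(c, len(df_train[c].unique())+1) for c in cat_vars]   # +1 outside the len()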

1 Like

Hello @bny6613
thank you very much! It’s almost embarrassing, but that was the problem.
Now I can start the learning rate finder.
Thank you very much for your help! :smiley:

Using the Structured Learner for classification, is there a final resolution on how best to modify the Rossmann code to make it work? Thanks in advance for your guidance.

1 Like

Hello @datasciencegeek2018,

did you find a good way?

I’m currently playing around with data that has y/target values ranging from 1 to 4.
Strangely, I can get the learner to run with out_sz=8 but not with 4 (the latter leads to a CUDA runtime error).
It seems that others got it working with out_sz equal to the number of classes, see: Problem with multi-class structured data learner and loss function.

I will/have to dig deeper, but I’m happy to take any suggestions. :slight_smile:

sorry, my focus has been on structured data for regression. Would be curious to know the same though.

I found my error after debugging for several hours! :slight_smile:

Short story:
According to the PyTorch docs for the NLL loss function, the target has to be provided like this:
(N), where each value satisfies 0 ≤ targets[i] ≤ C−1.
So I recoded my classes from 1-4 to 0-3 and set out_sz=4, which then worked.
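
In code that was essentially this (the names here are just placeholders for my actual columns and variables):

# recode the target classes from 1-4 to 0-3
df_train['target'] = df_train['target'] - 1
# and build the learner with out_sz equal to the real number of classes
m = md.get_learner(emb_szs, n_cont, emb_drop=0.04, out_sz=4,
                   szs=[1000, 500], drops=[0.001, 0.01], y_range=None)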


Long story:
The output I got with y classes ranging from 1 to 4 and out_sz=8 was a softmax over 8 values summing up to 1.

I checked the MixedInputModel where the last (linear) layer is defined (self.outp = nn.Linear(szs[-1], out_sz)), but this looked fine.

Then I could narrow it down to the PyTorch NLL loss function, which was getting something it could not handle properly (classes ranging from 1 to 4) and then interpreted that as 8 classes.


Now I can train my NN, but after some time it seems to get stuck in a minimum and predicts the same class for all data points. After debugging is before debugging… :wink:

Hope this helps somebody & best regards
Michael

P.S.: I found this for getting more meaningful information when debugging CUDA errors (from http://lernapparat.de/debug-device-assert/):

import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

(Coincidentally, this article is showing a similar problem with the NLL loss.)

2 Likes

The fastai library now incorporates this: there is an is_reg parameter that can be set to False for classification, and you also need to set out_sz to the number of classes.
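
For example, something along these lines (just a sketch; n_classes and the other names are placeholders):

md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y, cat_flds=cat_vars,
                                       bs=128, is_reg=False, is_multi=False)
m = md.get_learner(emb_szs, n_cont, emb_drop=0.04, out_sz=n_classes,
                   szs=[1000, 500], drops=[0.001, 0.01], y_range=None)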

How are you preprocessing the y values? Just converting to float? Making them categorical? One-hot encoding? I’m getting a “UserWarning: Using a target size (torch.Size([128])) that is different to the input size (torch.Size([128, 4])) is deprecated. Please ensure they have the same size.” when I try to fit, and I’m not exactly sure how to encode the targets so that they are in the correct shape for the input.

Hello,

the classes need to be encoded as integers from 0 to n-1 (for n = 4 classes this means 0 to 3) in a 1-dimensional np.array (at least this is how I got it working).
So you have to translate the one-hot encoding to that format.
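
If your targets are currently one-hot encoded, something like this should do it (onehot is just a placeholder name for your encoded array):

import numpy as np
# 'onehot' has shape (n_samples, n_classes); argmax recovers the integer class index per row
y = np.argmax(onehot, axis=1)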

I hope this works for you & best regards
Michael

Thanks! I’m not really sure what I’m doing wrong; it still says the dimensions are [128] instead of [128, 4].
My steps are:

  1. Load dataframe and do feature engineering (target labels are just 1,2,3,4 representing each class)
  2. train['Target'] = train['Target'] - 1
    train['Target'] = train['Target'].astype('category').cat.as_ordered()
  3. df, y, nas, mapper = proc_df(train, 'Target', do_scale=True)
  4. y = np.asarray(y)
  5. md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y, cat_flds=cat_vars, bs=128, is_reg=False, is_multi=True)
  6. Calculate cat embedding sizes
  7. m = md.get_learner(emb_szs, len(df.columns)-len(cat_vars), 0.04, 4, [1000,500], [0.001,0.01], y_range = None)
    Followed by a whole bunch of nothing when I try to run it (actually just errors). As with all of my misadventures in DL so far, I’m sure there’s some simple boneheaded mistake I’m making.

I’m using get_learner to make a binary classifier (0 or 1) on structured time-series data.

The possibility given by MixedInputModel in column_data.py is to have 2 outputs using F.log_softmax.
In this case, the cost function is F.nll_loss (negative log likelihood).

For that, I set is_reg=False and is_multi=False in my data object md, as follows:

md = ColumnarModelData.from_data_frame(PATH, val_idx, train, y, cat_flds=cat_vars, 
                                       bs=128, is_reg=False, is_multi=False, 
                                       test_df=test)

Then, my learner model m is as follows:

m = md.get_learner(emb_szs, n_cont=n_cont, 
                   emb_drop=0.04, out_sz=2, szs=[1000, 500], drops=[0.001,0.01], 
                   use_bn=True, y_range=None)

It works well, but I would like to try the classic classifier model with the following parameters:

  • 1 output using a sigmoid (F.sigmoid)
  • and the Binary Cross Entropy Loss (F.binary_cross_entropy) as cost function.

What modifications do I need to make to md and m?

@pierreguillou, I haven’t tried it myself, but I believe…

  1. you just need to find the layer in the model you want to change and assign it to the function you want.
  2. m.crit = F.binary_cross_entropy.


Thanks @fredguth, but I tried it and it does not work.
One way would point to the following code:

md = ColumnarModelData.from_data_frame(PATH, val_idx, train, y, cat_flds=cat_vars, 
                                       bs=128, is_reg=False, is_multi=True, 
                                       test_df=test)

m = md.get_learner(emb_szs, n_cont=n_cont, 
                   emb_drop=0.04, out_sz=1, szs=[1000, 500], drops=[0.001,0.01], 
                   use_bn=True, y_range=None)

Explanation:

  • is_multi=True sets the cost function to F.binary_cross_entropy.

In column_data.py, it does the job of
m.crit = F.binary_cross_entropy thanks to def _get_crit(self, data), as follows:

def _get_crit(self, data): return F.mse_loss if data.is_reg else F.binary_cross_entropy if data.is_multi else F.nll_loss

  • is_reg=False and is_multi=True make the model use the sigmoid function in place of softmax, thanks to def forward(self, x_cat, x_cont) in column_data.py:
if not self.is_reg:
    if self.is_multi:
        x = F.sigmoid(x)
    else:
        x = F.log_softmax(x)
  • out_sz=1 tells the model m that there is only one output.

With these parameters, when I run m.fit(lr, 1), I get the following error:

@kcturgutlu, @thiago, @johnri99, @ronlut, @stas: do you have any advice as well on how to make a binary classifier (0 or 1) on structured time-series data using get_learner, F.sigmoid, and F.binary_cross_entropy?

You need out_sz=2 as there are 2 output probabilities (negative, positive).
In addition, why do you want binary cross entropy and not just pass is_multi=False, and then let softmax decide whether it’s positive or negative?

What Rony said. If you update your fastai repository, it now asserts sz>1 when is_reg=False, to prevent this kind of situation. https://github.com/fastai/fastai/pull/654

Hi @ronlut,

I’m doing single-label binary classification with is_reg=False, is_multi=False, out_sz=2, softmax, and nll_loss, and it works, but my question was: why can’t we use out_sz=1, sigmoid, and Binary Cross Entropy (BCE) with fastai?

Why do I want to do that? Two reasons:

  1. For logistic regression (classification), the usual way is to use 1 output with sigmoid and Binary Cross Entropy as the loss function.
  2. How do the 2 possibilities compare in performance ([2 outputs + softmax + nll_loss] vs [1 output + sigmoid + BCE])? Which one is best?

Thanks @stas. Your fastai repository update confirms the [2 outputs + softmax + nll_loss] solution for creating a classifier with fastai for structured data.

However, I’m still wondering why we set aside the [1 output + sigmoid + Binary Cross Entropy] possibility. The 2 solutions are not identical. Which one performs better?

Change to is_multi=True and out_sz=1, and pass y.astype(np.float32) in the ‘from_data_frame’ call.
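
Something along these lines, I think (an untested sketch reusing the variable names from the earlier posts):

md = ColumnarModelData.from_data_frame(PATH, val_idx, train, y.astype(np.float32),
                                       cat_flds=cat_vars, bs=128,
                                       is_reg=False, is_multi=True, test_df=test)

m = md.get_learner(emb_szs, n_cont=n_cont,
                   emb_drop=0.04, out_sz=1, szs=[1000, 500], drops=[0.001, 0.01],
                   use_bn=True, y_range=None)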

1 Like