Problem with multi-class structured data learner and loss function

Hi all.
I’ve been struggling with this issue for quite a few hours now and would be glad to get some hints from the experts here :slight_smile:

I’m trying to fit a model for a dataset containing ~45K rows with a single multi-class label.
I have 10 possible classes for the label, with values ranging from 1 to 100 (not contiguous, obviously). ^(see comment below)
I guess I’m doing something silly, but I’m getting exceptions from ClassNLLCriterion.cu:

ClassNLLCriterion.cu:101: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *,
long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [24,0,0]
Assertion t >= 0 && t < n_classes failed.

It happens when line 45 of model.py is being executed: loss.backward().

It probably has something to do with the shape of the target (y) that the loss function expects?
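Indeed, a quick CPU experiment shows what the assert means: every target value must lie in [0, n_classes). This is just a sketch in plain PyTorch; on CPU you get a Python exception instead of the device-side assert:

```python
import torch
import torch.nn.functional as F

log_probs = torch.randn(4, 10).log_softmax(dim=1)  # 4 rows, 10 classes

F.nll_loss(log_probs, torch.tensor([0, 3, 9, 5]))  # fine: all targets in [0, 10)

try:
    # raw labels like 20 or 100 are out of range for 10 classes
    F.nll_loss(log_probs, torch.tensor([1, 20, 100, 7]))
except (IndexError, RuntimeError) as e:
    print("out-of-range target:", e)
```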

My code looks like this:

y = df.label.apply(lambda l: int(float(l)))  # labels are originally decimal; shape of y: (49513,)
df.drop('label', axis=1, inplace=True)       # shape of df: (49513, 2298)
val_idx = get_cv_idxs(len(df), val_pct=0.1)
# cat_flds=[] because I have no categorical variables
md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y.values,
                                       cat_flds=[], bs=128, is_reg=False)
m = md.get_learner(emb_szs=[], n_cont=len(df.columns), emb_drop=0.04,
                   out_sz=1, szs=[1000, 500], drops=[0.001, 0.01])

Then I’m getting the above exception (ClassNLLCriterion), followed by: THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/THC/generic/THCTensorMath.cu line=15 error=59 : device-side assert triggered.

^ I also tried encoding my labels to numbers ranging from 0 to 9 but it didn’t help.
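For reference, a sketch of one way to do that 0–9 encoding (with a toy df standing in for the real one; keeping the uniques around lets you translate predictions back to the raw labels later):

```python
import pandas as pd

df = pd.DataFrame({'label': [1.0, 5.0, 100.0, 5.0, 1.0]})  # toy stand-in
codes, uniques = pd.factorize(df.label, sort=True)
y = codes          # values in 0..n_classes-1, which is what nll_loss expects
classes = uniques  # e.g. the sorted raw label values, for decoding later
```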

Do you have any idea what I’m doing wrong?
Thanks a lot!

I got it to work by changing out_sz to 10 (my number of classes).
I had confused this variable with my desired final output (a single label), but that is not what the last activation function (log_softmax) outputs.
If I understand correctly, the reason is that nll_loss must get an input of shape (bs x num_of_classes), holding the log probability of each class.
The target (y) should just be of shape (bs,), holding the index of the correct class for each row.
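Concretely, the shapes look like this (a standalone sketch in plain PyTorch, outside fastai):

```python
import torch
import torch.nn.functional as F

bs, n_classes = 128, 10
logits = torch.randn(bs, n_classes)           # final linear layer with out_sz=10
log_probs = F.log_softmax(logits, dim=1)      # (bs, n_classes): log probability per class
targets = torch.randint(0, n_classes, (bs,))  # (bs,): one class index per row
loss = F.nll_loss(log_probs, targets)         # scalar loss
```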

Hope that helps someone :slight_smile:


Yes! Thanks so much. This helped immensely. If I set out_sz to the number of classes for non-regression problems, it runs fine.

But then I get 4 values when I run model.predict(); I need to figure out how to handle that…

Just use np.argmax to get the index of the highest probability. Then, if you encoded your labels to indices, translate the index back to your label.
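A minimal sketch of that decoding (the log_preds array and the classes list here are made-up stand-ins for what model.predict() and your own label encoding would give):

```python
import numpy as np

# fake log-probabilities for 4 rows over 10 classes, one dominant class per row
log_preds = np.log(np.full((4, 10), 0.1))
log_preds[range(4), [2, 0, 7, 2]] = np.log(0.9)

idx = np.argmax(log_preds, axis=1)  # predicted class index per row

# hypothetical mapping from index back to the raw label values
classes = np.array([1, 2, 5, 7, 10, 20, 30, 50, 70, 100])
labels = classes[idx]               # predicted raw label per row
```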

Hey Rony, thanks for your example.
1. Did you change the loss function to self.crit = F.nll_loss?
2. Did you add a softmax layer at the end?

Do you mind sharing how to implement these two steps?

Thank you!