Hi all.
I’ve been struggling with this issue for quite a few hours and would be glad to get some hints from the experts here.
I’m trying to fit a model for a dataset containing ~45K rows with a single multi-class label.
The label can take 10 possible values, ranging from 1 to 100 (not contiguous, obviously). ^(see comment below)
I guess I’m doing something silly, but I’m getting exceptions from ClassNLLCriterion.cu:
ClassNLLCriterion.cu:101: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [24,0,0] Assertion `t >= 0 && t < n_classes` failed.
It happens when line 45 of model.py is executed: loss.backward().
Perhaps it has something to do with the shape of the target (y) passed to the loss function?
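Read literally, the assertion is a bounds check on every target value t. A minimal sketch of that check in plain Python follows. The helper name `find_out_of_range_targets` and the label values are made up for illustration, and I am assuming `n_classes` corresponds to the size of the model's output dimension:

```python
def find_out_of_range_targets(targets, n_classes):
    """Return (position, value) pairs that would violate `t >= 0 && t < n_classes`."""
    return [(i, t) for i, t in enumerate(targets) if not (0 <= t < n_classes)]


# Made-up labels, for illustration only (not taken from the real dataset).
labels = [1, 5, 10, 100]

# Hypothetical setting: 10 classes, so valid targets are 0..9.
bad = find_out_of_range_targets(labels, n_classes=10)
print(bad)
```

Running something like this on the real y.values on the CPU, before training, might show which labels trip the assertion.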
My code looks like this:
y = df.label.apply(lambda l: int(float(l))) # labels are originally decimal, shape of y is: (49513,)
df.drop('label', axis=1, inplace=True) # shape of df is: (49513, 2298)
val_idx = get_cv_idxs(len(df), val_pct=0.1)
md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y.values, cat_flds=[], bs=128, is_reg=False)
# I have no categorical variables that's why cat_flds=[]
m = md.get_learner(emb_szs=[], n_cont=len(df.columns), emb_drop=0.04, out_sz=1,
                   szs=[1000,500], drops=[0.001,0.01])
Then I get the above exception (ClassNLLCriterion), followed by:
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/THC/generic/THCTensorMath.cu line=15 error=59 : device-side assert triggered
^ I also tried encoding my labels to numbers ranging from 0 to 9 but it didn’t help.
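A minimal sketch of such an encoding, mapping sparse label values to contiguous indices starting at 0. The helper name `encode_labels` and the sample values are hypothetical, not the actual labels from the dataset:

```python
def encode_labels(labels):
    """Map arbitrary label values to contiguous indices 0..K-1.

    Returns the encoded labels plus the sorted list of original values,
    so that classes[index] recovers the original label.
    """
    classes = sorted(set(labels))
    to_index = {value: index for index, value in enumerate(classes)}
    return [to_index[value] for value in labels], classes


# Made-up label values, for illustration only.
raw = [1, 100, 25, 1, 50]
encoded, classes = encode_labels(raw)
print(encoded, classes)
```

Keeping `classes` around makes it possible to translate predicted indices back to the original label values.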
Do you have any idea what I am doing wrong?
Thanks a lot!