Structured Learner

range(0,4) should work. What’s the size of your embeddings? Make sure that you’re including the 0 (so the cardinality is max(range)+1). If your C isn’t 5 in the embedding then that’s likely your issue.

Thanks. This is how I decide my embedding sizes:
emb_szs = [(c, min(50, (c+1)//2)) for _,c in cat_sz]

I’m not sure what you mean by “If your C isn’t 5 in the embedding then that’s likely your issue”, since isn’t the c for the embedding size different from the number of classes I’m trying to identify? I thought the C in the embedding size is just a function of how many distinct values that specific categorical variable had.
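For context, a sketch of how cat_sz is typically built in the fastai course notebooks (df and cat_vars assumed); the +1 leaves room for the 0 (missing/unknown) code mentioned above:

cat_sz = [(c, len(df[c].cat.categories) + 1) for c in cat_vars]
emb_szs = [(c, min(50, (c + 1) // 2)) for _, c in cat_sz]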

The model does run without error if I change the loss to mse_loss and the target to np.float32, which is obviously not the best way to do classification. But it does run…

Looking at your example above, you are using MultiLabelSoftMarginLoss. From the PyTorch documentation, this requires a one-hot encoded target, whereas NLL_Loss takes an (N, C) input and a target of plain class indices. Have you tried a simple NLL_Loss function? I have no problem getting this to work using the latest version of column_data.py, which lets you define classification instead of regression and then uses NLL_Loss. The target can be supplied to the model data as a simple integer array, as the sketch below shows.
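For illustration, a minimal sketch of the shapes NLL_Loss expects (made-up values): the input is (N, C) log-probabilities and the target is a plain length-N integer array of class indices.

import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)                # batch of 4 samples, 3 classes
log_probs = F.log_softmax(logits, dim=1)  # input: (N, C) log-probabilities
target = torch.tensor([0, 2, 1, 2])       # target: (N,) class indices in [0, C-1]
loss = F.nll_loss(log_probs, target)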

Apologies if the example above is out of date, please ignore if that is the case.

Amazing work @kcturgutlu and @johnri99, thanks for sharing your path to success.

What should one do to achieve a multi-label output?
Should setting out_sz suffice?

Another thing: I’m getting exceptions at ClassNLLCriterion.cu even before trying multi-label, with plain multiclass:

ClassNLLCriterion.cu:101: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, 
long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [24,0,0] 
Assertion `t >= 0 && t < n_classes` failed.

I’m guessing it has something to do with the output or the loss function?

My code looks like this:

y = df.label.apply(lambda l: int(float(l))) # labels are originally decimal, shape of y is: (49513,)
df.drop('label', axis=1, inplace=True) # shape of df is: (49513, 2298)
val_idx = get_cv_idxs(len(df), val_pct=0.1)
md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y.values, cat_flds=[], bs=128, is_reg=False)
# I have no categorical variables, which is why cat_flds=[]
m = md.get_learner(emb_szs=[], n_cont=len(df.columns), emb_drop=0.04, out_sz=1,
                   szs=[1000,500], drops=[0.001,0.01])

Then I’m getting the above exception (ClassNLLCriterion) followed by THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/THC/generic/THCTensorMath.cu line=15 error=59 : device-side assert triggered.

Do you have any idea what I am doing wrong?
Thanks a lot!

Hi Rony,

out_sz = 1 would work with binary cross-entropy, but I think NLLLoss needs an output of 2, i.e. one column for false and one for true. It’s redundant information when you only need true or false, but I use it because it’s easy to change the number of classes, since you don’t need to change anything else. The prediction therefore needs one column per class. This can be confusing since, no matter how many classes your prediction has, the target value is a single column of long integers between 0 and n_classes - 1.
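A minimal sketch of that layout (layer sizes assumed for illustration):

import torch.nn as nn

n_classes = 2                     # change this to add classes; nothing else changes
head = nn.Linear(500, n_classes)  # one output column per class
# target: a (N,) LongTensor with values in 0 .. n_classes - 1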

You can see how I have used it in the example below (in the Class ClassifyFromAE)

The training, validation, loss, backpropagation, etc. are managed by a class called NN_Manage, since this was all written prior to the fastai library.

Not sure this will solve your problem but it looks as though it could be part of the problem.

(note - I checked this with a simple example and I think it is correct)

Thanks again.
I got it to work about an hour ago exactly by changing out_sz to 10.
So I’m happy we came to the same conclusion :slight_smile:
I wrote an explanation here if someone is interested.

I am facing the same issue. Were you able to figure it out?

I added a multi-label classification ability to the ColumnarModelData if anyone is interested :slight_smile:

This seems like the most appropriate place for a more general question, because I think it mostly applies to problems that involve structured datasets. Does anyone have any thoughts on how to incorporate observation weights into PyTorch? By observation weights I mean that one observation might have been observed for a longer period of time than another, so the first observation carries more information and should inform the training more. For instance, if the goal of the model is to predict whether or not a car accident occurred and we have different observation lengths for each record, I want to inform the model of this fact.

My first guess about how to incorporate observation weights is to scale each observation’s loss contribution by its weight, so that when the loss gets propagated back, the parameter updates are ‘weight-aware’, but I really don’t have any idea how to do that in PyTorch without breaking everything. I’ve looked around the internet and I haven’t really seen this question asked or addressed. Any thoughts?
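For what it’s worth, a hedged sketch of that idea (names assumed, untested): compute the per-sample loss without reduction, scale by the weights, then average.

import torch
import torch.nn.functional as F

def weighted_nll_loss(log_probs, target, weights):
    # Per-observation weighting: scale each sample's loss before averaging.
    per_sample = F.nll_loss(log_probs, target, reduction='none')  # shape (N,)
    return (per_sample * weights).sum() / weights.sum()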

@patrick This is perhaps naive, but could you simply make observation_period a feature of each observation, and let the training process decide what influence that should have?

@dangoldner That is not a bad idea at all, but if we know a priori that the probability of an accident scales linearly with observation length, I think it would be better to inform the model of this than require the model to learn it. Also, the solution you propose is not as general. Consider another use case where a dataset has been downsampled across some dimension. For example, every 5th observation that has a response value of ‘0’ is kept and the other four are discarded. We might do this if the original dataset is large and there’s a class imbalance. In order to get the overall average prediction right, we need to inform the model that every record with a response value of ‘0’ is actually representative of five records. The only way I know how to do that is through observation weights.
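Continuing the hedged sketch above, the downsampling case would just assign weight 5 to the kept ‘0’ records:

import torch

# Hypothetical: each kept '0' record stands in for five original records.
weights = torch.ones_like(target, dtype=torch.float)
weights[target == 0] = 5.0
loss = weighted_nll_loss(log_probs, target, weights)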

I was able to fix the issue by mapping the categorical variables to the range 0 to n.
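In case it helps anyone else, a sketch of one way to do that remapping (pd.factorize gives contiguous 0..n-1 codes; df and cat_vars assumed):

import pandas as pd

for c in cat_vars:
    df[c] = pd.factorize(df[c])[0]  # contiguous integer codes 0 .. n-1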

I’ve been working on a multi-class structured learner with embeddings for the categorical data and it seems to train well, with the val_loss steadily dropping, but I am having problems with the predictions. When I call learn.predict, I get an array with the dimensions of test_df x first hidden layer (2048), rather than test_df x out_sz (48). If I drop all the hidden layers, I get test_df x len(input with all the embeddings) (772).

md = ColumnarModelData.from_data_frames('/tmp', trn_df, val_df, trn_y.astype('int'), val_y.astype('int'), cats, 512, is_reg=False, test_df=test_df)
model = MixedInputModel(emb_szs, len(contins), emb_drop=0, out_sz=48, szs=[2048,1024,512], drops=[0.1]).cuda()
bm = BasicModel(model, 'multiclass_classifier')
learn = StructuredLearner(md, bm)

Can anybody spot what I am doing wrong?

Running Amazon Linux on a p2.

I have a quick and dirty example notebook, created a few weeks ago just to show a working example. I did not actually see @johnri99’s notebook (did I miss it?), which I am sure is superior, but here is one for viewing. Also, I had trouble with the is_reg flag, hence the notebook.

Update
I fixed it by adding .cuda():

mixedinputmodel = MixedInputModel(emb_szs, len(df.columns)-len(cat_vars),
                   0.04, 1, [1000,500], [0.001,0.01], y_range=y_range).cuda()

Original:

I’m getting this error as well. Did you find a solution?

Based on Rossmann data:

mixedinputmodel = MixedInputModel(emb_szs, len(df.columns)-len(cat_vars), 0.04, 1, [1000,500], [0.001,0.01], y_range=y_range)

bm = BasicModel(mixedinputmodel, 'mixedInputRegression')

md = ColumnarModelData.from_data_frame(PATH, val_idx, df, yl.astype(np.float32), cat_flds=cat_vars, bs=128, test_df=df_test)

learn = StructuredLearner(md, bm)

learn.lr_find()

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/_functions/thnn/sparse.py in forward(cls, ctx, indices, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
     55
     56     if indices.dim() == 1:
---> 57         output = torch.index_select(weight, 0, indices)
     58     else:
     59         output = torch.index_select(weight, 0, indices.view(-1))

TypeError: torch.index_select received an invalid combination of arguments - got (torch.FloatTensor, int, torch.cuda.LongTensor), but expected (torch.FloatTensor source, int dim, torch.LongTensor index)

Do we have to explicitly convert categorical variables to the category dtype before passing them to our model? This step occupies a lot of memory. I believe we do it primarily to calculate emb_szs, and we could calculate emb_szs without the conversion. But I am not sure how PyTorch would treat these variables if we did not convert them to categorical variables. Has anyone tried this before?
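For the emb_szs part at least, a sketch of computing cardinalities without the dtype conversion (nunique instead of .cat.categories; note the embedding lookup itself still needs integer codes from 0 to n-1, so some mapping has to happen somewhere):

cat_sz = [(c, df[c].nunique() + 1) for c in cat_vars]
emb_szs = [(c, min(50, (c + 1) // 2)) for _, c in cat_sz]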

I’ve been reading through this thread and it seems the answer is somehow contained within; however, I have still had no luck getting a binary classification working with a structured data set.

I am trying to identify a rare event (about 1% occurrence) in a time series data set. To make the dataset more balanced I’ve stripped back most of the non-occurrences so that it’s roughly 50/50 for the event occurring; however, my training set is now only 70,000 rows.

So my prediction should either be true or false. I’m using is_reg=False

 md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y1, is_reg=False, cat_flds=cat_vars, bs=64)

I initially tried setting my dep var to

 y=[0, 0, 0, 1, 0, 0, 0, 1, 0....]

and out_sz = 1

 m = md.get_learner(emb_szs, n_cont = len(df.columns)-len(cat_vars),
               emb_drop = 0.04, out_sz = 1, szs = [1000, 500], drops = [0.001,0.01], use_bn = True)

but this throws

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCTensorCopy.c:20

I then tried one-hot encoding my dependent variable and set out_sz = 2

 y = [[0, 1], [0, 1], [1, 0], [0, 1] ..... ]
 m = md.get_learner(emb_szs, n_cont = len(df.columns)-len(cat_vars),
               emb_drop = 0.04, out_sz = 2, szs = [1000, 500], drops = [0.001,0.01], use_bn = True)

however that throws an error on

 df ,y, nas, mapper = proc_df(model_samp, 'pred_label', do_scale=True)
 AttributeError: Can only use .cat accessor with a 'category' dtype

I’ve tried setting my one-hot encodings to both int64 and category dtypes, but both throw this same error.

I’m a little stumped on the correct way to set this up. I’m clearly missing something though.

I think you want the y = [0, 1, 1, 0, 1] format and out_sz = 2. I also had to do y.astype('int') when passing y to ColumnarModelData, so you might see if that helps.

For imbalance, I duplicated the rarer targets:
dfx = df[df['RESOLUTION'] < 10]
dfy = df[df['REJECTED'] < 10]
frames = [df, dfy, dfx, dfx, dfx, dfx, dfx, dfx, dfx, dfx, dfx, dfx]
df = pd.concat(frames)
And shuffled afterwards.
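A sketch of that shuffle step (standard pandas idiom, not from the original post):

df = df.sample(frac=1).reset_index(drop=True)  # shuffle rows after concatenating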

My multiclass target combines rejected and resolution later in the data processing.

Does this example help?

Wow, I don’t know how I managed to miss that combination. Thank you. Just to confirm my understanding of how fastai/pytorch is evaluating this model’s loss.

With
is_reg = False
y=[1, 0, 0, 0, 1]
out_sz=2

The last layer of the model outputs a rank 1 tensor of shape [2] per sample, one value for class 0 and one for class 1. fastai (because is_reg=False) applies log_softmax to that output

# column_data.py
if not self.is_reg:
  if self.is_multi:
    x = F.sigmoid(x)
  else:
    x = F.log_softmax(x)

which takes those two values, scales them so they lie in [0, 1] and sum to 1, and then takes the log

 e.g. softmax([1.0, 2.059]) -> [0.2575, 0.7424]
 log([0.2575, 0.7424]) -> [-1.3567, -0.2973]

So if the target value was y = 1, is it treating 1 as the i=1 index into an array of 2 classes? Does it then compare -0.2973 to 1? Or does it take the log of 1 (the target) and compare -0.2973 to ln(1), which would give a loss of 0.2973?

I just want to make sure I really understand what the loss calculation is actually telling me. Given the rarity of the event I’m trying to predict, I want to maintain a low false positive rate and high confidence in the positive predictions, even at the sacrifice of increasing the false negative rate, i.e. I’d rather miss 9/10 events so long as, the one time it does predict an event, I can have assurance that it is highly likely to occur.
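For reference, a minimal sketch (using the numbers above) of what NLL loss computes: it treats the target as an index, picks out that log-probability, and negates it; no log of the target is taken.

import torch
import torch.nn.functional as F

log_probs = torch.tensor([[-1.3567, -0.2973]])  # the log_softmax output above
target = torch.tensor([1])                      # 1 is a class index, not a value
loss = F.nll_loss(log_probs, target)            # = -log_probs[0, 1] = 0.2973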

I just want to make sure I really understand what the loss calculation is actually telling me. Given the rarity of the event I’m trying to predict I want to make sure that I maintain a low false positive rate and a high accuracy on the true positive probability even at the sacrifice of increasing the false negative rate. i.e I’d rather miss 9/10 events so long as the one time it does predict an event I can have assurance that it is highly likely to occur.