Structured Learner

What change was made to ColumnarDataset?
I see the one commented line, but it looks the same as the fastai version except that your y input is a DataFrame where fastai uses an np.array.
And what change is needed to make it do multi-class classification?
I have a similar setup for a different dataset and get this error:

RuntimeError: multi-target not supported at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THNN/generic/ClassNLLCriterion.c:22

This happens when running learn.fit. It seems to be a dimension error in the target (y) somewhere; the dimension was BatchSize x 1, with each element an int for the category.

I also wanted to use the categorical data models for classification rather than regression. I got it to work by doing the following:

1. Make sure that the dependent variable is converted to integer
2. Change the loss function in the structured learner to self.crit = F.nll_loss
3. Change the last layer of the mixed model to x = F.log_softmax(x)

The above works with multi-class problems, hence I prefer it to binary cross-entropy. It also avoids having to one-hot encode the dependent variable. (A toy check of the recipe is sketched below.)
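A self-contained toy check of that recipe, with plain tensors standing in for the model output (only the shapes matter here):

import torch
import torch.nn.functional as F

# step 1: integer class labels, shape (N,)
y = torch.tensor([0, 2, 1, 1])

# steps 2 + 3: log_softmax on the final linear output, nll_loss as the criterion
out = torch.randn(4, 3)                          # (batch, n_classes)
loss = F.nll_loss(F.log_softmax(out, dim=1), y)
print(loss.item())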

I would like to make ColumnarDataset, ColumnarModelData, StructuredLearner and MixedInputModel all able to accept either type of input, but haven't got around to that yet.

4 Likes

I'm also running into this problem; however, when I run with @johnri99's changes as above, I'm getting the same error gambit50 saw:
RuntimeError: multi-target not supported at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THCUNN/generic/ClassNLLCriterion.cu:16

I've tried a few different loss functions (incl. CrossEntropyLoss, MultiLabelSoftMarginLoss) without much success. I keep getting type mismatch errors or these weird RuntimeError: cuda runtime error (59) errors...

Here are my classes:

class StructuredLearner(Learner):
    def __init__(self, data, models, **kwargs):
        super().__init__(data, models, **kwargs)
        if self.models.model.classify:
            self.crit = nn.MultiLabelSoftMarginLoss()  ## instantiate the loss module (assigning the bare class is a bug)
        else: self.crit = nn.MultiLabelSoftMarginLoss()  ## note: both branches currently set the same loss


class MixedInputModel(nn.Module):
    def __init__(self, emb_szs, n_cont, emb_drop, out_sz, szs, drops, y_range=None, use_bn=False, classify=True):
        super().__init__() ## inherit from nn.Module parent class
        self.embs = nn.ModuleList([nn.Embedding(m, d) for m, d in emb_szs]) ## construct embeddings
        for emb in self.embs: emb_init(emb) ## initialize embedding weights
        n_emb = sum(e.embedding_dim for e in self.embs) ## get embedding dimension needed for 1st layer
        szs = [n_emb+n_cont] + szs ## add input layer to szs
        self.lins = nn.ModuleList([
            nn.Linear(szs[i], szs[i+1]) for i in range(len(szs)-1)]) ## create linear layers: input->l1, l1->l2, ...
        self.bns = nn.ModuleList([
            nn.BatchNorm1d(sz) for sz in szs[1:]]) ## batch normalization for hidden layer activations
        for o in self.lins: kaiming_normal(o.weight.data) ## init weights with kaiming normalization
        self.outp = nn.Linear(szs[-1], out_sz) ## create linear from last hidden layer to output
        kaiming_normal(self.outp.weight.data) ## do kaiming initialization
        
        self.emb_drop = nn.Dropout(emb_drop) ## embedding dropout, will zero out weights of embeddings
        self.drops = nn.ModuleList([nn.Dropout(drop) for drop in drops]) ## fc layer dropout
        self.bn = nn.BatchNorm1d(n_cont) ## batchnorm for continuous data
        self.use_bn,self.y_range = use_bn,y_range 
        self.classify = classify
        
    def forward(self, x_cat, x_cont):
        x = [emb(x_cat[:, i]) for i, emb in enumerate(self.embs)] # takes necessary emb vectors 
        x = torch.cat(x, 1) ## concatenate along axis = 1 (columns - side by side) # this is our input from cats
        x = self.emb_drop(x) ## apply dropout to elements of embedding tensor
        x2 = self.bn(x_cont) ## apply batchnorm to continuous variables
        x = torch.cat([x, x2], 1) ## concatenate cats and conts for final input
        for l, d, b in zip(self.lins, self.drops, self.bns):
            x = F.relu(l(x)) ## dotprod + non-linearity
            if self.use_bn: x = b(x) ## apply batchnorm to activations
            x = d(x) 
        x = self.outp(x) 
        return x 

Adapted from: https://github.com/groverpr/deep-learning/blob/master/taxi/taxi3.ipynb

That error message usually occurs when you have one-hot encoded the target, which you don't need to do with nll_loss.

I'll have a more thorough look, but that would be my first thought.
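A toy illustration of the difference (shapes only, not the fastai classes): nll_loss wants a single column of class indices, and the one-hot form is what triggers the multi-target error.

import torch
import torch.nn.functional as F

log_probs = torch.log_softmax(torch.randn(3, 5), dim=1)  # predictions, shape (N, C)

y_onehot = torch.tensor([[0, 0, 1, 0, 0],
                         [1, 0, 0, 0, 0],
                         [0, 0, 0, 0, 1]])
# F.nll_loss(log_probs, y_onehot)     # -> "multi-target not supported"

y_idx = torch.tensor([2, 0, 4])       # shape (N,), long class indices
loss = F.nll_loss(log_probs, y_idx)   # works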

1 Like

How are you tackling the imbalance in the dataset?

I was just about to fork the library to incorporate the ability to deal with categorical data when I found that, whilst I had been thinking about it, Vinod Kumar Reddy Gandra had actually just done the same. Nice work Vinod. I'm slightly disappointed, as it would have been a good chance to work through contributing to an open source project, but I'm sure there will be other chances.

It looks as though there is now a parameter to be set when instantiating the ColumnarModelData to tell the system what type of analysis is needed: the parameter 'is_reg' should be set to True for regression and False for categorical (classification).

5 Likes

I tried passing 'y' as both shape (N,) and (N, 1), where N is the number of samples and each value is an integer in range (0,4) and range (1,5) [5 classes in my data]. And I get the same error in each situation. What am I missing?

range(0,4) should work. What's the size of your embeddings? Make sure that you're including the 0 (max(range)+1). If your C isn't 5 in the embedding then that's likely your issue.
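On the embedding-size point, a small sketch of the constraint (my own illustration): the first argument to nn.Embedding must be at least max(code) + 1, otherwise lookups fail, and on the GPU that failure surfaces as a device-side assert.

import torch
import torch.nn as nn

emb = nn.Embedding(5, 3)          # covers category codes 0..4
ok = emb(torch.tensor([0, 4]))    # fine
# emb(torch.tensor([5]))          # IndexError on CPU; device-side assert on GPU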

Thanks. This is how I decide my embedding sizes:
emb_szs = [(c, min(50, (c+1)//2)) for _,c in cat_sz]

I'm not sure what you mean by "If your C isn't 5 in the embedding then that's likely your issue", since isn't the c for the embedding size different from the number of classes I'm trying to identify? I thought the c in the embedding size is just a function of how many different categories that specific categorical variable has.

The model does run without error if I change the loss to mse_loss and the target to np.float32, though obviously that is not the best way to do classification. But it does run...

Looking at your example above, you are using MultiLabelSoftMarginLoss. From the PyTorch documentation this requires one-hot encoding of the target, as compared to NLL_Loss, which expects an (N, C) prediction and an (N,) target of class indices. Have you tried a simple NLL_Loss function? I have no problem getting this to work using the latest version of column_data.py, which lets you define classification instead of regression and then uses NLL_Loss. The target can be supplied to the model data as a simple integer array.

Apologies if the example above is out of date; please ignore if that is the case.

Amazing work @kcturgutlu and @johnri99, thanks for sharing your path to success.

What should one do to achieve a multi-label output?
Should setting out_sz suffice?

Another thing: I'm getting exceptions at ClassNLLCriterion.cu even before trying multi-label, with plain multi-class:

ClassNLLCriterion.cu:101: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, 
long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [24,0,0] 
Assertion `t >= 0 && t < n_classes` failed.

I'm guessing it has something to do with the output or the loss function?

My code looks like this:

y = df.label.apply(lambda l: int(float(l))) # labels are originally decimal, shape of y is: (49513,)
df.drop('label', axis=1, inplace=True) # shape of df is: (49513, 2298)
val_idx = get_cv_idxs(len(df), val_pct=0.1)
md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y.values, cat_flds=[], bs=128, is_reg=False) 
# I have no categorical variables that's why cat_flds=[]
m = md.get_learner(emb_szs=[], n_cont=len(df.columns), emb_drop=0.04, out_sz=1,
                   szs=[1000,500], drops=[0.001,0.01])

Then I'm getting the above exception (ClassNLLCriterion) followed by THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/THC/generic/THCTensorMath.cu line=15 error=59 : device-side assert triggered.

Do you have any idea what I am doing wrong?
Thanks a lot!

1 Like

Hi Rony,

out_sz = 1 would work with binary cross-entropy, but NLL loss needs an output of 2, i.e. one column for false and one for true. It's redundant information when you only need true or false, but I use it because it's easy to change the number of classes: you don't need to change anything else, since the prediction needs one column per class. This can be confusing because, no matter how many classes your prediction has, the target is a single column of long integers between 0 and n_classes-1.

You can see how I have used it in the example below (in the class ClassifyFromAE).

The training, validation, loss, back propagation etc. are managed by a class called NN_Manage, since this was all written prior to the fastai library.

Not sure this will solve your problem but it looks as though it could be part of the problem.

(note - I checked this with a simple example and I think it is correct)
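To make the shape point concrete, a toy check with plain tensors (my own illustration, not from the post above):

import torch
import torch.nn.functional as F

y = torch.tensor([3, 0, 9, 1])                    # labels for a 10-class problem

bad = torch.randn(4, 1)                           # out_sz=1: only label 0 would be valid
# F.nll_loss(F.log_softmax(bad, dim=1), y)        # fails; on the GPU this is the `t >= 0 && t < n_classes` assert

good = torch.randn(4, 10)                         # out_sz = n_classes
loss = F.nll_loss(F.log_softmax(good, dim=1), y)  # works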

1 Like

Thanks again.
I got it to work about an hour ago, exactly by changing out_sz to 10.
So I'm happy we came to the same conclusion :slight_smile:
I wrote an explanation here if someone is interested.

2 Likes

I am facing the same issue. Were you able to figure it out?

I added a multi-label classification ability to the ColumnarModelData if anyone is interested :slight_smile:
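For reference, the usual multi-label formulation (a generic sketch; I haven't inspected the library change, so treat the choices here as assumptions): one output column per label, multi-hot float targets, and a sigmoid-based loss.

import torch
import torch.nn.functional as F

logits = torch.randn(4, 5)                        # one output column per label (out_sz = 5)
targets = torch.tensor([[1., 0., 1., 0., 0.],
                        [0., 1., 0., 0., 1.],
                        [0., 0., 0., 1., 0.],
                        [1., 1., 0., 0., 0.]])    # multi-hot, unlike multi-class index targets
loss = F.binary_cross_entropy_with_logits(logits, targets)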

5 Likes

This seems like the most appropriate place for a more general question, because I think it would mostly apply to problems that involve structured datasets. Does anyone have any thoughts on how to incorporate observation weights into PyTorch? By observation weights I mean that one observation might have been observed for a longer period of time than another; the first observation therefore carries more information and should inform the training more. For instance, if the goal of the model is to predict whether or not a car accident occurred and we have different observation lengths for each record, I want to inform the model of this fact.

My first guess about how to incorporate observation weights is to scale each observation's loss contribution by its weight, so that when the loss gets propagated back, the parameter updates are 'weight-aware'. But I really don't have any idea how to do that in PyTorch without breaking everything. I've looked around the internet and I haven't really seen this question asked or addressed. Any thoughts?
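One way to sketch this (my own suggestion, not something from the thread): compute the per-sample loss with reduction disabled, multiply by the weights, then reduce. In current PyTorch that looks roughly like:

import torch
import torch.nn.functional as F

preds = torch.randn(6, 3, requires_grad=True)           # (batch, n_classes)
targets = torch.randint(0, 3, (6,))
weights = torch.tensor([2.0, 1.0, 0.5, 1.0, 3.0, 1.0])  # per-observation weights

per_sample = F.cross_entropy(preds, targets, reduction='none')  # shape (batch,)
loss = (per_sample * weights).sum() / weights.sum()             # weighted mean
loss.backward()  # gradients now scale with each observation's weight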

1 Like

@patrick This is perhaps naive, but could you simply make observation_period a feature of each observation, and let the training process decide what influence that should have?

@dangoldner That is not a bad idea at all, but if we know a priori that the probability of an accident scales linearly with observation length, I think it would be better to inform the model of this than to require the model to learn it. Also, the solution you propose is not as general. Consider another use case where a dataset has been downsampled across some dimension. For example, every 5th observation that has a response value of '0' is kept and the other four are discarded. We might do this if the original dataset is large and there's a class imbalance. In order to get the overall average prediction right, we need to inform the model that every record with a response value of '0' is actually representative of five records. The only way I know how to do that is through observation weights.

1 Like

I am able to fix the issue by mapping the categorical variables from 0 to n.
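For anyone hitting the same thing, a minimal sketch of that remapping (hypothetical column name; pandas cat.codes gives contiguous codes starting at 0):

import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'red', 'green']})  # toy data
df['color'] = df['color'].astype('category').cat.codes
# codes are now 0..2, so nn.Embedding(3, d) covers every value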

I've been working on a multi-class structured learner with embeddings for the categorical data, and it seems to train well, with the val_loss steadily dropping, but I am having problems with the predictions. When I call learn.predict, I get an array with the dimensions of test_df x the first hidden layer (2048), rather than test_df x out_sz (48). If I drop all the hidden layers, I get test_df x len(input with all the embeddings) (772).

md = ColumnarModelData.from_data_frames('/tmp', trn_df, val_df, trn_y.astype('int'), val_y.astype('int'), cats, 512, is_reg=False, test_df=test_df)
model = MixedInputModel(emb_szs, len(contins), emb_drop=0, out_sz=48, szs=[2048,1024,512], drops=[0.1]).cuda()
bm = BasicModel(model, 'multiclass_classifier')
learn = StructuredLearner(md, bm)

Can anybody spot what I am doing wrong?

Running Amazon Linux on a p2.