A quick-and-easy way to make ULMFiT work for multi-label problems


(WG) #1

I see a few folks on the forums asking how to use ULMFiT for multilabel problems and so I thought I’d throw up a quick and dirty method that has worked for me.

  1. Create a multi-label-friendly classifier like so:
class MultiLabelClassifier(nn.Module):

    def __init__(self, y_range=None):
        super().__init__()
        self.y_range = y_range

    def forward(self, input):
        x, raw_outputs, outputs = input
        x = F.sigmoid(x)        # squash each label's score into [0, 1]
        if self.y_range:        # optionally rescale to a custom range
            x = x * (self.y_range[1] - self.y_range[0])
            x = x + self.y_range[0]

        return x, raw_outputs, outputs
  2. Create your learner for classification as normal (note that the n_class argument isn’t used as far as I can tell, and that I’ve changed the number of activations in my final layer from the number of classes to the number of labels I’m trying to predict simultaneously; in this case 7):
m = get_rnn_classifer(bptt, max_seq=20*70, n_class=7, n_tok=vs, emb_sz=em_sz, n_hid=nh, n_layers=nl,
                     pad_token=1, layers=[em_sz*3, 50, 7], drops=[drops[4], 0.1],
                     dropouti=drops[0], wdrop=drops[1], dropoute=drops[2], dropouth=drops[3])
  3. Change your loss function to binary cross-entropy and append an instance of your MultiLabelClassifier to the end of your Sequential model:
learner.crit = F.binary_cross_entropy
learner.model.add_module('2', MultiLabelClassifier())
  4. Train as normal.
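Putting the four steps together, here’s a minimal self-contained sketch of the wiring. DummyHead is a hypothetical stand-in for the fastai RNN classifier (which likewise returns a (predictions, raw_outputs, outputs) triple), and torch.sigmoid is used in place of the now-deprecated F.sigmoid:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLabelClassifier(nn.Module):
    """Squashes the head's raw scores into [0, 1] with a sigmoid."""
    def __init__(self, y_range=None):
        super().__init__()
        self.y_range = y_range

    def forward(self, input):
        x, raw_outputs, outputs = input
        x = torch.sigmoid(x)
        if self.y_range:
            x = x * (self.y_range[1] - self.y_range[0]) + self.y_range[0]
        return x, raw_outputs, outputs

# Hypothetical stand-in for the fastai classifier: returns (scores, raw, out).
class DummyHead(nn.Module):
    def __init__(self, n_labels=7):
        super().__init__()
        self.lin = nn.Linear(10, n_labels)
    def forward(self, x):
        return self.lin(x), [], []

model = nn.Sequential(DummyHead(), MultiLabelClassifier())
preds, _, _ = model(torch.randn(4, 10))            # shape (4, 7), all in [0, 1]
targets = torch.randint(0, 2, (4, 7)).float()      # multi-label targets as floats
loss = F.binary_cross_entropy(preds, targets)
```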

(Kachi O.) #2

Clever approach! I would not have thought to use learner.model.add_module(); this seems very handy.

The approach that worked for me was to set learn.crit = F.binary_cross_entropy_with_logits, keeping everything the same.
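For the curious, the two routes are numerically equivalent: a sigmoid followed by plain binary cross-entropy gives the same loss as binary_cross_entropy_with_logits applied to the raw scores. A quick check (nothing here is fastai-specific; the tensors are invented):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 7)                        # raw, unsquashed scores
targets = torch.randint(0, 2, (4, 7)).float()

# Route 1: sigmoid layer + plain BCE (the MultiLabelClassifier approach).
loss_a = F.binary_cross_entropy(torch.sigmoid(logits), targets)
# Route 2: BCE-with-logits on the raw scores (no extra module needed).
loss_b = F.binary_cross_entropy_with_logits(logits, targets)
```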


#3

Are you one-hot encoding the targets? I’m getting a RuntimeError: Expected object of type Variable[torch.cuda.FloatTensor] but found type Variable[torch.cuda.LongTensor] for argument #1 'target' error and suspect that might be the culprit.


(WG) #4

They are essentially OHE, yes.

Your problem is the datatype. If you want to use binary cross-entropy, your targets will need to be converted to floats. See the PyTorch docs here.
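A minimal illustration of the cast (the tensors are made up; the point is only the .float() call):

```python
import torch
import torch.nn.functional as F

preds = torch.sigmoid(torch.randn(3, 6))      # predictions in (0, 1)
targets_long = torch.randint(0, 2, (3, 6))    # LongTensor: the culprit

# F.binary_cross_entropy(preds, targets_long)  # raises a dtype RuntimeError
loss = F.binary_cross_entropy(preds, targets_long.float())  # cast fixes it
```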

FYI: this took me a while to figure out too :). The planet notebook (from lesson 2, I believe) is instructive as well.


#5

Thanks! I spaced reading this and just realized I’m actually doing multi-class rather than multi-label for text sentiment, though it appears to be working. (A case could be made that it’s really multi-label, and I could just take the max of the 6 predictions; softmax might be a good choice here.)


(Francisco Rodes) #6

Hi,

I might have gotten lost along the way, but why do you do that? The original notebook works for me for classifying the AG (4-class) dataset; I only changed the input data.

Thanks!


(WG) #7

You do that (one-hot encode) when you are trying to predict multiple labels simultaneously, as opposed to what you see in the course notebooks, where you predict the class for a single label. The first is known as multi-label classification, whereas the latter is multi-class classification; while similar, each requires a data format suited to the task at hand.
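To make the distinction concrete, here is a toy sketch of the two target formats (the arrays are invented for illustration):

```python
import numpy as np

# Multi-class: each example belongs to exactly ONE of n classes,
# so the target is a single integer class index per row.
multiclass_targets = np.array([2, 0, 3, 1])                      # shape (4,)

# Multi-label: each example may carry SEVERAL labels at once,
# so the target is a one-hot-style binary indicator vector per row.
multilabel_targets = np.array([[1, 0, 1, 0],
                               [0, 1, 1, 1],
                               [0, 0, 0, 0],
                               [1, 1, 0, 0]], dtype=np.float32)  # shape (4, 4)
```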

If you want to see how multi-label looks/works, check out the lesson 2 notebook and planet competition notebooks from the course.


(Francisco Rodes) #8

Sure! My bad. Thanks for the fast answer.

By the way, when you’re doing multi-class (my problem), do you treat the different columns as different inputs? Or do you put everything together, as the default model does?

Thank you again!
BR


(WG) #9

Multi-class should function just like the notebook … you’ll have two or more classes in a single column.


(Francisco Rodes) #10

Thanks again!

And one last question, please: is the validation set the same as the test set? Are only these two used?

While I was reading the paper, some tables refer to test error and others to validation error. Could you clarify that? Thanks in advance!


#11

Hi I am trying your way to train my model. But I still get the error

multi-target not supported at /pytorch/torch/lib/THCUNN/generic/ClassNLLCriterion.cu:16

Do you have complete codes for multi-label problems? Thanks!


(WG) #12

I cannot share my work as the dataset is proprietary. However, I can share with you how I would solve your issue (which I think is even more helpful).

First, without having access to anything in your notebook outside of the error message you’ve pasted above, I can tell it has something to do with your optimization/loss function and/or that function’s ability to understand the target/actual values in your dataset.

Second, given the above, I’d first check to make sure I’m using binary_cross_entropy as my loss function. If not, do so and see if that resolves things. STOP if all is working.

Third, if I have the right loss function, I’d look at the dimensions of my targets and at the PyTorch documentation to see what binary_cross_entropy expects. When I do this, I notice each target needs to be a float vector with one entry per label … for 8 labels, something with shape (8,). Once you have the right data format and the right loss function, all should train fine.
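To make the shape requirement concrete, a small check (in current PyTorch, binary_cross_entropy expects targets with the same shape as the predictions, i.e. (batch, n_labels); a multi-class-style vector of class indices is rejected):

```python
import torch
import torch.nn.functional as F

batch, n_labels = 32, 8
preds = torch.sigmoid(torch.randn(batch, n_labels))

# Correct multi-label format: float targets, same shape as the predictions.
good = torch.randint(0, 2, (batch, n_labels)).float()
loss = F.binary_cross_entropy(preds, good)                 # works

# Multi-class format (one class index per example): wrong shape, rejected.
bad = torch.randint(0, n_labels, (batch,)).float()
try:
    F.binary_cross_entropy(preds, bad)
    shape_accepted = True
except (ValueError, RuntimeError):
    shape_accepted = False
```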

tl;dr: Based on your error message you are not using the right loss function and/or don’t have your labels in the right format for a multi-label model.


#13

Thank you for the solutions. I think I’ve already changed the loss function to binary_cross_entropy, but it still doesn’t work. I’m checking whether my labels are in the wrong format.

I also upload my code here. https://github.com/happypetewht/toxic/blob/master/Untitled0.ipynb

I appreciate your help if you can help me check that at your convenience.


#14

Now my target value’s dimension is (127656, 6). From your solution, should I convert it to (6,) for the loss function? Then iterate it for 127656 times?


(WG) #15

So you can see from the stack trace that the loss function can’t work with the format of the data passed into it. You have binary cross-entropy set up as the loss function, so you know at least that much is right.

So …

I think it has to do with the shape/size of your labels.

trn_labels = np.squeeze(np.load('input/trn_labels.npy'))
val_labels = np.squeeze(np.load('input/val_labels.npy'))

For multi-label, each target should be a one-dimensional vector of labels, whereas for multi-class it should be a single class index (a rank-zero tensor). Try removing the np.squeeze from both lines above and run it again. Let me know if that solves it.
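A quick illustration of why np.squeeze can be a trap here (arrays invented for illustration): it drops every singleton axis, which is harmless on a (N, 1, n_labels) array but collapses a (N, 1) array to the multi-class shape (N,):

```python
import numpy as np

# A (N, 1, n_labels) array: squeeze harmlessly drops the middle axis.
trn_labels = np.zeros((100, 1, 6), dtype=np.float32)
squeezed = np.squeeze(trn_labels)          # still multi-label: (100, 6)

# A (N, 1) array: squeeze collapses it to (N,), the multi-CLASS shape,
# which binary cross-entropy then refuses to pair with (N, n_labels) preds.
single_col = np.zeros((100, 1), dtype=np.float32)
collapsed = np.squeeze(single_col)         # (100,)
```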


#16

A small addition (and please correct me if I’m wrong), the metrics should also support multilabel:

learn.metrics = [accuracy_thresh(0.5)]


(WG) #17

You’ll have to use one of the multi-accuracy methods in fast.ai


#18

This function supports multi-accuracy

def accuracy_thresh(thresh):
    return lambda preds, targs: accuracy_multi(preds, targs, thresh)
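For reference, here is a self-contained sketch of what that metric computes; accuracy_multi below is my own stand-in for the fastai function of the same name, not the library implementation:

```python
import torch

def accuracy_multi(preds, targs, thresh):
    # Fraction of individual label slots predicted correctly after thresholding.
    return ((preds > thresh).float() == targs).float().mean()

def accuracy_thresh(thresh):
    # Bind the threshold so the result has the (preds, targs) metric signature.
    return lambda preds, targs: accuracy_multi(preds, targs, thresh)

metric = accuracy_thresh(0.5)
preds = torch.tensor([[0.9, 0.2], [0.6, 0.7]])
targs = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
acc = metric(preds, targs)   # 3 of 4 label slots correct -> 0.75
```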

#19

Why do you keep the PoolingLinearClassifier before the MultiLabelClassifier? Can we build a multi-label classifier with only the encoder?
It works of course as is; I’m just trying to understand 🙂

Thanks!


(WG) #20

Yes you can.

I simply tack on the custom nn.Module here so as to maximize my use of what the fast.ai framework already provides.