A quick-and-easy way to make ULMFiT work for multi-label problems


(WG) #1

I see a few folks on the forums asking how to use ULMFiT for multilabel problems and so I thought I’d throw up a quick and dirty method that has worked for me.

  1. Create a multi-label-friendly classifier like so:
class MultiLabelClassifier(nn.Module):

    def __init__(self, y_range=None):
        super().__init__()
        self.y_range = y_range

    def forward(self, input):
        x, raw_outputs, outputs = input
        x = F.sigmoid(x)        # squash each label's score into [0, 1]
        if self.y_range:        # optionally rescale to a custom range
            x = x * (self.y_range[1] - self.y_range[0])
            x = x + self.y_range[0]

        return x, raw_outputs, outputs
  2. Create your learner for classification as normal (note that the n_class argument isn’t used as far as I can tell, and that I’ve changed the number of activations in my final layer from the number of classes to the number of labels I’m trying to predict simultaneously; in this case 7):
m = get_rnn_classifer(bptt, max_seq=20*70, n_class=7, n_tok=vs, emb_sz=em_sz, n_hid=nh, n_layers=nl,
                     pad_token=1, layers=[em_sz*3, 50, 7], drops=[drops[4], 0.1],
                     dropouti=drops[0], wdrop=drops[1], dropoute=drops[2], dropouth=drops[3])
  3. Change your loss function to binary cross-entropy and append an instance of your MultiLabelClassifier to the end of your Sequential model:
learner.crit = F.binary_cross_entropy
learner.model.add_module('2', MultiLabelClassifier())
  4. Train as normal.
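Putting the four steps together, here’s a minimal self-contained sketch of the wiring. DummyHead is a hypothetical stand-in for the fastai RNN classifier (which likewise returns a (predictions, raw_outputs, outputs) triple), and torch.sigmoid is used in place of the now-deprecated F.sigmoid:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLabelClassifier(nn.Module):
    """Squashes the head's raw scores into [0, 1] with a sigmoid."""
    def __init__(self, y_range=None):
        super().__init__()
        self.y_range = y_range

    def forward(self, input):
        x, raw_outputs, outputs = input
        x = torch.sigmoid(x)
        if self.y_range:
            x = x * (self.y_range[1] - self.y_range[0]) + self.y_range[0]
        return x, raw_outputs, outputs

# Hypothetical stand-in for the fastai classifier: returns (scores, raw, out).
class DummyHead(nn.Module):
    def __init__(self, n_labels=7):
        super().__init__()
        self.lin = nn.Linear(10, n_labels)
    def forward(self, x):
        return self.lin(x), [], []

model = nn.Sequential(DummyHead(), MultiLabelClassifier())
preds, _, _ = model(torch.randn(4, 10))            # shape (4, 7), all in [0, 1]
targets = torch.randint(0, 2, (4, 7)).float()      # multi-label targets as floats
loss = F.binary_cross_entropy(preds, targets)
```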

(Kachi O.) #2

Clever approach! I would not have thought to use learner.model.add_module(); this seems very handy.

The approach that worked for me was to set learn.crit = F.binary_cross_entropy_with_logits, keeping everything the same.
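For the curious, the two routes are numerically equivalent: a sigmoid followed by plain binary cross-entropy gives the same loss as binary_cross_entropy_with_logits applied to the raw scores. A quick check (nothing here is fastai-specific; the tensors are invented):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 7)                        # raw, unsquashed scores
targets = torch.randint(0, 2, (4, 7)).float()

# Route 1: sigmoid layer + plain BCE (the MultiLabelClassifier approach).
loss_a = F.binary_cross_entropy(torch.sigmoid(logits), targets)
# Route 2: BCE-with-logits on the raw scores (no extra module needed).
loss_b = F.binary_cross_entropy_with_logits(logits, targets)
```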


#3

Are you one-hot encoding the targets? I’m getting a RuntimeError: Expected object of type Variable[torch.cuda.FloatTensor] but found type Variable[torch.cuda.LongTensor] for argument #1 'target' error and suspect that might be the culprit.


(WG) #4

They are essentially OHE, yes.

Your problem is the datatype. If you want to use binary cross-entropy, your targets will need to be converted to floats. See the PyTorch docs here.
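A minimal illustration of the cast (the tensors are made up; the point is only the .float() call):

```python
import torch
import torch.nn.functional as F

preds = torch.sigmoid(torch.randn(3, 6))      # predictions in (0, 1)
targets_long = torch.randint(0, 2, (3, 6))    # LongTensor: the culprit

# F.binary_cross_entropy(preds, targets_long)  # raises a dtype RuntimeError
loss = F.binary_cross_entropy(preds, targets_long.float())  # cast fixes it
```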

FYI: this took me a while to figure out too :). The planet notebook (from lesson 2, I believe) is instructive as well.


#5

Thanks! I spaced reading this and just realized I’m actually doing multi-class rather than multi-label for text sentiment, though it appears to be working. (A case could be made that it’s really multi-label, and I could just take the max of the 6 predictions; softmax might be a good choice here.)


(Francisco Rodes) #6

Hi,

I might have gotten lost along the way, but why do you do that? The original notebook works for me for classifying the AG (4-class) dataset; I only changed the input data.

Thanks!


(WG) #7

You do that (one-hot encode) when you are trying to predict multiple labels simultaneously, as opposed to what you see in the course notebooks, where you predict the class for a single label. The first is known as multi-label classification, whereas the latter is multi-class classification; while similar, each requires a data format suited to the task at hand.
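To make the distinction concrete, here is a toy sketch of the two target formats (the arrays are invented for illustration):

```python
import numpy as np

# Multi-class: each example belongs to exactly ONE of n classes,
# so the target is a single integer class index per row.
multiclass_targets = np.array([2, 0, 3, 1])                      # shape (4,)

# Multi-label: each example may carry SEVERAL labels at once,
# so the target is a one-hot-style binary indicator vector per row.
multilabel_targets = np.array([[1, 0, 1, 0],
                               [0, 1, 1, 1],
                               [0, 0, 0, 0],
                               [1, 1, 0, 0]], dtype=np.float32)  # shape (4, 4)
```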

If you want to see how multi-label looks/works, check out the lesson 2 notebook and planet competition notebooks from the course.


(Francisco Rodes) #8

Sure! My bad. Thanks for the fast answer.

By the way, when you’re doing multi-class (my problem), do you treat the different columns as different inputs? Or do you put everything together, as the default model does?

Thank you again!
BR


(WG) #9

Multi-class should function just like the notebook … you’ll have two or more classes in a single column.


(Francisco Rodes) #10

Thanks again!

And one last question, please: is the validation set the same as the test set? Are only these two used?

While I was reading the paper, some tables refer to test error and others to validation error. Could you clarify that? Thanks in advance!


#11

Hi I am trying your way to train my model. But I still get the error

multi-target not supported at /pytorch/torch/lib/THCUNN/generic/ClassNLLCriterion.cu:16

Do you have complete codes for multi-label problems? Thanks!


(WG) #12

I cannot share my work as the dataset is proprietary. However, I can share with you how I would solve your issue (which I think is even more helpful).

First, without having access to anything in your notebook outside of the error message you’ve pasted above, I can tell it has something to do with your optimization/loss function and/or that function’s ability to understand the target/actual values in your dataset.

Second, given the above, I’d first check to make sure I’m using binary_cross_entropy as my loss function. If not, do so and see if that resolves things. STOP if all is working.

Third, if I have the right loss function, I’d look at the dimensions of my targets and at the PyTorch documentation to see what binary_cross_entropy expects. When I do this, I notice each target needs to be a float vector with one entry per label … for 8 labels, something with shape (8,). Once you have the right data format and the right loss function, all should train fine.
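To make the shape requirement concrete, a small check (in current PyTorch, binary_cross_entropy expects targets with the same shape as the predictions, i.e. (batch, n_labels); a multi-class-style vector of class indices is rejected):

```python
import torch
import torch.nn.functional as F

batch, n_labels = 32, 8
preds = torch.sigmoid(torch.randn(batch, n_labels))

# Correct multi-label format: float targets, same shape as the predictions.
good = torch.randint(0, 2, (batch, n_labels)).float()
loss = F.binary_cross_entropy(preds, good)                 # works

# Multi-class format (one class index per example): wrong shape, rejected.
bad = torch.randint(0, n_labels, (batch,)).float()
try:
    F.binary_cross_entropy(preds, bad)
    shape_accepted = True
except (ValueError, RuntimeError):
    shape_accepted = False
```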

tl;dr: Based on your error message you are not using the right loss function and/or don’t have your labels in the right format for a multi-label model.


#13

Thank you for the solutions. I think I’ve already changed the loss function to binary_cross_entropy, but it still doesn’t work. I’m checking whether my labels are in the wrong format.

I also upload my code here. https://github.com/happypetewht/toxic/blob/master/Untitled0.ipynb

I appreciate your help if you can help me check that at your convenience.


#14

Now my target value’s dimension is (127656, 6). From your solution, should I convert it to (6,) for the loss function? Then iterate it for 127656 times?


(WG) #15

So you can see from the stack trace that the loss function can’t work with the format of the data passed into it. You have binary cross-entropy set up as the loss function, so you know at least that much is right.

So …

I think it has to do with the shape/size of your labels.

trn_labels = np.squeeze(np.load('input/trn_labels.npy'))
val_labels = np.squeeze(np.load('input/val_labels.npy'))

For multi-label, each target should be a one-dimensional vector of labels, whereas for multi-class it should be a single class index (a rank-zero tensor). Try removing the np.squeeze from both lines above and run it again. Let me know if that solves it.
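A quick illustration of why np.squeeze can be a trap here (arrays invented for illustration): it drops every singleton axis, which is harmless on a (N, 1, n_labels) array but collapses a (N, 1) array to the multi-class shape (N,):

```python
import numpy as np

# A (N, 1, n_labels) array: squeeze harmlessly drops the middle axis.
trn_labels = np.zeros((100, 1, 6), dtype=np.float32)
squeezed = np.squeeze(trn_labels)          # still multi-label: (100, 6)

# A (N, 1) array: squeeze collapses it to (N,), the multi-CLASS shape,
# which binary cross-entropy then refuses to pair with (N, n_labels) preds.
single_col = np.zeros((100, 1), dtype=np.float32)
collapsed = np.squeeze(single_col)         # (100,)
```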


#16

A small addition (and please correct me if I’m wrong), the metrics should also support multilabel:

learn.metrics = [accuracy_thresh(0.5)]


(WG) #17

You’ll have to use one of the multi-accuracy methods in fast.ai


#18

This function supports multi-accuracy

def accuracy_thresh(thresh):
    return lambda preds, targs: accuracy_multi(preds, targs, thresh)
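For reference, here is a self-contained sketch of what that metric computes; accuracy_multi below is my own stand-in for the fastai function of the same name, not the library implementation:

```python
import torch

def accuracy_multi(preds, targs, thresh):
    # Fraction of individual label slots predicted correctly after thresholding.
    return ((preds > thresh).float() == targs).float().mean()

def accuracy_thresh(thresh):
    # Bind the threshold so the result has the (preds, targs) metric signature.
    return lambda preds, targs: accuracy_multi(preds, targs, thresh)

metric = accuracy_thresh(0.5)
preds = torch.tensor([[0.9, 0.2], [0.6, 0.7]])
targs = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
acc = metric(preds, targs)   # 3 of 4 label slots correct -> 0.75
```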

#19

Why do you keep the PoolingLinearClassifier before the MultiLabelClassifier? Can we build a multi-label classifier with only the encoder?
It works of course as is; I’m just trying to understand 🙂

Thanks!


(WG) #20

Yes you can.

I simply tack on the custom nn.Module here so as to maximize my use of what the fast.ai framework already provides.