I see a few folks on the forums asking how to use ULMFiT for multilabel problems and so I thought I’d throw up a quick and dirty method that has worked for me.
Create a multi-label friendly classifier as such:
```python
import torch.nn as nn
import torch.nn.functional as F

class MultiLabelClassifier(nn.Module):
    def __init__(self, y_range=None):
        super().__init__()
        self.y_range = y_range

    def forward(self, input):
        x, raw_outputs, outputs = input
        x = F.sigmoid(x)  # independent probability per label (torch.sigmoid in newer PyTorch)
        if self.y_range:
            x = x * (self.y_range[1] - self.y_range[0])
            x = x + self.y_range[0]
        return x, raw_outputs, outputs
```
Create your learner for classification as normal (note that the n_class argument isn't used as far as I can tell, and also that I've changed the number of activations in my last layer from the number of classes to the number of labels I'm trying to simultaneously predict; in this case, 7):
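In case it helps, here's a hedged sketch of what that head change looks like in plain PyTorch (not the fastai learner API itself; the hidden size of 400 and the layer sizes are made-up placeholders, but the final layer really does get 7 activations, one per label):

```python
import torch
import torch.nn as nn

n_labels = 7       # labels predicted simultaneously (from the post above)
hidden_size = 400  # hypothetical encoder output size

# Classification head: the last Linear is sized to the number of LABELS,
# not the number of classes.
head = nn.Sequential(
    nn.Linear(hidden_size, 50),
    nn.ReLU(),
    nn.Linear(50, n_labels),  # 7 activations, one per label
)

encoded = torch.randn(16, hidden_size)  # fake batch of encoder outputs
logits = head(encoded)
probs = torch.sigmoid(logits)  # independent probability per label
print(probs.shape)  # torch.Size([16, 7])
```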
Are you one-hot encoding the targets? I’m getting a RuntimeError: Expected object of type Variable[torch.cuda.FloatTensor] but found type Variable[torch.cuda.LongTensor] for argument #1 'target' error and suspect that might be the culprit.
Thanks! I spaced on reading this and just realized I'm actually doing multi-class rather than multi-label for text sentiment, though it appears to be working. (The case could be made that it is in fact multi-label, and I could just take the max of the 6 predictions; softmax might be a good choice here.)
I might have gotten lost along the way, but why do you do that? I think the original notebook works for me for classifying the AG (4 classes) dataset. I have only changed the input data.
You one-hot encode when you are trying to predict multiple labels simultaneously, as opposed to what you see in the course notebooks, where you predict a single class per example. The first is known as multi-label classification, whereas the latter is known as multi-class classification, and while similar, each requires a data format suited to the task at hand.
If you want to see how multi-label looks/works, check out the lesson 2 notebook and planet competition notebooks from the course.
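To make the data-format difference above concrete, here's a hedged sketch (plain PyTorch; the tensor values are made up) of how targets and losses differ between the two setups:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 6)  # batch of 4 examples, 6 classes/labels

# Multi-class: exactly one class per example -> targets are class
# indices in a rank-1 LongTensor, used with cross_entropy.
mc_targets = torch.tensor([0, 3, 5, 2])
mc_loss = F.cross_entropy(logits, mc_targets)

# Multi-label: several labels can be "on" at once -> targets are a
# one-hot/multi-hot FloatTensor with the SAME shape as the predictions,
# used with binary_cross_entropy after a sigmoid.
ml_targets = torch.tensor([[1., 0., 1., 0., 0., 1.],
                           [0., 1., 0., 0., 0., 0.],
                           [1., 1., 0., 1., 0., 0.],
                           [0., 0., 0., 0., 1., 1.]])
ml_loss = F.binary_cross_entropy(torch.sigmoid(logits), ml_targets)

print(mc_loss.item(), ml_loss.item())
```

Note that the multi-label targets are floats, not longs, which is exactly why a LongTensor target trips the dtype error mentioned earlier in this thread.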
By the way, when you are doing multi-class (my problem), do you treat the different columns as different inputs? Or do you put everything together, as the default model does?
I cannot share my work as the dataset is proprietary. However, I can share with you how I would solve your issue (which I think is even more helpful).
First, without having access to anything in your notebook outside of the error message you’ve pasted above, I can tell it has something to do with your optimization/loss function and/or that function’s ability to understand the target/actual values in your dataset.
Second, given the above, I’d first check to make sure I’m using binary_cross_entropy as my loss function. If not, do so and see if that resolves things. STOP if all is working.
Third, if I have the right loss function, I'd look at the dimensions of my targets and the PyTorch documentation to see what binary_cross_entropy wants. When I do this, I notice it wants a one-dimensional float tensor per example … something that looks like this for 8 labels: (8,). Once you have the right data format and the right loss function, all should train fine.
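As a hedged illustration of that shape check (the 8-label count comes from the paragraph above; everything else is made up):

```python
import torch
import torch.nn.functional as F

n_labels = 8
preds = torch.sigmoid(torch.randn(n_labels))  # one example: shape (8,)
target = torch.tensor([1., 0., 0., 1., 0., 1., 0., 0.])  # floats, same shape

loss = F.binary_cross_entropy(preds, target)  # shapes and dtypes match -> works

# A LongTensor target is what triggers the "expected ...FloatTensor but
# found ...LongTensor" error from earlier in the thread:
bad_target = target.long()
try:
    F.binary_cross_entropy(preds, bad_target)
except RuntimeError as e:
    print("dtype mismatch:", e)
```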
tl;dr: Based on your error message you are not using the right loss function and/or don’t have your labels in the right format for a multi-label model.
Thank you for the solutions. I think I already changed the loss function to binary_cross_entropy, but it still doesn't work. I am checking whether the labels are in the right format.
Now my targets' dimensions are (127656, 6). From your solution, should I convert them to (6,) for the loss function, and then iterate 127656 times?
So you can see from the stack trace that it has something to do with the loss function not being able to work with the format of the data passed into it. You have binary cross entropy set up as the loss function, so you know that at least that much is right.
So …
I think it has to do with the shape/size of your labels.
For multi-label, the per-example target should be one-dimensional, whereas for multi-class it should be a rank-zero tensor (a single class index). Try removing the np.squeeze from both lines above and run it again. Lmk if that solves it.
Why do you keep the PoolingLinearClassifier before the MultiLabelClassifier? Can we build a multi-label classifier with only the encoder?
It works as is, of course; I'm just trying to understand.