Changing default loss functions

sheaml · November 1, 2018, 8:22pm

Hi,

I’m using fastai v1 on Google Colab. I have a multi-class image classification problem, and I would like to use class weights in my loss function.

I’ve successfully gotten the model to train using class weights with the following:

w = torch.cuda.FloatTensor([1.0, 0.9, 1.1])
learn = create_cnn(data, models.resnet18, metrics = error_rate, loss_func=torch.nn.CrossEntropyLoss(weight=w))
learn.fit_one_cycle(1, 0.001)

But ClassificationInterpretation of the learner object will fail:

RuntimeError: Expected object of backend CPU but got backend CUDA for argument #3 'weight'

However, if I use torch.FloatTensor instead of torch.cuda.FloatTensor for the weights, I will get the inverse error when trying to train the model (“expecting backend CUDA but got backend CPU”).

It’s also not entirely clear to me if I should set the loss function when I create the data object, the learner object, or both.

Thanks for any advice.

sgugger · November 1, 2018, 9:58pm

That’s an interesting bug. It comes from the fact that we stack the predictions after moving them to the CPU in get_preds, to avoid out-of-memory errors on the GPU. Not sure how to solve this, but I will look at it. In the meantime, you can change the loss function of the learner after training it (to force the weights onto the CPU) by calling:
learn.loss_func = ...
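
A minimal sketch of that workaround in plain PyTorch, assuming the loss is the weighted torch.nn.CrossEntropyLoss from the first post. The class weights are stored inside the loss module, so moving the module moves them too. The prediction and target tensors are made-up stand-ins for the stacked CPU predictions; with a fastai learner the last step would presumably be learn.loss_func = loss_func.cpu().

```python
import torch
import torch.nn as nn

# Class weights from the first post, created on the CPU to begin with.
weights = torch.tensor([1.0, 0.9, 1.1])
loss_func = nn.CrossEntropyLoss(weight=weights)

# For training, the weights must sit on the same device as the model outputs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loss_func = loss_func.to(device)

# ... training would happen here ...

# After training, move the loss (and its weights) back to the CPU so it
# matches predictions that were gathered on the CPU.
loss_func = loss_func.cpu()

# Stand-in data (assumption): 4 samples, 3 classes, all on the CPU.
cpu_preds = torch.randn(4, 3)
cpu_targets = torch.tensor([0, 1, 2, 1])
loss = loss_func(cpu_preds, cpu_targets)
```

The same loss object is reused rather than rebuilt, so the weights used for interpretation stay identical to the ones used for training.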

sheaml · November 1, 2018, 10:03pm

Great - thanks.

nok · November 28, 2018, 7:09am

Is this still the proper way to change the loss function? It doesn’t seem to change the loss function, as I get a similar loss from learn.lr_find().

learn.fit does report a different number, but lr_find seems to use data.loss_func instead.

sgugger · November 28, 2018, 2:38pm

data.loss_func is the default that the learner will use, but if you overwrite it like you did, it will be ignored afterward.

nok · November 28, 2018, 3:24pm

Thanks for confirming. Is there a reason why data doesn’t simply take loss_func from the learner, but needs a separate loss_func attribute of its own? What is the use of loss_func on a DataBunch without a Learner?

sgugger · November 28, 2018, 3:29pm

You are looking at this the wrong way around: data offers a default loss function to the learner, one that is suitable for this kind of target (cross entropy for classification, MSE for regression…).
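
To make the relationship concrete, here is a toy sketch of the pattern described: the data object carries a sensible default, and the learner uses it only when no loss function is given explicitly. ToyData, ToyLearner and the two stub functions are hypothetical names made up for illustration; this is not fastai's actual implementation.

```python
def cross_entropy_stub(pred, target):
    """Placeholder for the default loss the data object would suggest."""
    return "cross_entropy"


def weighted_stub(pred, target):
    """Placeholder for a user-supplied loss, e.g. a class-weighted one."""
    return "weighted_cross_entropy"


class ToyData:
    """Hypothetical data container that knows a suitable default loss."""

    def __init__(self, loss_func):
        self.loss_func = loss_func


class ToyLearner:
    """Hypothetical learner: an explicit loss_func wins over the data default."""

    def __init__(self, data, loss_func=None):
        self.data = data
        self.loss_func = loss_func if loss_func is not None else data.loss_func


data = ToyData(loss_func=cross_entropy_stub)
default_learner = ToyLearner(data)                          # falls back to data.loss_func
custom_learner = ToyLearner(data, loss_func=weighted_stub)  # overrides the default
```

In this toy version the data object is never modified by the override; only the learner's own attribute differs.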

pattyhendrix · January 30, 2019, 10:50pm

I’m getting the same error, but for argument #2 ‘weight’:

RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 ‘weight’

I’m trying to grab the activations from a vgg16_bn:
vgg_m = vgg16_bn(True).features.cuda().eval()
requires_grad(vgg_m, False)
vgg_m = nn.Sequential(*children(vgg_m)[:37])

I have an image whose tensor has size:
torch.Size([3, 288, 288])

I get the error when I try to pass the image tensor through the model, and I haven’t been able to get past it:
vgg_m(image_tensor[None])

I haven’t defined a loss function yet.

The final runtime error comes from here:
torch\nn\modules\conv.py in forward(self, input)
    318     def forward(self, input):
    319         return F.conv2d(input, self.weight, self.bias, self.stride,
--> 320                         self.padding, self.dilation, self.groups)

Thanks for any help!
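
For what it's worth, an error naming argument #2 ‘weight’ inside F.conv2d usually means the convolution weights are on the GPU (the model was moved with .cuda()) while the input tensor is still on the CPU. A minimal sketch of the usual fix, assuming that is the cause here: send the input to whatever device the model's parameters live on. The small model below is a made-up stand-in for the truncated vgg16_bn, and the random tensor stands in for the image.

```python
import torch
import torch.nn as nn

# Use the GPU when present; the sketch also runs on a CPU-only machine.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in (assumption) for the truncated VGG feature extractor.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
).to(device).eval()

# Stand-in image tensor with the size from the post, created on the CPU.
image_tensor = torch.rand(3, 288, 288)

# Look up where the model's weights are and move the batch there.
model_device = next(model.parameters()).device
batch = image_tensor[None].to(model_device)  # add batch dim, then move

with torch.no_grad():
    activations = model(batch)
```

Reading the device from the model's parameters, rather than hard-coding .cuda() on the input, keeps the input and the weights on the same device wherever the model ends up.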