I’m wondering: is there a better way to deal with this without creating a tensor of zeros and wrapping it in a Variable every time? I’m guessing it’s expensive to construct a Variable each time we calculate the loss, and to move it from CPU to GPU.
PyTorch/Python will broadcast the 0 anyway. It would be interesting to see how much memory the internal broadcasting takes vs. the memory taken by explicitly providing the zeros. The former should be very cheap, I believe.
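A quick sketch of the two approaches being compared, using the modern tensor API (Variables were merged into tensors in PyTorch 0.4, so no explicit `Variable` wrapping is shown); the shapes here are just illustrative:

```python
import torch

x = torch.randn(1024, 1024)

# Broadcasting trick: the scalar 0 is compared element-wise without
# ever materializing a (1024, 1024) tensor of zeros.
loss_a = torch.clamp(x, min=0).mean()

# Explicit version: allocates a full tensor of zeros just to compare against.
zeros = torch.zeros_like(x)
loss_b = torch.max(x, zeros).mean()

# Both give the same result; the broadcast version skips one allocation.
assert torch.allclose(loss_a, loss_b)
```

The extra cost of the explicit version is the `zeros_like` allocation (and a CPU-to-GPU copy if the zeros are built on the CPU), which is what the question above is trying to avoid.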
I suspect that either way this will probably be very cheap (though the broadcasting trick from @anandsaha is definitely a neat idea).
The good news is that with yesterday’s release of PyTorch 0.3 we now have a profiler, so there may be value in pointing it at this. But I suspect that for most use cases giving this much thought is not essential - my guess is that it would only matter if your model were impractically small or something like that.
If you do go the profiler route, please share your findings with us - I’d be interested to see the difference the broadcasting trick makes, and also what percentage of total time this takes relative to everything else your model is doing.
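For anyone who wants to try this, a minimal sketch of how the autograd profiler could be pointed at the two variants (shown with the current `torch.autograd.profiler` API rather than the original 0.3 one; the loop count and tensor size are arbitrary):

```python
import torch
from torch.autograd import profiler

x = torch.randn(512, 512)

# Profile both variants so their ops show up side by side in the report.
with profiler.profile() as prof:
    for _ in range(10):
        # Variant 1: broadcast against the scalar 0.
        torch.clamp(x, min=0)
        # Variant 2: explicitly allocate a tensor of zeros to compare against.
        torch.max(x, torch.zeros_like(x))

# Per-op summary, sorted by total CPU time; the zeros_like allocation
# appears as its own line, so its relative cost is easy to read off.
report = prof.key_averages().table(sort_by="cpu_time_total")
print(report)
```

Reading the table, you would compare the time attributed to the allocation op against the comparison ops themselves to see whether the explicit-zeros version is measurably slower.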