How to make a custom loss function (PyTorch)


(Jeremy Howard (Admin)) #22

What do you want to know about it exactly? Why are you thinking a normal function may not work?


(Prince Grover) #23

Hmm… So, here is what I think (after reading more about it now):

Of course we want to compute gradients of the loss function and use them in back-propagation. As long as we use Variables throughout (without converting to NumPy), gradients will be calculated automatically by torch autograd; we just need to call .backward() for that. But if the graph recorded for the loss function is likely to be larger than that of our model, it is recommended to write a custom torch autograd Function. This is when we would need nn.Module.

But people on forums/discussions have mostly used a custom autograd function, which led me to think that this is a necessary step. That’s why I asked you the previous question about whether a normal function even works (for which I think I have the answer now).

So, the conclusion is that a normal function is fine as long as it takes Variables as input and only uses operations that support Variables. But if our loss function includes a lot of operations, it is recommended to build a custom autograd Function. Right?
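
A minimal sketch of that conclusion, assuming a recent PyTorch release (where plain tensors with requires_grad=True play the role that Variables played when this thread was written). The name rmse_loss and the toy values are illustrative only; the point is that an ordinary Python function built from tensor operations is enough for autograd to work.

```python
import torch

def rmse_loss(pred, target):
    # Only differentiable tensor ops are used, so autograd can record the graph.
    return torch.sqrt(torch.mean((pred - target) ** 2))

pred = torch.tensor([1.0, 2.0, 4.0], requires_grad=True)
target = torch.tensor([1.0, 2.0, 1.0])

loss = rmse_loss(pred, target)
loss.backward()                # gradients flow back to pred automatically
print(loss.item(), pred.grad)
```

No backward method is written anywhere here; converting to NumPy inside rmse_loss is what would break the chain.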

Below are the supporting links for the same.

https://spandan-madan.github.io/A-Collection-of-important-tasks-in-pytorch/
SECTION 5 - CUSTOM LOSS FUNCTIONS
Sometimes, we need to define our own loss functions. And here are a few things to know about this - custom Loss functions are defined using a custom class too. They inherit from torch.nn.Module just like the custom model

build costom loss - pytorch forums
Since the code does a lot of operations, the graph recording just the loss function would be likely much larger than that of your model. Because of this, I'd recommend you to write your own autograd function, or think a bit more about how can you compute your similarity matrix.

About autograd.

http://pytorch.org/docs/master/notes/extending.html
Adding operations to autograd requires implementing a new Function subclass for each operation. Recall that Functions are what autograd uses to compute the results and gradients, and encode the operation history. Every new function requires you to implement 2 methods: forward and backward.
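
For reference, a rough sketch of what such a Function subclass can look like. It uses the static-method style of current PyTorch releases, which may differ from the API documented when this thread was written. SquaredError is a hypothetical example op, not something from the linked pages.

```python
import torch

class SquaredError(torch.autograd.Function):
    """Hypothetical example op: elementwise (input - target) ** 2."""

    @staticmethod
    def forward(ctx, input, target):
        diff = input - target
        ctx.save_for_backward(diff)      # keep what backward will need
        return diff ** 2

    @staticmethod
    def backward(ctx, grad_output):
        (diff,) = ctx.saved_tensors
        grad_input = 2 * diff * grad_output
        # One gradient per forward argument: input first, then target.
        return grad_input, -grad_input

x = torch.tensor([1.0, 3.0], requires_grad=True)
y = torch.tensor([0.0, 1.0])
loss = SquaredError.apply(x, y).mean()
loss.backward()
print(x.grad)
```

The hand-written backward gives the same gradient autograd would derive for ((x - y) ** 2).mean(), so for an op this simple the custom Function buys nothing; it is only meant to show the two required methods.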


(Jeremy Howard (Admin)) #24

Well thought through and exactly right.

Well… maybe. It seems unlikely that your final loss function would be a significant part of your gradient computation time in most cases, although it’s possible sometimes. I’d suggest seeing how long your model takes for an epoch with a simple function like rmse, and with your custom function, and only define your own backward if it turns out to be necessary. But even then, I’d first see if you can rewrite your loss function in a more ‘torch friendly’ way.
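
One way to run that comparison, sketched with a toy linear model and a fixed number of iterations rather than a real epoch. The model, the tensor sizes, and custom_loss are placeholders; substitute your own model and loss.

```python
import time
import torch

def rmse(pred, target):
    return torch.sqrt(torch.mean((pred - target) ** 2))

def custom_loss(pred, target):
    # Stand-in for a more elaborate loss; hypothetical, for illustration only.
    err = pred - target
    return torch.mean(torch.log1p(err ** 2))

def time_loss(loss_fn, n_iters=50):
    model = torch.nn.Linear(32, 16)
    x, y = torch.randn(256, 32), torch.randn(256, 16)
    start = time.perf_counter()
    for _ in range(n_iters):
        model.zero_grad()
        loss_fn(model(x), y).backward()   # forward + backward through the loss
    return time.perf_counter() - start

t_rmse = time_loss(rmse)
t_custom = time_loss(custom_loss)
print(f"rmse: {t_rmse:.4f}s  custom: {t_custom:.4f}s")
```

If the two timings are close, the loss is not the bottleneck and a hand-written backward is unlikely to be worth the effort.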


(Alex) #25

If a competition requires some completely different loss function, do we have to redefine it in both places, i.e. m.crit=new_function and metrics=[new_function], or is it okay to have them slightly different, as in the Rossmann notebook, where F.mse_loss is used by default for training and metrics=[exp_rmspe]?

My original question was:


(Jeremy Howard (Admin)) #26

metrics are simply used for display. They’re not used for gradients. crit however is used for loss, and is also displayed.


(Will) #27

I’m trying to build a regression network that has 16 outputs, with one of the 16 outputs weighted 3 times as high (or X times as high in the general case) for loss purposes as the other 15 outputs. I have built a network that works for the 16 outputs when they are all equally weighted, but how would I go about up-weighting one of the outputs above the others within the fastai library? I feel like there should be a simple way of doing this that I’m not thinking of.

I’ve tried this ugly hack and it didn’t work:

bs = 250
data = ColumnarModelData.from_data_frame(PATH, val_idxs= val_idxs, df= X, y= y_trn, cat_flds= cat_feats, bs= bs,
                                         is_multi=False, is_reg=True,test_df= X_tst,shuffle=True)

m = data.get_learner(emb_szs=emb_szs, n_cont=len(contin_feats)+len(mom_feats)+len(fact_feats),emb_drop=0.3,out_sz=16,
                     szs= [1024,1024], drops= [0.0,0.5], y_range= y_range)

weight = torch.ones(16)
weight[0] = 3
m.crit = nn.MSELoss(reduce=False)*weight

which throws an error that basically says MSELoss isn’t something you can multiply by integers or floats, which makes sense. After reading this thread, my understanding is that things need to be wrapped in a torch Variable and I need to use torch.mul() instead of just *.

so then I went deeper into the fast.ai library to see where the loss function is being called and it looks like it’s here for structured data in column_data.py:

def _get_crit(self, data): return F.mse_loss if data.is_reg else F.binary_cross_entropy if data.is_multi else F.nll_loss

So the code is calling F.mse_loss for regression. I tried changing my code to:

m.crit = torch.mul(F.mse_loss(),weight)

but I couldn’t get it to work here either.

I then went into the pytorch source code for F.mse_loss and tried adding in a multiplication by weight, but that didn’t work either. I feel like I’m chasing my tail here; can someone help point me in the right direction?


(RobG) #28

Some pytorch loss functions allow class weights to be passed in. Have you ruled that approach out? Eg nn.CrossEntropyLoss(weight=class_weights)
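
A small sketch of that suggestion, assuming a hypothetical 3-class problem with made-up weights. Note that this weights classes in a classification loss, which is not the same thing as weighting individual outputs of a regression network.

```python
import torch
import torch.nn as nn

# Hypothetical 3-class problem where class 0 should count three times as much.
class_weights = torch.tensor([3.0, 1.0, 1.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.zeros(2, 3)          # batch of 2, uniform predictions
targets = torch.tensor([0, 1])      # true class indices
loss = criterion(logits, targets)
print(loss.item())
```

With the default mean reduction, the weighted per-sample losses are divided by the sum of the weights of the targets in the batch, not by the batch size.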


(Will) #29

I don’t see any weights parameter for mse_loss, and it doesn’t look like there is anything in the source to handle that. I’m open to other regression loss functions that could accept weights, though. I don’t feel like this is that unusual of a thing to want to do.


(David Salazar) #32

Hi!

Have you looked at this?

Hope it helps!


(Will) #33

I hadn’t seen that yet actually, so thank you. It is helpful, but it doesn’t look like it solves my issue. I can define a working function; the problem is that when I try to assign it with model.crit = MY_NEW_FUNCTION, it throws an error asking for inputs to the function that haven’t been calculated yet.

def weighted_mse_loss(input, target):
    weight = torch.ones(16)
    weight[0] = 3
    return torch.mean(weight * (input - target) ** 2)


m.crit=weighted_mse_loss()

gives me this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-79-c956b86e8f4f> in <module>()
      4     return torch.mean(weight * (input - target) ** 2)
      5 
----> 6 m.crit=weighted_mse_loss()

TypeError: weighted_mse_loss() missing 2 required positional arguments: 'input' and 'target'

(Will) #34

derp, figured out why i was getting the error. I was setting it like this:

m.crit = weighted_mse_loss()

instead of like this:

m.crit = weighted_mse_loss
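
The difference between the two assignments, sketched in plain Python. Learner here is a hypothetical stand-in for the real learner object, and the list-based loss is only for illustration.

```python
class Learner:
    """Hypothetical stand-in for a learner object that stores a loss function."""
    crit = None

def weighted_mse_loss(input, target):
    return sum((i - t) ** 2 for i, t in zip(input, target)) / len(input)

m = Learner()

try:
    m.crit = weighted_mse_loss()       # calls the function now, with no arguments
except TypeError as err:
    print("call failed:", err)

m.crit = weighted_mse_loss             # stores the function object itself
print(m.crit([1.0, 2.0], [0.0, 0.0]))  # the training loop supplies the arguments later
```

Without the parentheses, nothing is evaluated at assignment time; the training loop calls m.crit once predictions and targets exist.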

That being said, now I’m getting a new pytorch error saying the multiplication function in pytorch got a Variable but expected a FloatTensor. I thought everything had to be Variables in loss functions for autograd to work?

so i know the new weight object is a FloatTensor, do i need to change the input and target to FloatTensor instead of Variables and then wrap the whole thing in a Variable statement ?

TypeError: mul received an invalid combination of arguments - got (Variable), but expected one of:
 * (float value)
      didn't match because some of the arguments have invalid types: (Variable)
 * (torch.FloatTensor other)
      didn't match because some of the arguments have invalid types: (Variable)

(Will) #35

I got it!

This works as a custom weighted multi-output MSE function, where the squared error of the first element of the predictions is weighted 0.5 (i.e. 8/16) and each of the remaining 15 elements is weighted 0.5/15, so that the weights sum to 1 (approximately, because of floating-point division).

import torch
from torch.autograd import Variable

def weighted_mse_loss(input, target):
    # alpha of 0.5 means half the weight goes to the first output,
    # and the remaining half is split equally over the other 15
    weights = Variable(torch.Tensor([0.5] + [0.5/15] * 15)).cuda()
    sq_err = (input - target) ** 2               # per-element squared error
    out = sq_err * weights.expand_as(target)     # repeat the 16 weights for every row of the batch
    loss = out.mean()                            # mean over all batch x 16 elements
    return loss
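
For readers on a recent PyTorch release, here is a hedged sketch of the same idea without Variable or a hard-coded .cuda() call. The helper make_weighted_mse and its n_out and alpha parameters are hypothetical names introduced here, not part of fastai or PyTorch.

```python
import torch

def make_weighted_mse(n_out=16, alpha=0.5):
    # The first output gets alpha; the rest share 1 - alpha equally.
    weights = torch.full((n_out,), (1 - alpha) / (n_out - 1))
    weights[0] = alpha

    def weighted_mse_loss(input, target):
        # Move the weights to wherever the predictions live (CPU or GPU).
        w = weights.to(device=input.device, dtype=input.dtype)
        return (w * (input - target) ** 2).mean()   # w broadcasts over the batch

    return weighted_mse_loss, weights

loss_fn, weights = make_weighted_mse()
pred = torch.zeros(4, 16, requires_grad=True)
target = torch.ones(4, 16)
loss = loss_fn(pred, target)
loss.backward()
print(weights.sum().item(), loss.item())
```

Because .mean() divides by every element (batch size times 16), the value is 1/16 of the per-row weighted sum; with unit errors everywhere, as in this toy input, the loss is 0.0625. Use .sum(dim=1).mean() instead if the per-row weighted sum is what you want.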

#36

no such thing as silly or mistake or not, cpux, do any nmw is ok