How to make a custom loss function (PyTorch)

Thanks Jeremy. After banging my head for a few hours, I figured out that the error I faced was due to my dataset class. It turned out that I was returning object dtype as the output of the dataset class, which is not allowed (only floats or ints are allowed).

It threw the error TypeError: zip argument #58 must support iteration.

So I updated my code to return 2 stacked columns (not 1 column with a list as each element).
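A minimal sketch of the difference, using NumPy only. The dataset class itself is not shown in this thread, so the column names and values below are made up for illustration.

```python
import numpy as np

# Hypothetical data: two target values per row (names and values are made up).
col_a = [1.0, 2.0, 3.0]
col_b = [10.0, 20.0, 30.0]

# One column whose elements are Python lists: NumPy stores it with dtype=object.
as_objects = np.empty(len(col_a), dtype=object)
for i, pair in enumerate(zip(col_a, col_b)):
    as_objects[i] = list(pair)

# Two stacked numeric columns: a regular float array of shape (n_rows, 2).
stacked = np.stack([col_a, col_b], axis=1)

print(as_objects.dtype)              # object
print(stacked.dtype, stacked.shape)  # float64 (3, 2)
```

The stacked version is a plain numeric array, which is the kind of output that can be turned into a tensor.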

But now I am facing another error, which I am trying to resolve (by reading the PyTorch forums). It seems to be related to the embedding layer, but I'm not sure what I should change in the code.


You need to .cuda() your model.

Oh. Yes. Thanks! (Such a silly mistake. :man_facepalming: )

We’ve all done it. The trick is to carefully look at the ‘got’ vs ‘expected’ bits of the exception, and note that one type has ‘cuda’ and the other doesn’t.
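A small sketch of keeping the model and its inputs on the same device. It assumes a PyTorch version that has torch.device and .to() (0.4 or later), which is newer than the .cuda() style used in this thread; the toy embedding model is hypothetical.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy model (hypothetical): a single embedding layer moved to the chosen device.
model = nn.Embedding(num_embeddings=10, embedding_dim=4).to(device)

# The inputs must live on the same device as the model's weights.
idx = torch.tensor([1, 2, 3], device=device)

out = model(idx)
print(out.device, model.weight.device)
```

If only one side is moved, the exception shows a 'cuda' type on one side of 'got' vs 'expected' and a plain CPU type on the other.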

Yeah. This tip helped. After that, I came across the same error multiple times and was able to solve those quickly.

Anyway, I got an initial fully working version of the code, and now I can start the “actual training”. :wink:

github repo link

Hi @jeremy,
Coming back to lesson3-rossman.ipynb, I just wanted to confirm my understanding:

We create a custom loss function:

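import math
import numpy as np

def inv_y(a): return np.exp(a)

def exp_rmspe(y_pred, targ):
    # Undo the log transform, then compute the root mean squared percentage error
    targ = inv_y(targ)
    pct_var = (targ - inv_y(y_pred))/targ
    return math.sqrt((pct_var**2).mean())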

We use it later in fit, after we do m = md.get_learner(…), by passing metrics=[exp_rmspe].
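For a quick sanity check of the metric, here is a small sketch with made-up numbers (the values are hypothetical, not from the Rossmann data). It assumes targets and predictions are stored as logs, which is why exp_rmspe converts both back with np.exp before computing the percentage error.

```python
import math
import numpy as np

def inv_y(a):
    return np.exp(a)

def exp_rmspe(y_pred, targ):
    # Both arguments are on the log scale; convert back before comparing.
    targ = inv_y(targ)
    pct_var = (targ - inv_y(y_pred)) / targ
    return math.sqrt((pct_var ** 2).mean())

# Hypothetical values: true sales 100 and 200, predictions 110 and 180.
targ = np.log(np.array([100.0, 200.0]))
pred = np.log(np.array([110.0, 180.0]))

score = exp_rmspe(pred, targ)
print(score)  # both predictions are off by 10%, so this is about 0.1
```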

However, under the hood the learner defaults to F.mse_loss:

class StructuredLearner(Learner):
    def __init__(self, data, models, **kwargs):
        super().__init__(data, models, **kwargs)
        self.crit = F.mse_loss

Since the notebook doesn’t override the learner’s “crit” attribute, my understanding is that we are using two different functions: F.mse_loss as the training loss, and the custom exp_rmspe as the accuracy metric that gets printed.

Is that because exp_rmspe and mse_loss are conceptually the same way of calculating the loss?
In general, should we use the same loss function for both metrics=[] and learner.crit=? Or am I missing something?

And to add on to Alex’s question: when I was searching for how to make a custom training loss, almost all the posts I stumbled upon said that we need to build the loss function as an nn.Module class with a forward function (and in some cases a backward function too). But you said that is not required, and that it can be a normal function as long as it is compatible with tensor operations. How does that work then?

What do you want to know about it exactly? Why are you thinking a normal function may not work?

Hmm… So, here’s what I think (after reading more about it now):

Of course we want to compute the gradients of the loss function and use them in back-propagation. As long as we use Variables all the time (without converting to np), gradients are calculated automatically (torch autograd); we just need to call .backward() for that. But if the graph recorded for our loss function is likely to be larger than that of our model, it is recommended to write a custom torch autograd function. This is when we would need nn.Module.

But people on forums/discussions have mostly used a custom autograd function, which led me to think that this is a necessary step. That’s why I asked you the previous question about whether a normal function even works (to which I think I have the answer now).

So, the conclusion is that a normal function is fine as long as it takes Variables as input and supports operations on Variables. But if our loss function includes a lot of operations, it is recommended to build a custom autograd function. Right?
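To make the "normal function" case concrete, here is a minimal sketch. It assumes PyTorch 0.4 or later, where plain tensors with requires_grad=True play the role Variables play in this thread; the function name rmse_loss and the numbers are just for illustration.

```python
import torch

def rmse_loss(pred, targ):
    # Built only from tensor operations, so autograd can record and differentiate it.
    return torch.sqrt(((pred - targ) ** 2).mean())

pred = torch.tensor([1.0, 2.0, 4.0], requires_grad=True)
targ = torch.tensor([1.0, 2.0, 2.0])

loss = rmse_loss(pred, targ)
loss.backward()  # no custom backward needed

print(loss.item(), pred.grad)
```

As far as I know, wrapping the same computation in an nn.Module subclass is a packaging choice, not something autograd needs.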

Below are the supporting links for the same.
Sometimes, we need to define our own loss functions. And here are a few things to know about this - custom Loss functions are defined using a custom class too. They inherit from torch.nn.Module just like the custom model

build costom loss - pytorch forums
Since the code does a lot of operations, the graph recording just the loss function would be likely much larger than that of your model. Because of this, I'd recommend you to write your own autograd function, or think a bit more about how can you compute your similarity matrix.

About autograd.
Adding operations to autograd requires implementing a new Function subclass for each operation. Recall that Functions are what autograd uses to compute the results and gradients, and encode the operation history. Every new function requires you to implement 2 methods: forward and backward.
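For comparison, a sketch of what a hand-written Function with forward and backward might look like. It uses the static-method style of recent PyTorch versions (an assumption; older releases used a different style), and the class name MeanSquaredError is made up for this example.

```python
import torch

class MeanSquaredError(torch.autograd.Function):
    """Toy op: mean squared error with a hand-written backward."""

    @staticmethod
    def forward(ctx, pred, targ):
        diff = pred - targ
        ctx.save_for_backward(diff)  # keep what backward will need
        return (diff ** 2).mean()

    @staticmethod
    def backward(ctx, grad_output):
        (diff,) = ctx.saved_tensors
        grad_pred = grad_output * 2.0 * diff / diff.numel()
        # One return value per forward input; the target is treated as a
        # constant here, so no gradient is returned for it.
        return grad_pred, None

pred = torch.tensor([1.0, 2.0, 4.0], requires_grad=True)
targ = torch.tensor([1.0, 2.0, 2.0])

loss = MeanSquaredError.apply(pred, targ)
loss.backward()
print(loss.item(), pred.grad)
```

For a loss this simple the hand-written backward buys nothing over letting autograd do it; it only shows the shape of the two methods.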


Well thought through and exactly right.

Well… maybe. It seems unlikely that your final loss function would be a significant part of your gradient computation time in most cases, although it’s possible sometimes. I’d suggest seeing how long your model takes for an epoch with a simple function like rmse, and with your custom function, and only define your own backward if it turns out to be necessary. But even then, I’d first see if you can rewrite your loss function in a more ‘torch friendly’ way.


My original question was:

If a competition requires some completely different loss function, do we have to define it in both places, i.e. m.crit=new_function and metrics=[new_function], or is it okay for them to differ slightly, as in the rossman notebook, where F.mse_loss is used by default for training and metrics=[exp_rmspe]?

metrics are simply used for display. They’re not used for gradients. crit however is used for loss, and is also displayed.
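A rough sketch of that split in plain PyTorch. This is not the fastai training loop; the toy model, the random data and the mae_metric function are all hypothetical.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)  # toy model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 3), torch.randn(8, 1)  # random stand-in data

crit = F.mse_loss  # the loss: this is what gets backpropagated

def mae_metric(pred, targ):
    # Display-only metric (hypothetical choice); never backpropagated.
    return (pred - targ).abs().mean().item()

pred = model(x)
loss = crit(pred, y)
opt.zero_grad()
loss.backward()  # gradients come from crit only
opt.step()

with torch.no_grad():
    print(f"loss={loss.item():.4f}  mae={mae_metric(model(x), y):.4f}")
```

Swapping mae_metric for any other function changes what is printed, but not how the weights are updated.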


I’m trying to build a regression network that has 16 outputs, with one of the 16 outputs weighted 3 times as high (or X times as high in the general case) for loss purposes as the other 15 outputs. I have built a network that works for the 16 outputs when they are all equally weighted, but how would I go about up-weighting one of the outputs above the others within the fastai library? I feel like there should be a simple way of doing this that I’m not thinking of.

I’ve tried this ugly hack and it didn’t work:

bs = 250
data = ColumnarModelData.from_data_frame(PATH, val_idxs= val_idxs, df= X, y= y_trn, cat_flds= cat_feats, bs= bs,
                                         is_multi=False, is_reg=True,test_df= X_tst,shuffle=True)

m = data.get_learner(emb_szs=emb_szs, n_cont=len(contin_feats)+len(mom_feats)+len(fact_feats),emb_drop=0.3,out_sz=16,
                     szs= [1024,1024], drops= [0.0,0.5], y_range= y_range)

weight = torch.ones(16)
weight[0] = 3
m.crit = nn.MSELoss(reduce=False)*weight

which throws an error that basically says MSELoss isn’t something you can multiply by integers or floats. That makes sense after reading this thread: things need to be wrapped in a torch Variable, and I need to use torch.mul() instead of just *.

So then I went deeper into the library to see where the loss function is set, and for structured data it looks like it’s here:

def _get_crit(self, data): return F.mse_loss if data.is_reg else F.binary_cross_entropy if data.is_multi else F.nll_loss

So the code is calling F.mse_loss for regression. I tried changing my code to:

m.crit = torch.mul(F.mse_loss(),weight)

but I couldn’t get it to work here either.

I then went into the PyTorch source code for F.mse_loss and tried adding in a multiplication by weight, but that didn’t work either. I feel like I’m chasing my tail here; can someone help point me in the right direction?

Some pytorch loss functions allow class weights to be passed in. Have you ruled that approach out? Eg nn.CrossEntropyLoss(weight=class_weights)

I don’t see any weights parameter for mse_loss, and it doesn’t look like there is anything in the source to handle that. I’m open to other regression loss functions that could accept weights though. I don’t feel like this is that unusual of a thing to want to do.
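One possible workaround, sketched under the assumption of a PyTorch version where F.mse_loss accepts reduction="none" (the newer spelling of reduce=False): ask for the per-element squared errors and apply the weights to that output, rather than to the loss object itself. The function name weighted_mse is made up here.

```python
import torch
import torch.nn.functional as F

weight = torch.ones(16)
weight[0] = 3.0  # the first output counts three times as much

def weighted_mse(pred, targ):
    # reduction="none" keeps the per-element squared errors, shape (batch, 16).
    per_elem = F.mse_loss(pred, targ, reduction="none")
    # weight has shape (16,) and broadcasts over the batch dimension.
    return (per_elem * weight).mean()

pred = torch.zeros(4, 16, requires_grad=True)
targ = torch.ones(4, 16)

loss = weighted_mse(pred, targ)
loss.backward()
print(loss.item())  # (3 + 15) / 16 = 1.125
```

The gradient for the first output comes out three times as large as for the others, which is the point of the weighting.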


Have you looked at this?

Hope it helps!

I hadn’t seen that yet actually, so thank you. It is helpful, but it doesn’t look like it solves my issue. I can define a working function; the problem is that when I try to assign it with model.crit = MY_NEW_FUNCTION, it throws an error looking for inputs to the function that aren’t calculated yet.

def weighted_mse_loss(input, target):
    weight = torch.ones(16)
    weight[0] = 3
    return torch.mean(weight * (input - target) ** 2)


gives me this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-79-c956b86e8f4f> in <module>()
      4     return torch.mean(weight * (input - target) ** 2)
----> 6 m.crit=weighted_mse_loss()

TypeError: weighted_mse_loss() missing 2 required positional arguments: 'input' and 'target'

derp, figured out why i was getting the error. I was setting it like this:

m.crit = weighted_mse_loss()

instead of like this:

m.crit = weighted_mse_loss
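The same mistake in a tiny, library-free sketch. ToyLearner is a hypothetical stand-in, not the fastai class, and the loss here works on plain Python lists just to keep the example self-contained.

```python
def weighted_mse_loss(input, target):
    # Plain-Python stand-in for the real loss, just to have two required arguments.
    return sum((i - t) ** 2 for i, t in zip(input, target)) / len(input)

class ToyLearner:  # hypothetical stand-in, not the fastai Learner
    crit = None

m = ToyLearner()

m.crit = weighted_mse_loss  # stores the function itself; it gets called later
print(m.crit([1.0, 2.0], [0.0, 0.0]))  # 2.5

try:
    m.crit = weighted_mse_loss()  # calls it right now, with no arguments
except TypeError as e:
    print("TypeError:", e)
```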

That being said, now I’m getting a new PyTorch error saying the multiplication function got a Variable but expected a FloatTensor. I thought everything had to be Variables in loss functions for autograd to work?

So I know the new weight object is a FloatTensor. Do I need to change the input and target to FloatTensor instead of Variables and then wrap the whole thing in a Variable statement?

TypeError: mul received an invalid combination of arguments - got (Variable), but expected one of:
 * (float value)
      didn't match because some of the arguments have invalid types: (Variable)
 * (torch.FloatTensor other)
      didn't match because some of the arguments have invalid types: (Variable)

I got it!

This works as a custom weighted multi-output MSE function, where the squared error of the first element of the predictions is weighted 8/16 (0.5) and each of the remaining 15 is weighted 0.5/15, so the weights sum to approximately 1 (approximately because of floating-point division):

def weighted_mse_loss(input, target):
    # alpha of 0.5 means half the weight goes to the first output; the other half is split across the remaining 15
    weights = Variable(torch.Tensor([0.5] + [0.5/15] * 15)).cuda()
    sq_err = (input - target) ** 2
    out = sq_err * weights.expand_as(target)
    loss = out.mean()
    return loss
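A more general variant, offered as a sketch only. It assumes PyTorch 0.4 or later, where the Variable wrapper is no longer required, and it creates the weights on the same device as the input instead of calling .cuda(). The helper make_weights and the alpha parameter are hypothetical names.

```python
import torch

def make_weights(n_out, alpha, device=None):
    # alpha goes to the first output; the rest share 1 - alpha equally.
    w = torch.full((n_out,), (1.0 - alpha) / (n_out - 1), device=device)
    w[0] = alpha
    return w

def weighted_mse_loss(input, target, alpha=0.5):
    # Build the weights on the same device as the input.
    weights = make_weights(target.shape[-1], alpha, device=input.device)
    return ((input - target) ** 2 * weights).mean()

pred = torch.zeros(2, 16, requires_grad=True)
targ = torch.ones(2, 16)

loss = weighted_mse_loss(pred, targ)
loss.backward()
print(loss.item())
```

With alpha=0.5 and 16 outputs this reproduces the weights in the function above.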
