What are the different ways a logarithm can result in a NaN?


I’ve implemented a metric (Root Mean Square Logarithmic Error). It’s essentially RMSE, but all inputs pass through a logarithm.

Below is my implementation:

import torch.nn.functional as F

def rmsle(preds, targs):
    # RMSE computed on log1p-transformed predictions and targets
    return F.mse_loss(preds.log1p(), targs.log1p()).sqrt()

During training, I get NaN values for the metric.

I’m not quite sure why I’m getting NaN. I know this can occur when 0 or a negative value is passed to a logarithm. However, my data has no negative values, and I’m accounting for 0 values by using torch.log1p(), which computes log(1 + x), i.e., adds 1 to the input before taking the logarithm.
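As a quick sanity check (a minimal sketch with made-up values), log1p stays finite for any input greater than -1, so zeros are fine:

```python
import torch

# log1p(x) computes log(1 + x), so it is finite for all x > -1
x = torch.tensor([0.0, 1.0, 10.0])
print(x.log1p())  # tensor([0.0000, 0.6931, 2.3979])
```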

Is it simply the model itself outputting negative predictions?

I’d appreciate any input!


Hi @ForBo7! Have you tried adding one to your predictions before applying the function? I often do that when working with logarithms.



Most likely the issue is that preds holds values smaller than or equal to -1. Even with log1p, that would still be a problem because when preds contains, for instance, -3, then -3 + 1 = -2, whose logarithm is undefined on the real numbers. One solution is to either clamp the predictions with preds = preds.clamp(min=0) to ensure a minimum value of 0 or normalize them with fastai’s sigmoid_range, e.g., preds = sigmoid_range(preds, 0, 10).
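A minimal reproduction of this failure mode, using made-up numbers and the rmsle from the original post:

```python
import torch
import torch.nn.functional as F

def rmsle(preds, targs):
    return F.mse_loss(preds.log1p(), targs.log1p()).sqrt()

targs = torch.tensor([2.0, 3.0, 5.0])
bad_preds = torch.tensor([2.0, -3.0, 5.0])   # -3 + 1 = -2, whose log is NaN
print(rmsle(bad_preds, targs))               # tensor(nan)
print(rmsle(bad_preds.clamp(min=0), targs))  # finite value
```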

Alternatively, preds might contain NaNs, which can be checked by preds.isnan().sum().
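For instance, with a toy tensor:

```python
import torch

# isnan() marks NaN entries; summing counts them
preds = torch.tensor([1.0, float('nan'), 3.0])
print(preds.isnan().sum())  # tensor(1)
```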


Checked for NaNs with preds.isnan().sum(), and there were none. So negative predictions are the culprit.

Before writing this post, I did consider clamping the predictions to be non-negative because, as you said, log1p wouldn’t help for negative values, but I wanted to see if there was a better way.

The reason is that if I clamped, the clamped predictions wouldn’t be fed back into the model. But I suppose the model will figure that out itself from its calculated loss? :thinking:

I can’t use sigmoid_range, however, because the outputs are continuous.


Yup! Did that with the log1p function, which adds the 1 for you.

Figured out with the help of BobMcDear’s comment that negative predictions are indeed the culprit.

Thanks for your input!

Thanks to the power of backpropagation and PyTorch’s autograd engine, you need not worry about any problems arising from clamping. However, I’d recommend including the clamp operation in the model’s forward pass because it would, in a sense, become part of the network’s definition and not merely a step in the calculation of the loss function. That is, the model’s outputs would always have to be clamped to get reliable results, be it during inference or training.
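One way to sketch this in plain PyTorch (the wrapper name and the inner model here are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical wrapper: the clamp becomes part of the model's forward pass,
# so outputs are non-negative during both training and inference.
class NonNegativeWrapper(nn.Module):
    def __init__(self, base):
        super().__init__()
        self.base = base

    def forward(self, x):
        return self.base(x).clamp(min=0)

model = NonNegativeWrapper(nn.Linear(4, 1))
out = model(torch.randn(8, 4))
print((out >= 0).all())  # tensor(True)
```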

Also, you can think of clamping values at 0 as the equivalent of ReLU - in fact, you could insert ReLU as the final layer of your network rather than clamping.
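The equivalence is easy to verify:

```python
import torch

# ReLU and clamp(min=0) compute exactly the same thing
x = torch.randn(5)
print(torch.equal(torch.relu(x), x.clamp(min=0)))  # True
```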

sigmoid_range scales a tensor’s content to a given range and is meant for continuous data.
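For reference, fastai’s sigmoid_range is essentially the following one-liner (reproduced here as a sketch, so double-check against the fastai source):

```python
import torch

# Mirrors fastai's sigmoid_range: squash x with a sigmoid,
# then rescale into the interval (low, high)
def sigmoid_range(x, low, high):
    return torch.sigmoid(x) * (high - low) + low

x = torch.tensor([-100.0, 0.0, 100.0])
print(sigmoid_range(x, 0, 10))  # approximately [0., 5., 10.]
```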



ReLUs were indeed coming to mind as I was dabbling around this!

I’m using fastai to create my learner and have so far not specified any model to this learner. If I wanted to specify the use of the ReLU as an activation, would I have to create my own PyTorch model which I then input to the learner, or is there a function that would allow me to easily specify it?

I did find this tabular_config function in which I passed torch.nn.ReLU to the act_cls parameter, and then input this function to the learner. Didn’t seem to work though as I still got NaNs.

Oh, I see. Judging by the docs, I still don’t think it would work for my case? Because I would not know what the maximum value would be. Unless I just take the maximum value from my dataset?

Great to learn that you got it working. I seem to have missed that. :sweat_smile:

To the best of my knowledge, fastai does not natively offer an option for specifying a final activation layer, but assuming the Learner is an instance of TabularLearner and fastai automatically constructed the model, appending ReLU to it is as straightforward as learn.model.layers.append(nn.ReLU()).
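The same idea can be sketched in plain PyTorch, since fastai’s tabular model stores its head as an nn.Sequential (the layer sizes below are made up):

```python
import torch
import torch.nn as nn

# Stand-in for learn.model.layers, which is an nn.Sequential in fastai
layers = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
layers.append(nn.ReLU())  # analogous to learn.model.layers.append(nn.ReLU())
out = layers(torch.randn(16, 4))
print((out >= 0).all())  # tensor(True)
```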

Yes, you could determine the maximum value from your dataset or prior domain knowledge. If you do decide to proceed with sigmoid_range, you can pass the desired range to the y_range argument of TabularLearner. Ultimately, you should experiment with both methods to evaluate the efficacy of each.



Ooo, that’ll be nifty to know.

I tried out sigmoid_range and got better performance using it! Good thing I tried it out after all :smile:

Thank you for your help!
