# What are the different ways a logarithm can result in a NaN?

Hello.

I’ve implemented a metric (Root Mean Square Logarithmic Error). It’s essentially RMSE, but all inputs pass through a logarithm.

Below is my implementation:

```python
import torch.nn.functional as F

def rmsle(preds, targs):
    return F.mse_loss(preds.log1p(), targs.log1p()).sqrt()
```

During training, I get `NaN` values for the metric.

I’m not quite sure why I’m getting `NaN`. I know this can occur when 0 or a negative value is passed to a logarithm. However, my data has no negative values, and I’m accounting for 0 values by using `torch.log1p()`, which computes log(1 + x), i.e., adds 1 to the input before taking the logarithm.
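For context, here is a quick check of exactly when `log1p` stops returning finite values (a minimal sketch, assuming PyTorch is installed):

```python
import torch

# log1p(x) = log(1 + x), so it only shifts the problem:
# any input <= -1 still lands at or below log(0)
x = torch.tensor([2.0, 0.0, -0.5, -1.0, -3.0])
print(x.log1p())
# log1p(-1) = log(0) = -inf, and log1p(-3) = log(-2) = nan
```

So `log1p` protects against inputs of exactly 0, but not against inputs at or below -1.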

Is it simply the model itself outputting negative predictions?

I’d appreciate any input!


Hi @ForBo7 Have you tried adding one to your predictions before implementing the function? I often do that when working with logarithms.


Hello,

Most likely the issue is that `preds` holds values smaller than or equal to -1. Even with `log1p`, that would still be a problem because when `preds` contains, for instance, -3, then -3 + 1 = -2, whose logarithm is undefined on the real numbers. One solution is to either clamp the predictions with `preds = preds.clamp(min=0)` to ensure a minimum value of 0 or normalize them with fastai’s `sigmoid_range`, e.g., `preds = sigmoid_range(preds, 0, 10)`.
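The clamping fix can be sketched as follows (a minimal example with made-up prediction values, assuming PyTorch):

```python
import torch

preds = torch.tensor([1.5, -3.0, 0.2])

# Without clamping, the -3.0 entry makes log1p return nan
print(preds.log1p())

# Clamping to a minimum of 0 keeps every log1p input >= 1
clamped = preds.clamp(min=0)
print(clamped.log1p())  # all finite
```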

Alternatively, `preds` might contain NaNs, which can be checked by `preds.isnan().sum()`.


I checked for NaNs with `preds.isnan().sum()`, and there were none. So negative predictions are the culprit.

Before writing this post, I did consider clamping the predictions to be positive only because, as you said too, `log1p` wouldn’t help for negative values, but wanted to see if there was a better way.

My concern was that, if I clamped, the clamped predictions wouldn’t be fed back into the model. But I suppose the model will figure that out itself from its calculated loss? I can’t use `sigmoid_range`, however, because the outputs are continuous.


Yup! Did that with the `log1p` function which adds 1 for you.

With the help of BobMcDear’s comment, I figured out that negative predictions are indeed the culprit.

Thanks to the power of backpropagation and PyTorch’s autograd engine, you need not worry about any problems arising from clamping. However, I’d recommend including the clamp operation in the model’s forward pass because it would, in a sense, become part of the network’s definition and not merely a step in the calculation of the loss function. That is, the model’s outputs would always have to be clamped to get reliable results, be it during inference or training.
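As a sketch of what putting the clamp inside the forward pass might look like (the model architecture and names here are hypothetical, not the original poster's actual model):

```python
import torch
import torch.nn as nn

class ClampedRegressor(nn.Module):
    """Hypothetical regressor whose forward pass clamps its output.

    Because the clamp lives inside forward(), non-negative outputs are
    part of the network's definition, during both training and inference.
    """
    def __init__(self, n_in):
        super().__init__()
        self.body = nn.Linear(n_in, 1)

    def forward(self, x):
        return self.body(x).clamp(min=0)

model = ClampedRegressor(4)
out = model(torch.randn(8, 4))
print(out)  # every prediction is >= 0
```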

Also, you can think of clamping values at 0 as the equivalent of ReLU - in fact, you could insert ReLU as the final layer of your network rather than clamping.
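The equivalence is easy to verify directly (a one-line check, assuming PyTorch):

```python
import torch

x = torch.randn(5)
# clamp(min=0) and ReLU compute the same function: max(x, 0)
print(torch.equal(x.clamp(min=0), torch.relu(x)))  # True
```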

`sigmoid_range` scales a tensor’s content to a given range and is meant for continuous data.

ReLUs were indeed coming to mind as I was dabbling around this!

I’m using fastai to create my learner and have so far not specified any model to this learner. If I wanted to specify the use of the ReLU as an activation, would I have to create my own PyTorch model which I then input to the learner, or is there a function that would allow me to easily specify it?

I did find this `tabular_config` function in which I passed `torch.nn.ReLU` to the `act_cls` parameter, and then input this function to the learner. Didn’t seem to work though as I still got NaNs.

Oh, I see. Judging by the docs, I still don’t think it would work for my case? Because I would not know what the maximum value would be. Unless I just take the maximum value from my dataset?

Great to learn that you got it. I seem to have missed that. To the best of my knowledge, fastai does not natively offer an option for specifying a final activation layer, but assuming the Learner is an instance of `TabularLearner` and fastai automatically constructed the model, appending ReLU to it is as straightforward as `learn.model.layers.append(nn.ReLU())`.

Yes, you could determine the maximum value from your dataset or prior domain knowledge. If you do decide to proceed with `sigmoid_range`, you can pass the desired range to the `y_range` argument of `TabularLearner`. Ultimately, you should experiment with both methods to evaluate the efficacy of each.
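To illustrate what `sigmoid_range` does, here is a minimal re-implementation of the idea (a sketch, not fastai's actual source): it squashes unbounded outputs into a chosen interval, so no clamp is needed.

```python
import torch

def sigmoid_range(x, low, high):
    """fastai-style sigmoid_range sketch: maps x into (low, high)."""
    return torch.sigmoid(x) * (high - low) + low

x = torch.tensor([-100.0, 0.0, 100.0])
print(sigmoid_range(x, 0, 10))  # approximately [0, 5, 10]
```

This is why a sensible maximum (from the dataset or domain knowledge) matters: predictions can never exceed `high`, so an overly tight range caps the model.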


> `learn.model.layers.append(nn.ReLU())`

Ooo, that’ll be nifty to know.

I tried out `sigmoid_range` and got better performance using it! Good thing I tried it out after all.