Loss Function Confusion

When I run learn.fit_one_cycle, it reports a final validation loss of 0.19. But when I try to reproduce this by calling learn.get_preds() and passing the results to CrossEntropyLossFlat(), I get a different value (1.55)!

Why is this happening? What should I do to reproduce that 0.19 number?
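
Roughly what I'm doing (paraphrased; the exact code is in the notebook linked below):

```python
from fastai.vision.all import *   # as in the notebook

# `learn` is the trained classifier from the notebook
preds, targs = learn.get_preds()                 # predictions on the validation set
loss = CrossEntropyLossFlat()(preds, targs)
print(loss)                                      # ~1.55, not the 0.19 reported during training
```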

Here is a colab notebook demonstrating this issue: Google Colaboratory


Hi @vdefont,

I played around with your notebook for a bit and found the problem: CrossEntropyLossFlat expects raw logits, but the preds returned by get_preds() have already been passed through a softmax activation. I’ve edited your notebook to show you what I mean.

Here’s what I wrote in the notebook in case it gets changed later:
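
The gist, as a sketch (not the verbatim notebook cell; it assumes the same learner as in the original notebook):

```python
import torch.nn.functional as F
from fastai.vision.all import *   # for CrossEntropyLossFlat, Learner, etc.

probs, targs = learn.get_preds()   # NOTE: these are probabilities, softmax is already applied

# Wrong: CrossEntropyLossFlat applies log-softmax internally, so passing probabilities
# softmaxes them a second time and inflates the loss (~1.55)
wrong = CrossEntropyLossFlat()(probs, targs)

# Right: take the log of the probabilities and apply NLL, which is equivalent to
# cross-entropy on the raw logits; this reproduces the ~0.19
right = F.nll_loss(probs.log(), targs)

# Or simply let fastai compute the validation loss for you
val_loss, *metrics = learn.validate()
```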


The word “logit” confuses me. I thought it was the result of a sigmoid.

Yeah, “logits” is a weird word; I’m not sure what it stands for or where it comes from. I’ve always understood it to mean the outputs of a neural net before the final activation function (like softmax or sigmoid) is applied.
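
For example (just illustrating the terminology with made-up numbers):

```python
import torch

logits = torch.tensor([[2.0, -1.0, 0.5]])   # raw network outputs, before any activation
probs  = torch.softmax(logits, dim=1)       # after the final activation: probabilities
print(probs, probs.sum())                   # rows sum to 1 once softmax is applied
```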


Yes, this makes sense. I was confused by the fact that BCEWithLogitsLoss applies the sigmoid first, so I thought the name meant “BCE with sigmoid”. Now I understand that it simply means “BCE applied to logits”.
Thank you.
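
To double-check my understanding, here's a quick comparison (my own toy example, not from the notebook):

```python
import torch
import torch.nn as nn

logits  = torch.randn(8, 1)                        # raw outputs ("logits")
targets = torch.randint(0, 2, (8, 1)).float()      # binary labels

# BCEWithLogitsLoss = sigmoid followed by BCELoss, i.e. "BCE applied to logits"
with_logits = nn.BCEWithLogitsLoss()(logits, targets)
manual      = nn.BCELoss()(torch.sigmoid(logits), targets)
print(torch.allclose(with_logits, manual))         # True
```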


Thanks so much!! This makes a ton of sense. I really love this forum, everyone is so helpful :grinning:
