When I run `learn.fit_one_cycle`, it tells me that the final validation loss is 0.19. But when I try to reproduce this by calling `CrossEntropyLossFlat()`, it gives a different result (1.55)!

Why is this happening? What should I do to reproduce that 0.19 number?

Here is a Colab notebook demonstrating the issue: Google Colaboratory
I played around with your notebook for a bit and realized the problem: `CrossEntropyLossFlat` expects raw logits, but `preds` had already been passed through a softmax activation. I’ve edited your notebook to show you what I mean.

Here’s what I wrote in the notebook, in case it gets changed later:
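In short, something along these lines (a minimal sketch, assuming `learn` is the learner from your notebook; `noop` is fastcore’s identity function, used here to skip the activation):

```python
from fastai.vision.all import *

# get_preds applies the activation attached to the loss function by default
# (softmax for CrossEntropyLossFlat), so these are probabilities, not logits
probs, targs = learn.get_preds()
print(CrossEntropyLossFlat()(probs, targs))   # ~1.55 -- softmax gets applied twice

# pass act=noop to skip the activation and get the raw logits back
logits, targs = learn.get_preds(act=noop)
print(CrossEntropyLossFlat()(logits, targs))  # ~0.19 -- matches fit_one_cycle's validation loss
```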
The word “logit” was confusing me. I thought it was the result of a sigmoid.
Yeah, “logits” is a weird word; I’m not sure what it stands for or where it comes from. I’ve always understood it to mean the outputs of a neural net before you apply the final activation function (like softmax or sigmoid).
Yes, this makes sense. I was confused by the fact that `BCEWithLogitsLoss` applies the sigmoid first, so I thought the name meant “BCE with sigmoid”. Now I understand that it simply means “BCE applied to logits”.
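In code, the equivalence looks something like this (a quick sketch with made-up tensors):

```python
import torch
import torch.nn as nn

logits = torch.randn(5)   # raw model outputs (logits)
targets = torch.rand(5)   # float targets in [0, 1]

# BCEWithLogitsLoss applies the sigmoid internally...
with_logits = nn.BCEWithLogitsLoss()(logits, targets)

# ...so it matches plain BCE computed on sigmoid-activated outputs
plain_bce = nn.BCELoss()(torch.sigmoid(logits), targets)

print(torch.allclose(with_logits, plain_bce))  # True
```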
Thanks so much!! This makes a ton of sense. I really love this forum; everyone is so helpful!