When I run `learn.fit_one_cycle`, it tells me that the final validation loss is 0.19. But when I try to reproduce this by calling `CrossEntropyLossFlat()`, it gives a different result (1.55)!

Why is this happening? What should I do to reproduce that 0.19 number?

Here is a Colab notebook demonstrating the issue: Google Colaboratory
I played around with your notebook for a bit and realized the problem: `CrossEntropyLossFlat` expects raw logits, but `preds` had already been passed through a softmax activation. I’ve edited your notebook to show you what I mean.

Here’s what I wrote in the notebook, in case it gets changed later:
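In short, something along these lines (a minimal sketch, assuming `learn` is the learner from your notebook; `noop` is fastcore’s identity function, used here to skip the activation):

```python
from fastai.vision.all import *

# get_preds applies the activation attached to the loss function by default
# (softmax for CrossEntropyLossFlat), so these are probabilities, not logits
probs, targs = learn.get_preds()
print(CrossEntropyLossFlat()(probs, targs))   # ~1.55 -- softmax gets applied twice

# pass act=noop to skip the activation and get the raw logits back
logits, targs = learn.get_preds(act=noop)
print(CrossEntropyLossFlat()(logits, targs))  # ~0.19 -- matches fit_one_cycle's validation loss
```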
The word “logit” was confusing me. I thought it was the result of a sigmoid.
Yeah, “logits” is a weird word; I’m not sure what it stands for or where it comes from. I’ve always understood it to mean the outputs of a neural net before you apply the final activation function (like softmax or sigmoid).
Yes, this makes sense. I was confused by the fact that `BCEWithLogitsLoss` applies the sigmoid first, so I thought the name meant “BCE with sigmoid”. Now I understand that it simply means “BCE applied to logits”.
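In code, the equivalence looks something like this (a quick sketch with made-up tensors):

```python
import torch
import torch.nn as nn

logits = torch.randn(5)   # raw model outputs (logits)
targets = torch.rand(5)   # float targets in [0, 1]

# BCEWithLogitsLoss applies the sigmoid internally...
with_logits = nn.BCEWithLogitsLoss()(logits, targets)

# ...so it matches plain BCE computed on sigmoid-activated outputs
plain_bce = nn.BCELoss()(torch.sigmoid(logits), targets)

print(torch.allclose(with_logits, plain_bce))  # True
```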
Thanks so much!! This makes a ton of sense. I really love this forum; everyone is so helpful!