I’m training a multi-class classifier and the cost function is nll_loss (negative log likelihood loss). During training, the loss reported on the training set slowly goes down from 0.50 to 0.12 if I train long enough. The validation loss also starts at 0.50 and slowly increases until it’s greater than 1.00. Classic overfitting, right?
But something was wrong, since the F1-score on the training set was so much lower than on validation. If I run the whole training set through the torch.nn.functional.nll_loss function, I get a huge loss (greater than 3.00) that only seems to increase with training. I don’t see how this makes sense.
So there are really two questions here. First, why does the training set loss reported after each epoch not roughly match the loss I get when I run the whole training set through the loss function? Second, why might the training set loss increase with every epoch while the validation set loss remains the same?
Here’s the code I’m running at the end on the whole training set:
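import numpy as np
import torch.nn.functional as F
from fastai.model import predict
from fastai.core import V, to_np
from fastai.dataset import split_by_idx
# Model outputs for every batch in the training data loader
trn_yhat = predict(learn0.model, learn0.data.trn_dl)
# True labels for the training rows, as an (n, 1) int64 array
trn_ytrue = split_by_idx(train_idx, df, y)[1][0][:,None].astype(np.int64)
trn_preds = preds = np.argmax(trn_yhat, axis=1)
V1 = V(trn_ytrue)
V2 = V(trn_yhat)
# Loss over the whole training set; labels squeezed back to shape (n,)
to_np(F.nll_loss(V2, V1.squeeze(1)))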
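For reference, here is a minimal pure-Python sketch of what a mean-reduced negative log likelihood computes. It assumes the model outputs are log-probabilities (for example from a log_softmax layer), which is what nll_loss expects. The helper name `nll_loss_mean` and the toy numbers are made up for illustration and are not part of fastai or PyTorch.

```python
import math

def nll_loss_mean(log_probs, targets):
    """Mean negative log likelihood over a batch.

    log_probs: list of rows, each holding one log-probability per class.
    targets:   list of integer class indices, one per row.
    """
    if len(log_probs) != len(targets):
        raise ValueError("log_probs and targets must have the same length")
    # For each sample, take the log-probability assigned to its true class.
    picked = [-row[t] for row, t in zip(log_probs, targets)]
    return sum(picked) / len(picked)

# Toy data (hypothetical): 3 samples, 3 classes.
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.2, 0.2, 0.6]]
log_probs = [[math.log(p) for p in row] for row in probs]

# Labels in the same order as the predictions.
aligned = nll_loss_mean(log_probs, [0, 1, 2])
# The same predictions scored against labels in a different order.
misaligned = nll_loss_mean(log_probs, [1, 2, 0])

print(aligned, misaligned)
```

Because row i of the predictions is scored against label i, the result is only meaningful when both are in the same order, and the inputs must be log-probabilities rather than raw scores or plain probabilities.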
Would be grateful for any insight on why this is happening.