Loss reported during training dramatically lower than after training

I’m training a multi-class classifier and the cost function is nll_loss (negative log likelihood loss). During training, the loss reported on the training set slowly goes down from 0.50 to 0.12 if I train it long enough. The validation set loss also starts at 0.50 and slowly increases until it’s greater than 1.00. Classic overfitting, right?

But something seemed wrong, since the F1-score on the training set was so much lower than on the validation set. If I run the whole training set through the torch.nn.functional.nll_loss function, I get a huge loss (greater than 3.00) that only seems to increase with training. I don’t see how this makes any sense.

So there are really two questions here. First, why does the training set loss reported after each epoch not roughly match the loss I get when I run the whole training set through the loss function? Second, why might the training set loss increase with every epoch while the validation set loss remains the same?

Here’s the code I’m running at the end on the whole training set:

import numpy as np
import torch.nn.functional as F

from fastai.model import predict
from fastai.core import V, to_np
from fastai.dataset import split_by_idx

# predictions for the whole training set, gathered by iterating over trn_dl
trn_yhat = predict(learn0.model, learn0.data.trn_dl)

# true labels for the training set, taken from the dataframe in dataset order
trn_ytrue = split_by_idx(train_idx, df, y)[1][0][:,None].astype(np.int64)

# predicted class per row (used for the F1-score)
trn_preds = np.argmax(trn_yhat, axis=1)

# wrap in Variables and compute the NLL loss over the whole training set
V1 = V(trn_ytrue)
V2 = V(trn_yhat)

to_np(F.nll_loss(V2, V1.squeeze(1)))

Would be grateful for any insight on why this is happening.

I figured it out, in case anyone stumbles upon this in the future. If you look at the above code, you’ll see it’s grabbing the trn_yhat predicted values by using the predict function, which iterates over the data loader. The trn_dl dataloader has shuffle=True set, so it will return the predictions in random order.

trn_ytrue, on the other hand, comes from splitting the full dataset neatly by the training and validation indices. So, of course, the predicted values and the true values end up totally mismatched: the predictions come back in random order while the true values are in dataset order.

The reason I was so thrown off here is that the val_dl dataloader has shuffle=False, so the predicted values of y and the true values of y in the validation set match up properly and the calculated loss makes sense.
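
To see the ordering problem in isolation, here’s a minimal sketch in plain PyTorch (made-up tensors, nothing fastai-specific): iterating a shuffled DataLoader returns items in a different order than the underlying dataset, so pairing its output with targets taken in dataset order mis-aligns almost every row.

import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
y = torch.arange(10)                       # stand-in for the true labels, in dataset order
ds = TensorDataset(torch.randn(10, 3), y)

shuffled = DataLoader(ds, batch_size=5, shuffle=True)
ordered  = DataLoader(ds, batch_size=5, shuffle=False)

print(torch.cat([yb for _, yb in shuffled]))  # random order, e.g. tensor([4, 1, 7, ...])
print(torch.cat([yb for _, yb in ordered]))   # dataset order, tensor([0, 1, 2, ..., 9])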

I ended up modifying the fastai.model.predict function and used this to generate the predicted and the true values for each of the training and validation sets, which works just fine:

import numpy as np
from fastai.core import to_np, VV

def predict(m, dl):
    m.eval()                                  # put the model in eval mode
    if hasattr(m, 'reset'): m.reset()         # reset hidden state for RNN models
    res = []
    # collect predictions AND targets batch by batch, so they stay paired
    for *x,y in iter(dl): res.append([to_np(m(*VV(x))),to_np(y)])
    preda, y_true = zip(*res)
    return np.concatenate(preda), np.concatenate(y_true)
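
For what it’s worth, here’s roughly how I call it (a sketch, not verbatim; learn0, V, to_np and F are the same names as in the snippet further up). Because predictions and targets now come back paired, the recomputed loss agrees with what’s reported during training:

trn_yhat, trn_ytrue = predict(learn0.model, learn0.data.trn_dl)
val_yhat, val_ytrue = predict(learn0.model, learn0.data.val_dl)

trn_loss = to_np(F.nll_loss(V(trn_yhat), V(trn_ytrue)))
val_loss = to_np(F.nll_loss(V(val_yhat), V(val_ytrue)))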

As an interesting aside, the reason I was seeing training loss decrease during training, but seemingly increase with the number of epochs when I recomputed it after training, is that the model was fitting the training set better and better. When training was done and I paired the randomized training set predictions with the ordered ground truth values, most predictions got matched to the wrong labels, and a better-fitted model assigns those wrong labels ever-smaller probability, so the mis-paired loss kept getting worse.
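
Here’s a toy sketch of that effect (plain PyTorch, fabricated logits): as the log-probabilities get more confident in the correct class, the loss against correctly paired targets shrinks while the loss against randomly paired targets blows up.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, c = 1000, 5
y_true = torch.randint(0, c, (n,))
perm = torch.randperm(n)                      # stands in for the shuffled prediction order

for confidence in (1.0, 3.0, 6.0):            # "more training" = more confident logits
    logits = torch.full((n, c), -confidence)
    logits[torch.arange(n), y_true] = confidence
    log_probs = F.log_softmax(logits, dim=1)
    matched    = F.nll_loss(log_probs, y_true).item()         # like the loss reported during training
    mismatched = F.nll_loss(log_probs[perm], y_true).item()   # like my post-training calculation
    print(f"confidence {confidence}: matched {matched:.3f}, mismatched {mismatched:.3f}")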

Hope this helps anyone who runs into this someday.