Doubt about the order in which training set loss & validation set loss are calculated

Hello,

I came across this text in the fastai book:

After calling fit, the results after each epoch are printed, showing the epoch number, the training and validation set losses (the “measure of performance” used for training the model), and any metrics you’ve requested (error rate, in this case).

I am a little confused and here are my doubts:

  1. As per the text above, the loss on the training set and the validation set is calculated after each epoch. Shouldn’t the model FIRST minimize the loss on the training set, letting the optimizer (e.g. SGD) reach a global minimum, and only once you have the **minimum loss** on the training data calculate the loss on the validation set?

  2. Won’t the model memorize the weights if we calculate training loss and validation loss after each epoch, leading to overfitting?

Thanks

Answer to your 1st query: an epoch means one complete pass of the data through the model. Training loss = loss calculated on the training data w.r.t. the expected output
Validation loss = loss on unseen data
The optimizer tries to minimize the training loss by moving toward a minimum of the loss function, starting from the first epoch.

Coming to your next question… we calculate the loss so that the weights can be updated by the optimizer. Overfitting occurs when the validation loss stops decreasing while the training loss keeps going down. Please refer to this: https://wb-forum.slack.com/archives/C023P7TM1DK/p1624272803306000?thread_ts=1624263184.301900&cid=C023P7TM1DK
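To make that signature concrete, here is a tiny, hypothetical sketch (the function name and the per-epoch loss numbers below are made up for illustration, not from fastai) of how you might flag overfitting from recorded loss histories:

```python
def looks_overfit(train_losses, valid_losses, window=3):
    """Flag the classic overfitting signature: training loss still
    falling while validation loss has stopped improving."""
    if len(train_losses) < window + 1:
        return False
    # training loss is still going down over the last `window` epochs
    train_falling = train_losses[-1] < train_losses[-window - 1]
    # validation loss has not improved on its earlier best
    valid_stalled = min(valid_losses[-window:]) >= min(valid_losses[:-window])
    return train_falling and valid_stalled

# Hypothetical per-epoch losses: train keeps dropping, valid turns back up
train_hist = [0.9, 0.6, 0.4, 0.3, 0.2, 0.15]
valid_hist = [0.95, 0.7, 0.55, 0.56, 0.60, 0.65]
print(looks_overfit(train_hist, valid_hist))  # prints True
```

Because both losses are computed every epoch, this check can run during training rather than only at the end.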

Answer to your 1st query: an epoch means one complete pass of the data through the model. Training loss = loss calculated on the training data w.r.t. the expected output
Validation loss = loss on unseen data
The optimizer tries to minimize the training loss by moving toward a minimum of the loss function, starting from the first epoch.

This is where I have a disconnect with what’s mentioned in the text. After each epoch it’s “ok” to calculate the “loss” on the training dataset and for the optimizer to keep looking for a global minimum.

But what I don’t understand is the need to calculate the “loss” on the validation set here (during the training of the model).

Validation set loss should only be calculated once the optimizer is done finding the global minimum and the model comes up with the “best” set of weights:

  1. for the given problem

  2. for the given data

Even in machine learning we calculate the loss on the validation set after training the model, not during training.

Validation loss is the criterion for whether your model is good enough or not (as it is measured on unseen data).
Comparing to classical ML, I can give you a vague intuition: assume that in ML you train your model for only one epoch, whereas in DL you train for more, and after every epoch you check whether the validation loss is going down or not. I hope it’s clear now?
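The per-epoch pattern described above can be sketched like this (train_one_epoch and evaluate are hypothetical stand-ins, just to show the shape of the loop; validation here only measures, it never updates anything):

```python
def train_one_epoch(epoch):
    # stand-in: pretend training loss keeps shrinking as weights update
    return 1.0 / (epoch + 1)

def evaluate(epoch):
    # stand-in: pretend validation loss, computed but never backpropagated
    return 0.8 / (epoch + 1) + 0.05

best_valid = float("inf")
history = []
for epoch in range(5):
    train_loss = train_one_epoch(epoch)   # weights are updated here
    valid_loss = evaluate(epoch)          # no updates: just a check-up
    history.append((train_loss, valid_loss))
    if valid_loss < best_valid:           # track the best epoch so far
        best_valid = valid_loss
```

The point of checking every epoch is exactly this running comparison: you can see when validation loss stops improving instead of finding out only after training ends.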

Uh… I think you’re missing the point on what loss is. That’s okay! I will touch base on this tomorrow in our lecture and hopefully that will help.


There’s a big difference between simply calculating the loss and applying the gradients of that loss. Our loss function is simply a mathematical measure of how close we are to our minimum, but we need to call loss.backward() (and then an optimizer step) for anything to get applied. When we calculate the loss on the validation set, it’s computed without gradients (in torch terms, inside with torch.no_grad():), so there are no gradients we could backpropagate, and we never actually call loss.backward() there.
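You can see this directly in torch itself: the same operation produces a gradient-tracking tensor normally, but not inside torch.no_grad(), so there is nothing for backward() to work with (variable names below are just for illustration):

```python
import torch

x = torch.ones(3, requires_grad=True)

# Normally, operations on x are recorded for backprop:
y = (x * 2).sum()
tracked = y.requires_grad        # True: y carries gradient history

# Inside no_grad, the same operation is NOT recorded,
# so there is nothing to backpropagate from z:
with torch.no_grad():
    z = (x * 2).sum()
untracked = z.requires_grad      # False: z.backward() would raise an error
```

This is why computing the validation loss cannot change the weights: without recorded gradients, there is simply nothing for the optimizer to apply.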

Here’s perhaps another way to think on it:

  • Training Loss: backprop is performed at the end of each batch and/or epoch, so the weights get updated
  • Validation Loss: a “metric” run on the validation set, to see how the training and validation losses compare. No back propagation, simply a view into model performance for you
  • Metrics: similar to the validation loss, just a human-readable way to view model performance.

Hopefully that helps answer your question, and if not I’m sure @arora_aman will be able to quell anything else :smiley:

A very basic torch training loop is set up like so:

loss_func = CrossEntropyLossFlat()
metric = accuracy
train_loss, valid_loss, metric_num = 0., 0., 0.

# training: compute the loss, backprop, and let the optimizer update the weights
model.train()
for x, y in train_dl:
    out = model(x)
    loss = loss_func(out, y)
    loss.backward()      # compute gradients
    opt.step()           # apply them to the weights
    opt.zero_grad()
    train_loss += loss.detach().cpu()

# validation: only measure; no gradients, no weight updates
model.eval()
with torch.no_grad():
    for x, y in valid_dl:
        out = model(x)
        valid_loss += loss_func(out, y).cpu()
        metric_num += metric(out, y)

So you can see here we never call loss.backward() in the validation step, and fastai does a very similar thing :slight_smile:
