After calling fit, the results after each epoch are printed, showing the epoch number, the training and validation set losses (the “measure of performance” used for training the model), and any metrics you’ve requested (error rate, in this case).

I am a little confused and here are my doubts:

As per the text above, the loss on the training set and the validation set is calculated after each epoch. Shouldn’t the model FIRST minimize the loss on the training set, letting the optimizer (e.g. SGD) reach a global minimum, and only once you have the **minimum loss** on the training data should the loss on the validation set be calculated?

Won’t the model memorize the weights if we calculate the training loss and validation loss after each epoch, leading to overfitting?

Answer to your 1st query: an epoch means one complete pass of the data through the model.
Training loss = loss calculated on the training data w.r.t. the expected output
Validation loss = loss on unseen data
The optimizer will try to minimize the training loss by searching for the minima of the loss function, starting from the first epoch.


This is where I have a disconnect from what’s mentioned in the text. After each epoch it’s “ok” to calculate the “loss” on the training dataset and for the optimizer to keep looking for a global minimum.

But what I don’t understand is the need to calculate the “loss” on the validation set here (during the training of the model).

The validation set loss should only be calculated when the optimizer is done finding the global minimum and the model has come up with the “best” set of weights:

for the given problem

for the given data

Even in machine learning we calculate the loss on the validation set after training the model, not during the training.

Validation loss is the criterion for whether your model is good enough or not (as it is tested on unseen data).
Comparing with ML, I can give a rough idea: you can assume that in ML you train your model for only one epoch, whereas in DL you train for many, and after every epoch you check whether the validation loss is going down or not. I hope it’s clear now?
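To make that concrete, here is a minimal sketch (pure Python, with made-up per-epoch numbers, since real values would come from your model) of “checking whether the validation loss is going down” after every epoch:

```python
# Simulated validation losses, one per epoch (illustrative numbers only):
# the loss improves for a few epochs, then rises as the model starts overfitting.
valid_losses = [0.90, 0.62, 0.50, 0.55, 0.71]

best_loss = float("inf")
best_epoch = None
for epoch, loss in enumerate(valid_losses):
    # In a real loop you would train for one epoch here, then measure this loss.
    if loss < best_loss:
        best_loss, best_epoch = loss, epoch  # you'd also save the weights here

print(f"best epoch: {best_epoch}, validation loss: {best_loss}")
# best epoch: 2, validation loss: 0.5
```

This is exactly why the validation loss is computed during training: it tells you when to stop (or which checkpoint to keep).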

There’s a big difference between simply calculating the loss and applying the gradients of that loss. Our loss function is simply some mathematical representation of how close we are to our minimum, but we need to call loss.backward() for anything to get applied. When we calculate the loss on the validation set, since it’s computed without gradients (in torch terms, inside with torch.no_grad():), there are no gradients we could back-propagate, and so we never actually call backward().
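You can see this difference directly in plain PyTorch: with gradient tracking on, backward() populates .grad; inside torch.no_grad() the computation graph is never built, so there is nothing to back-propagate.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# Normal (training-style) computation: the graph is recorded, backward() works
loss = (3 * w - 1) ** 2
loss.backward()
print(w.grad)  # tensor(30.) = 2 * (3*2 - 1) * 3

# Validation-style computation: no graph is recorded
with torch.no_grad():
    loss = (3 * w - 1) ** 2
print(loss.requires_grad)  # False -> calling loss.backward() here would error
```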

Here’s perhaps another way to think on it:

Training Loss: back-propagation is performed on it at the end of each batch (and/or epoch), and the weights get updated

Validation Loss: a “metric” run on the validation set, to see how the losses on the training and validation data compare. No back-propagation; simply a view into model performance for you

Metrics: similar to the validation loss, just a more human-readable way to view model performance.
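For example, accuracy is just the fraction of predictions that match the labels. A quick illustration with plain torch (not fastai’s accuracy implementation, and with made-up numbers):

```python
import torch

# Made-up model outputs for 3 examples and 2 classes, plus the true labels
preds = torch.tensor([[2.0, 0.1],
                      [0.3, 1.5],
                      [1.2, 0.4]])
targets = torch.tensor([0, 1, 1])

# Take the highest-scoring class per row and compare with the targets
acc = (preds.argmax(dim=1) == targets).float().mean()
print(acc)  # 2 of 3 predictions are correct, so accuracy is ~0.667
```

Note that nothing here touches gradients at all; a metric is purely for the human reading the training log.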

Hopefully that helps answer your question, and if not I’m sure @arora_aman will be able to quell anything else

A very basic torch loop setup is like so:

loss_func = CrossEntropyLossFlat()
metric = accuracy
train_loss, valid_loss, metric_num = 0., 0., 0.

# Training loop: compute the loss, back-propagate, and update the weights
for x, y in train_dl:
    out = model(x)
    loss = loss_func(out, y)
    loss.backward()
    opt.step()        # apply the gradients
    opt.zero_grad()   # reset them for the next batch
    train_loss += loss.detach().cpu()

# Validation loop: no gradients, so backward() is never called and no weights change
with torch.no_grad():
    for x, y in valid_dl:
        out = model(x)
        loss = loss_func(out, y)
        metric_val = metric(out, y)
        valid_loss += loss.cpu()
        metric_num += metric_val

So you can see here that we never call loss.backward() in the validation step, and fastai does a very similar thing.