Hi David, I found reviewing the Lesson 2 notes very useful. Start from the section near the bottom titled "What is the need for learning rate [1:41:32]".
Loss is basically a measure of how far our prediction is from the actual value. For the example in Lesson 2 where Jeremy is trying to fit a line to a bunch of dots: if you pass in x=2 and your model predicts y=3 when the actual y value (where the dot is) is 4, then your loss can be calculated from those two values with the MSE equation. In the above post, there is a section titled "Loss function [1:28:35]" for further reference.
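To make that concrete, here's a minimal MSE sketch in plain NumPy (not fastai's internals; the function name `mse` is just mine):

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error: the average of the squared differences."""
    return ((y_pred - y_true) ** 2).mean()

# The single-dot example above: the model predicts 3, the dot is at 4
print(mse(np.array([3.0]), np.array([4.0])))  # (3 - 4)^2 = 1.0
```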
Loss does vary while you're training, partly because the LR varies. Fastai uses something called the 'Cyclical Learning Rate' (the one-cycle policy): your learning rate starts off small, increases to a maximum LR (the value you pass to fastai), and then decreases again. Plotted over time it looks tent shaped: /\
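Here's a sketch of that tent shape as a simple linear ramp up and down. Fastai's actual one-cycle schedule also anneals with cosine curves and cycles momentum, so treat this purely as a shape illustration; all the names here (`triangular_lr`, `warmup_frac`, etc.) are made up for the sketch:

```python
def triangular_lr(step, total_steps, max_lr, warmup_frac=0.3):
    """Linear ramp up to max_lr, then linear ramp back down.
    A simplified stand-in for fastai's one-cycle schedule."""
    peak = int(total_steps * warmup_frac)
    if step < peak:
        return max_lr * step / peak                               # rising edge: /
    return max_lr * (total_steps - step) / (total_steps - peak)   # falling edge: \

schedule = [triangular_lr(s, total_steps=100, max_lr=1e-3) for s in range(100)]
print(max(schedule))  # peaks at the max LR you passed in (1e-3)
```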
This cyclical LR affects the losses: they start off large and then decrease. The intuition is that an increasing LR lets you explore the loss surface better at the beginning, and the decreasing LR afterwards lets you settle into the flatter areas, which tend to generalize better.
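In practice you don't build this schedule yourself — you just hand fastai the maximum LR. A sketch of typical usage (this is the fastai v2 spelling; in v1 the keyword was `max_lr` and the learner constructor differed):

```python
from fastai.vision.all import *

# Tiny MNIST subset, just to have something to train on
path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path)
learn = vision_learner(dls, resnet18, metrics=accuracy)

learn.lr_find()                      # plots loss vs. LR to help pick a maximum
learn.fit_one_cycle(1, lr_max=1e-3)  # LR ramps up to 1e-3, then back down
```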