Hi All

In the `fit`

function in model.py, there are a few lines:

```
avg_loss = avg_loss * avg_mom + loss * (1-avg_mom)
debias_loss = avg_loss / (1 - avg_mom**batch_num)
```

Does anyone have a motivation for this transformation? It computes a debiased EMA of the loss (maybe inspired by this?) – but I don’t understand why you’d want to do that (esp. by default)

Specifically, I’m wondering about whether this is appropriate to use w/ the `lr_find`

function. These are `lr_find`

plots w/ using the EMA (as above) or the raw loss, respectively:

Raw loss is noisier (obviously), but the location of the minima are clearly shifted – about 2.0 for the EMA vs about 0.2 for the raw loss. I saw in the videos that @jeremy suggested finding the minimum and then going back about an order of magnitude – is this maybe because the EMA “delays” the minimum?