Average loss for all epochs

At this timestamp @jeremy shows how he calculates the “global loss” over all the epochs, which is given by this line:

avg_loss = avg_loss * avg_mom + loss * (1-avg_mom)

with initial values of:

avg_mom=0.98
batch_num,avg_loss=0,0.
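
A minimal sketch of how these pieces might fit together in a loop. The `batch_num` counter suggests a bias-corrected running average, so the sketch divides by `1 - avg_mom**batch_num`. That correction step and the helper name `smoothed_losses` are assumptions for illustration, not something shown in the line above.

```python
def smoothed_losses(losses, avg_mom=0.98):
    """Return the running (exponentially weighted) average after each batch."""
    batch_num, avg_loss = 0, 0.
    smoothed = []
    for loss in losses:
        batch_num += 1
        # Same update as the line above: the old average decays and the
        # new minibatch loss is blended in.
        avg_loss = avg_loss * avg_mom + loss * (1 - avg_mom)
        # Assumed bias correction: avg_loss starts at 0, so without this
        # the first values would be pulled toward 0.
        smoothed.append(avg_loss / (1 - avg_mom ** batch_num))
    return smoothed


if __name__ == "__main__":
    # Hypothetical per-batch losses, only to exercise the function.
    print(smoothed_losses([2.0, 1.8, 1.7, 1.5]))
```

With the correction in place, a constant stream of losses yields that same constant from the first batch onward, rather than a value that slowly climbs up from 0.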

I was just wondering: is this way of calculating the global loss (over all the epochs) better than just averaging all the losses? Moreover, is this technique generalizable? For instance, if I use Adam/RMSProp/Adamax as the optimizer, will it still work (considering that RMSProp’s default momentum is 0)?
I always use an “Average meter” to calculate this global loss, like this:

class AverageMeter(object):
    """Computes and stores the average and current value"""

    def __init__(self):
        # reset() already initializes val, avg, sum and count
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

And I used it like so:

for epoch in range(...):
    [...]
    # forward
    logits = self.net.forward(inputs)

    # backward + optimize
    loss = loss_fnc(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    losses.update(loss.item(), batch_size)  # loss.data[0] on PyTorch < 0.4
    logs = metrics_list(targets, logits)
[...]
return losses.avg

But I wasn’t sure if that was the right way to calculate the global loss. Any thoughts on this?

The benefit of the way I do it over averaging the whole epoch is that the later batches get more weight. The benefit over just showing the minibatch loss is that it’s more stable.
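
To make the "later batches get more weight" point concrete, here is a small sketch that computes how much each batch contributes to `avg_loss` after a number of updates, next to the equal weights of a plain average. The function names `ema_weights` and `mean_weights` are hypothetical helpers, and the sketch ignores any bias correction.

```python
def ema_weights(n_batches, avg_mom=0.98):
    """Weight each batch's loss carries in avg_loss after n_batches updates."""
    # Batch i (1-indexed) is multiplied by (1 - avg_mom) once and then
    # decayed by avg_mom on every later update.
    return [(1 - avg_mom) * avg_mom ** (n_batches - i)
            for i in range(1, n_batches + 1)]


def mean_weights(n_batches):
    """Weight each batch carries in a plain average: all equal."""
    return [1 / n_batches] * n_batches


if __name__ == "__main__":
    w = ema_weights(200)
    print("first batch:", w[0], "last batch:", w[-1])
    print("plain mean :", mean_weights(200)[0])
```

As far as the update line shows, `avg_mom` only smooths the number that gets reported, so it should be independent of whatever momentum setting the optimizer itself uses.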

Gotcha, makes sense :slight_smile:

But why choose avg_mom=0.98? Why this value?
Thanks :slight_smile:

'cos it seemed to work pretty well :slight_smile:

(Sorry - must be frustrating having an engineer rather than a real scientist as a teacher… :wink: )
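
There may be no deeper derivation behind 0.98, but one common rule of thumb for reading such a value is that an exponentially weighted average with momentum `m` spans roughly `1 / (1 - m)` recent steps. The sketch below computes that figure and the half-life of a batch's weight. The helper names are hypothetical, and this is offered as intuition rather than as the reasoning used in the lesson.

```python
import math


def effective_window(avg_mom):
    """Rough number of recent batches the running average spans."""
    return 1 / (1 - avg_mom)


def half_life(avg_mom):
    """Number of updates after which a batch's weight has halved."""
    return math.log(0.5) / math.log(avg_mom)


if __name__ == "__main__":
    for m in (0.9, 0.98, 0.999):
        print(m, round(effective_window(m)), round(half_life(m), 1))
```

Under this reading, `avg_mom=0.98` averages over roughly the last 50 minibatches: smaller values track the raw minibatch loss more closely, larger ones react more slowly.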

Aha, okay. Well, let’s assume that’s the case then :sweat_smile:. Don’t worry, I also have an engineering background, but understanding the reason behind every decision is a must: when something stops working, you otherwise find yourself not understanding why.