Choosing a learning rate for noisy gradients

I have a model with incredibly noisy gradients. How can I find the best learning rate here?

What is your batch size?

My batch size is 8.

That’s probably too low. With a small batch size, the gradient estimate varies a lot from batch to batch, which is exactly the noise you’re seeing. Try a larger batch size, such as 32 or 64.
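If you still want a principled learning rate at whatever batch size you end up with, an LR range test (sweep the learning rate exponentially over a few hundred batches and watch a smoothed loss) copes reasonably well with noisy gradients. Below is a minimal plain-PyTorch sketch of that idea; `model`, `loss_fn`, `loader`, and the sweep bounds are placeholders for your own setup, not a fixed API:

```python
import torch

def lr_range_test(model, loss_fn, loader,
                  lr_min=1e-7, lr_max=1.0, num_iters=200, beta=0.98):
    """Exponentially sweep the LR and record a smoothed loss per step."""
    opt = torch.optim.SGD(model.parameters(), lr=lr_min)
    gamma = (lr_max / lr_min) ** (1 / num_iters)  # per-step LR multiplier
    avg_loss, best_loss, history = 0.0, float("inf"), []
    data_iter = iter(loader)
    for i in range(num_iters):
        try:
            xb, yb = next(data_iter)
        except StopIteration:  # restart the loader if it runs out
            data_iter = iter(loader)
            xb, yb = next(data_iter)
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()
        # An exponential moving average smooths the noisy per-batch loss.
        avg_loss = beta * avg_loss + (1 - beta) * loss.item()
        smoothed = avg_loss / (1 - beta ** (i + 1))  # bias correction
        history.append((opt.param_groups[0]["lr"], smoothed))
        best_loss = min(best_loss, smoothed)
        if smoothed > 4 * best_loss:  # loss has diverged; stop the sweep
            break
        for g in opt.param_groups:
            g["lr"] *= gamma
    return history  # plot loss vs. LR and pick from the descending region
```

The usual heuristic is to pick a learning rate roughly an order of magnitude below the point where the smoothed loss bottoms out. If you are using fastai, `learn.lr_find()` wraps this whole procedure for you.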

Use the GradientAccum callback to accumulate gradients across several batches before each optimizer step. Just remember that it only accumulates gradients: batch statistics are still computed on the actual batch, so if your model uses batch normalization layers, a batch size of 8 could still be too low. You can swap the batch norm layers for another type, such as instance norm, or free up memory for a bigger real batch size with mixed precision training, in-place operations, or a smaller network.
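To make the accumulation idea concrete, here is a minimal plain-PyTorch sketch of the same technique (not the callback itself). Gradients from 4 micro-batches of 8 are summed before each optimizer step, giving an effective batch size of 32; `model`, `loss_fn`, `opt`, `loader`, and `accum_steps` are assumed placeholders:

```python
import torch

accum_steps = 4  # 4 micro-batches of 8 -> effective batch size 32

def train_epoch(model, loss_fn, opt, loader):
    model.train()
    opt.zero_grad()
    for i, (xb, yb) in enumerate(loader):
        # Scale the loss so the summed gradient matches one big batch.
        loss = loss_fn(model(xb), yb) / accum_steps
        loss.backward()  # gradients accumulate in .grad across calls
        if (i + 1) % accum_steps == 0:
            opt.step()
            opt.zero_grad()
    # Any leftover partial accumulation is cleared by zero_grad() on the
    # next call. Note: BatchNorm statistics are still computed on each
    # micro-batch of 8, which is why accumulation alone does not fix
    # small-batch BN noise.
```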
