Regularization

Ducky · March 30, 2018, 5:55pm

I am reading the recent Leslie Smith paper instead of doing my homework (oops), and am confused by some of the talk about regularization.

It’s my understanding that “regularization” is what you do to keep the weights nicely bounded so that they don’t explode or vanish. Weight decay and learning a term to scale the weights by make total sense to me in this context.
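To make the weight-decay case concrete, here is a minimal sketch of a plain SGD step with L2 weight decay folded into the gradient. The function name `sgd_step` and the numbers are made up for illustration; this is not taken from the Smith paper or any particular library.

```python
def sgd_step(weights, grads, lr=0.1, weight_decay=0.0):
    """One plain SGD step with L2 weight decay added to the gradient.

    Hypothetical helper for illustration only.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


# With zero loss gradient, weight decay alone multiplies every weight by
# (1 - lr * weight_decay) on each step, pulling the weights toward zero.
w = [2.0, -4.0]
for _ in range(3):
    w = sgd_step(w, grads=[0.0, 0.0], lr=0.1, weight_decay=0.5)

print(w)
```

Here the shrink factor is 1 - 0.1 * 0.5 = 0.95 per step, which is the sense in which weight decay keeps the weights bounded.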

However, the Smith paper (and others) describe many other things as regularizers, including larger learning rates and smaller batch sizes.

Does anyone have some intuition as to why larger learning rate/smaller batch sizes should be regularizers?

(NB: another Smith paper shows that learning rate and batch size are inversely related, so I’m happy to believe that if a large learning rate is a regularizer, then so is a small batch size.)

raspstephan · March 30, 2018, 6:01pm

As I understood it, larger learning rates, and in particular SGDR and cyclical LRs, allow you to find broader minima.

I think this is where Jeremy explains it: https://youtu.be/JNxcznsrRb8?t=32m52s

Ducky · March 30, 2018, 6:10pm

Sure, I totally get how a larger learning rate will find you broader minima, but that’s generalization and not regularization. I don’t see how that helps you keep the weights nicely bounded.

Maybe there is a connection between generalization and regularization which I am missing?

raspstephan · March 30, 2018, 6:13pm

Oh, I wasn’t aware that regularization strictly referred to keeping the weights in check. I kind of thought it was just a term to describe everything that helps prevent overfitting.

chunduri · March 30, 2018, 6:19pm

I think the idea of having a regularizing effect is a very broad term which includes all techniques to reduce overfitting.

How are learning rate and batch size inversely related?

Ducky · March 30, 2018, 6:23pm

Err, see the paper: [1711.00489] Don't Decay the Learning Rate, Increase the Batch Size
There is theory, which I think I would botch if I tried to explain it, but there is also experimental evidence which is what convinced me.
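
For a rough feel of the relationship, here is a small sketch. It assumes the approximation from that paper, as best I recall it, that the SGD "noise scale" is about lr * N / (B * (1 - m)) when the batch size B is much smaller than the training set size N (m is the momentum coefficient). The function name `noise_scale` and the numbers are hypothetical, picked only for illustration.

```python
def noise_scale(lr, train_size, batch_size, momentum=0.0):
    """Approximate SGD noise scale: lr * N / (B * (1 - m)).

    Assumes batch_size is much smaller than train_size. Illustrative only.
    """
    return lr * train_size / (batch_size * (1.0 - momentum))


base = noise_scale(lr=0.1, train_size=50_000, batch_size=128)
bigger_lr = noise_scale(lr=0.2, train_size=50_000, batch_size=128)
smaller_batch = noise_scale(lr=0.1, train_size=50_000, batch_size=64)

# Under this approximation, doubling the learning rate and halving the
# batch size raise the noise scale by the same factor.
print(base, bigger_lr, smaller_batch)
```

If that approximation holds, the two knobs are interchangeable as far as the amount of gradient noise goes, which would be why both get described as having a regularizing effect.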

chunduri · March 30, 2018, 6:26pm

Thanks. I will look into it.