1cycle policy and adaptive batch sizes

I'm curious how to think about the 1cycle policy with respect to batch sizes that change during training.

When the batch size changes during training, either from progressive growing or from adapting it to the gradient noise scale, how should the 1cycle schedule change? For example, one could follow the linear scaling rule and scale the maximum LR in proportion to the change in batch size.
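To make the linear-scaling idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration rather than an established recipe: the function names `one_cycle_lr` and `scaled_one_cycle_lr` are hypothetical, the cycle uses cosine warm-up and annealing with illustrative values for `div_factor`, `final_div_factor`, and `pct_start`, and progress through the cycle is measured in samples seen rather than optimizer steps, so that a batch-size change does not stretch or compress the schedule.

```python
import math


def one_cycle_lr(progress, max_lr, div_factor=25.0, final_div_factor=1e4, pct_start=0.3):
    """Cosine 1cycle learning rate at a fractional position in training (0.0 to 1.0).

    The default factors are illustrative assumptions, not recommendations.
    """
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    if progress < pct_start:
        # Warm-up phase: initial_lr -> max_lr
        t = progress / pct_start
        start, end = initial_lr, max_lr
    else:
        # Annealing phase: max_lr -> min_lr
        t = (progress - pct_start) / (1.0 - pct_start)
        start, end = max_lr, min_lr
    return end + (start - end) * (1.0 + math.cos(math.pi * t)) / 2.0


def scaled_one_cycle_lr(samples_seen, total_samples, batch_size, base_max_lr, base_batch_size):
    """1cycle LR with the maximum LR scaled linearly by the current batch size.

    Progress is tracked in samples (an assumption), so the position in the
    cycle is unaffected when the batch size changes mid-training.
    """
    progress = min(samples_seen / total_samples, 1.0)
    max_lr = base_max_lr * batch_size / base_batch_size
    return one_cycle_lr(progress, max_lr)


# Example: the same point in training at batch size 256 and at 512.
lr_small = scaled_one_cycle_lr(30_000, 100_000, 256, base_max_lr=0.1, base_batch_size=256)
lr_large = scaled_one_cycle_lr(30_000, 100_000, 512, base_max_lr=0.1, base_batch_size=256)
print(lr_small, lr_large)
```

Because the start and end LRs are derived from the maximum, this sketch rescales the whole curve when the batch size changes, which produces a jump in LR at the moment of the change. Whether that jump is desirable, or whether only the peak should move, is exactly the open question.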
