Research Paper Recommendations

Fantastic. This attempt to see how the pieces of the training puzzle interact is really insightful, and such analyses are scarce.

His proxy for super-convergence, the "1cycle" learning rate policy, is easy to replicate with fast.ai's cycle_len and cycle_mult (though it implies always using only two cycles). Quoting:

Here we suggest a slight modification of cyclical learning rate policy for super-convergence; always
use one cycle that is smaller than the total number of iterations/epochs and allow the learning rate to
decrease several orders of magnitude less than the initial learning rate for the remaining iterations.
We named this learning rate policy “1cycle” and in our experiments this policy allows the accuracy
to plateau before the training ends.
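The quoted policy can be sketched in a few lines. This is a minimal illustration, not fast.ai's actual implementation, and the function and parameter names (`one_cycle_lr`, `final_div`, the 0.9 cycle fraction) are assumptions chosen for the example:

```python
# Sketch of the "1cycle" policy quoted above: one triangular cycle
# shorter than the total run, then a tail where the learning rate
# decays several orders of magnitude below the initial learning rate.

def one_cycle_lr(iteration, total_iters, max_lr, base_lr, final_div=100):
    """Return the learning rate for a given iteration under 1cycle."""
    cycle_iters = int(0.9 * total_iters)   # cycle is shorter than the full run
    half = cycle_iters // 2
    if iteration <= half:                  # first half: linear ramp up
        return base_lr + (max_lr - base_lr) * iteration / half
    elif iteration <= cycle_iters:         # second half: linear ramp down
        return max_lr - (max_lr - base_lr) * (iteration - half) / half
    else:                                  # tail: decay far below base_lr
        frac = (iteration - cycle_iters) / (total_iters - cycle_iters)
        return base_lr * (1 - frac * (1 - 1 / final_div))

# Example: 1000-iteration run, LR rises 0.1 -> 1.0 -> 0.1, then tails off to 0.001.
schedule = [one_cycle_lr(i, 1000, max_lr=1.0, base_lr=0.1) for i in range(1001)]
```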

Also the regularizing effect of learning rate and batch size, and the sample + parameter dependency of regularization…
it is all there!

Even if you implement a Bayesian optimizer, which I have used in the past, you still need to
understand these interactions as well as possible to tune that optimizer; imo this is priceless.

Thank you for sharing! :grinning: