Jupyter notebook explaining four papers by Leslie N. Smith

The following papers by Leslie N. Smith are covered in this notebook:

  1. A disciplined approach to neural network hyper-parameters: Part 1 – learning rate, batch size, momentum, and weight decay
  2. Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
  3. Exploring loss function topology with cyclical learning rates
  4. Cyclical Learning Rates for Training Neural Networks

This notebook covers all of these topics, with the theory as well as the fastai implementations of the relevant techniques (a minimal usage sketch follows the table of contents below).

Table of Contents:

  1. Summary of hyper-parameters
  2. Hyper-params not discussed
  3. Things to remember
  4. Underfitting vs Overfitting
  5. Deep Dive into Underfitting and Overfitting
    1. Underfitting
    2. Overfitting
  6. Choosing Learning Rate
    1. Cyclic Learning Rate (CLR) and the Learning Rate Range Test
    2. ResNet-56
    3. Cyclic Learning Rate
    4. Differences from the original paper
    5. One-cycle policy summary
    6. Learning rate finder test
  7. Introducing Super-Convergence
    1. Testing Linear Interpolation
    2. How was it found in the first place?
    3. Coding Linear Interpolation
  8. Explanation behind Super-Convergence
  9. Choosing Momentum
    1. Some good values of momentum to test
  10. Choosing Weight Decay
    1. How to set the value
  11. Train a final classifier model with the above hyper-param values
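
To give a taste of the fastai side before diving into the notebook, here is a minimal sketch of the learning rate range test from the CLR paper. The fastai v1 API and the small MNIST_SAMPLE demo dataset are assumed here; this is illustrative, not the notebook's exact code.

```python
from fastai.vision import *

# Small demo dataset that ships with fastai (v1 API assumed).
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path, bs=64)
learn = cnn_learner(data, models.resnet18, metrics=accuracy)

# LR range test: train briefly while the learning rate grows
# exponentially, recording the loss at each iteration.
learn.lr_find()
# Plot loss vs. learning rate and pick a max_lr a bit before the
# point where the loss starts to blow up.
learn.recorder.plot()
```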

Cyclic momentum and weight decay are still left to cover, but since most of the material from the papers is already covered, I decided to share the notebook now.

Notebook link: Reproducing Leslie N. Smith’s papers using fastai

Excellent notebook, thank you!

I have updated the notebook with code for cyclical momentum and weight decay as well, so almost everything from the papers is now covered in the notebook.
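
For reference, the gist of the cyclical momentum and weight decay settings with fastai's 1cycle API looks like this (fastai v1 names assumed; a sketch, not the notebook's exact code):

```python
from fastai.vision import *

path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path, bs=64)
learn = cnn_learner(data, models.resnet18, metrics=accuracy)

# The 1cycle policy: the learning rate ramps up to max_lr and anneals
# back down, while momentum is cycled in the opposite direction
# between moms[1] (low) and moms[0] (high); wd sets the weight decay.
learn.fit_one_cycle(4, max_lr=3e-3, moms=(0.95, 0.85), wd=1e-4)
```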

One problem I faced while implementing this was that I could not reproduce the interpolation result. The loss-function-topology paper discusses linearly interpolating between the weights found by different CLR cycles and reports that each cycle settles in a different minimum, so the loss should show a peak somewhere along the interpolation path. When I tested my code, I found the opposite: there was no visible peak in the interpolation figure.
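
For anyone who wants to try reproducing this, here is a minimal PyTorch sketch of the interpolation test (my own names and arguments, not the paper's or the notebook's exact code). It assumes `state_a` and `state_b` are `state_dict()` snapshots saved at the end of two cycles and `(xb, yb)` is a representative batch:

```python
# Evaluate the loss at theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b
# for alpha in [0, 1]. A peak in the loss between the endpoints suggests
# the two cycles found different minima separated by a loss barrier.
import copy
import torch

def interpolation_losses(model, state_a, state_b, loss_fn, xb, yb, steps=21):
    probe = copy.deepcopy(model)
    losses = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        mixed = {}
        for k, v in state_a.items():
            if v.is_floating_point():
                mixed[k] = (1 - alpha) * v + alpha * state_b[k]
            else:
                mixed[k] = v  # keep integer buffers (e.g. BN counters) as-is
        probe.load_state_dict(mixed)
        probe.eval()
        with torch.no_grad():
            losses.append(loss_fn(probe(xb), yb).item())
    return losses
```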