The Lesson 1 review of steps with corresponding code from the notebook

I am really trying to wrap my head around lesson 1 but have not fully gotten there yet.
As part of this process I tried to map the review list at the end of the notebook to the steps in the notebook.

Here are two things I do not understand in the "Review: easy steps to train a world-class image classifier" section:

  1. Should we do step 3 (training the last layer from precomputed activations) without augmented data? What is the purpose of that in this case? Don’t we just disregard the outcome of that training anyway if we then recreate the learn variable?

  2. In step 7 we find the learning rate once again. That learning rate is really what we should be using for the last layer in step 6, right? So should step 6 and step 7 switch places?

Outline of the steps and the corresponding parts of the notebook

Here is an outline of the steps from the "Review: easy steps to train a world-class image classifier" section of the notebook and the code that each step corresponds to. Someone else might find this helpful, and if anything is wrong, please help correct it.

  1. Enable data augmentation, and precompute=True

    tfms = tfms_from_model(resnet34, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
    data = ImageClassifierData.from_paths(PATH, tfms=tfms)
    

    We then create the learner with the precompute flag set to True:

    learn=ConvLearner.pretrained(arch, data, precompute=True)
    
  2. Use lr_find() to find highest learning rate where loss is still clearly improving

    lrf=learn.lr_find()
    learn.sched.plot_lr()
    learn.sched.plot()
    

    We use the plot to determine a good learning rate; let’s call it learning_rate.

  3. Train last layer from precomputed activations for 1-2 epochs
    We start the training:

      learn.fit(learning_rate, 2)
    
  4. Train last layer with data augmentation (i.e. precompute=False) for 2-3 epochs with cycle_len=1

    learn.precompute=False
    learn.fit(learning_rate, 3, cycle_len=1)
    
  5. Unfreeze all layers

    learn.unfreeze()
    
  6. Set earlier layers to 3x-10x lower learning rate than next higher layer
    This might change depending on what your learning_rate is. I have parametrized the version from the notebook.

    lr = np.array([learning_rate*1e-2, learning_rate*1e-1, learning_rate])  # one rate per layer group: earliest, middle, last
    
  7. Use lr_find() again
    We look for the learning rate again, just as in step 2, using the plot, and use that value in the previous step (see the short sketch after this list).

  8. Train full network with cycle_mult=2 until over-fitting
    We finally train the full network until over-fitting.

    learn.fit(lr, 3, cycle_len=1, cycle_mult=2)
    
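For step 7 the notebook code is not repeated above; a minimal sketch of running the learning-rate finder again after unfreezing could look like the following (learning_rate and its value are just placeholders read off the new plot):

    learn.lr_find()       # run the learning rate finder again, now on the unfrozen network
    learn.sched.plot()    # pick a new learning rate from the plot, just as in step 2
    learning_rate = 1e-2  # hypothetical value read off the plot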

Hi, I’d like to share my thoughts on your second question.

I think the right order should be 7 then 6 rather than 6 then 7, because step 7 helps us figure out the initial magnitude of the learning rate for the last layer; once that learning rate is established, we should use step 6 to set the differential rates for SGDR.
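
In code, the reordered sequence would look roughly like this (a sketch only, reusing the snippets from the post above; learning_rate is again a placeholder taken from the plot):

    learn.unfreeze()                             # step 5: unfreeze all layers
    learn.lr_find()                              # step 7, moved up: find a rate for the last layer
    learn.sched.plot()
    learning_rate = 1e-2                         # hypothetical value from the plot
    lr = np.array([learning_rate*1e-2, learning_rate*1e-1, learning_rate])  # step 6
    learn.fit(lr, 3, cycle_len=1, cycle_mult=2)  # step 8: train the full network with SGDR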

Those are my thoughts; feel free to discuss them with me if you want.