I am really trying to wrap my head around lesson 1 but have not fully gotten there yet.
As part of this process I tried to map out the review-list from the end of the notebook to the steps in the notebook.
Here are two things I do not understand in the "Review: easy steps to train a world-class image classifier" list:
- Should we do step 3 (training the last layer from precomputed activations) without augmented data? What is the purpose of that in this case? Don’t we just disregard the outcome of that training anyway if we then recreate the learn variable?
- In step 7, we find the learning rate once again. This learning rate is really what we should be using for the last layer in step 6, right? So steps 6 and 7 should really switch places?
Outline of steps and corresponding part of the notebook
Here is an outline of the steps from the "Review: easy steps to train a world-class image classifier" list in the notebook and the code that each step corresponds to. Someone else might find this helpful, and if it is wrong, please help correct it.
- Enable data augmentation, and precompute=True
  Data augmentation is enabled through the transforms:
      tfms = tfms_from_model(resnet34, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
      data = ImageClassifierData.from_paths(PATH, tfms=tfms)
  We then create the learner with the precompute flag set to True:
      learn = ConvLearner.pretrained(arch, data, precompute=True)
- Use lr_find() to find highest learning rate where loss is still clearly improving
      lrf = learn.lr_find()
      learn.sched.plot_lr()
      learn.sched.plot()
  We use the plot to determine a good learning rate; let’s call it learning_rate.
- Train last layer from precomputed activations for 1-2 epochs
  Starting the training:
      learn.fit(learning_rate, 2)
- Train last layer with data augmentation (i.e. precompute=False) for 2-3 epochs with cycle_len=1
      learn.precompute = False
      learn.fit(learning_rate, 3, cycle_len=1)
- Unfreeze all layers
      learn.unfreeze()
- Set earlier layers to 3x-10x lower learning rate than next higher layer
  The exact values depend on what your learning_rate is. This is a parametrized version of the array from the notebook:
      lr = np.array([learning_rate*1e-2, learning_rate*1e-1, learning_rate])
- Use lr_find() again
  We look for the learning rate again, just as in step 2, using the plot, and use this value in the previous step.
- Train full network with cycle_mult=2 until over-fitting
  We finally train the full network until over-fitting:
      learn.fit(lr, 3, cycle_len=1, cycle_mult=2)
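To make the arithmetic behind the learning-rate array and the cycle_mult=2 call above more concrete, here is a small plain-Python sketch (no fastai needed). The helper names layer_group_lrs and epochs_per_cycle are my own and not part of the library. I am assuming three layer groups with a 10x factor between them, as in the array above, and that each cycle is cycle_mult times longer than the previous one.

```python
def layer_group_lrs(learning_rate, factor=10.0, n_groups=3):
    """Return one learning rate per layer group, earliest group first.

    Hypothetical helper: each group gets a rate `factor` times lower
    than the next higher group; the last group gets `learning_rate`.
    """
    return [learning_rate / factor ** (n_groups - 1 - i) for i in range(n_groups)]


def epochs_per_cycle(n_cycle, cycle_len=1, cycle_mult=1):
    """Return the length in epochs of each cycle.

    Hypothetical helper: assumes every cycle is `cycle_mult` times
    longer than the one before it, starting at `cycle_len` epochs.
    """
    return [cycle_len * cycle_mult ** i for i in range(n_cycle)]


# Same shape as the array in the outline, for learning_rate = 1e-2.
lrs = layer_group_lrs(1e-2)

# Same arguments as learn.fit(lr, 3, cycle_len=1, cycle_mult=2).
cycles = epochs_per_cycle(3, cycle_len=1, cycle_mult=2)

print(lrs)
print(cycles, sum(cycles))  # [1, 2, 4] 7
```

So with learning_rate = 1e-2 the three groups get roughly 1e-4, 1e-3 and 1e-2, and the last fit call runs cycles of 1, 2 and 4 epochs, which is 7 epochs in total rather than 3.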