I’m going through Lessons 2 and 3. Having trouble visualizing what goes on during the “fit, unfreeze, fit again” process. Which layers are being acted on? Which parameters/weights are being updated?
When I run fit_one_cycle(4), does that run my data through the 34 layers of Resnet34 four times? And does it update all the parameters/weights during each pass?
When I run unfreeze() does that unfreeze all 34 layers, eliminating all their parameters/weights (which I assume are the defaults that come with the model)? If not, what is it unfreezing?
When I run learn.fit_one_cycle(10, max_lr=slice(1e-5,1e-3)) (the second fit with the better learning rate, after running unfreeze()), where is my learner at this stage of the process? Is this second stage adding to what was learned in stage 1? Or are we overwriting all the parameters/weights, meaning stage-2 is not an additive stage, but an entirely new copy of the learner with different parameters/weights?
Yes — fit_one_cycle(4) trains for four epochs, each one a full pass of your data through the network. But not all the layers will be trainable. See below.
Not all of them; it only updates the trainable params. See below.
You can see what’s trainable and what’s not, whether the model is frozen or unfrozen, by checking the model_summary output (see docs here). For example, when the model is frozen (the default starting point for a pretrained network), run model_summary and note which “layers” of params are trainable. Then unfreeze, run model_summary again, and you can see the difference. The fastai defaults already helpfully ‘group’ layers together so that discriminative layer training can be applied. I’m not sure what you meant by ‘eliminating all their parameters’: nothing is ‘eliminated’; freezing/unfreezing only controls whether those values get updated by the training epochs or not.
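To make that concrete, here is a minimal pure-Python sketch (not fastai’s actual internals — the class and function names are hypothetical) of what freeze/unfreeze amounts to: toggling a “trainable” flag on groups of parameters, while the pretrained values themselves are kept either way.

```python
# Hypothetical sketch: freeze/unfreeze as trainable flags on layer groups.
# Nothing is deleted; frozen params simply receive no updates during training.

class Layer:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight      # pretrained value, kept whether frozen or not
        self.trainable = True

def freeze(layers):
    """Freeze every layer group except the last one (the new 'head')."""
    for layer in layers[:-1]:
        layer.trainable = False

def unfreeze(layers):
    for layer in layers:
        layer.trainable = True

def summary(layers):
    """Rough stand-in for model_summary: which groups are trainable?"""
    return [(layer.name, layer.trainable) for layer in layers]

model = [Layer("body_1", 0.5), Layer("body_2", -0.3), Layer("head", 0.1)]

freeze(model)            # default state for a pretrained learner
print(summary(model))    # only 'head' is trainable

unfreeze(model)
print(summary(model))    # now every group is trainable, weights unchanged
```

Note that `model[0].weight` is still 0.5 throughout: freezing changed which params training may touch, never their values.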
‘Stage 2’ follows on from what you called ‘stage 1’. It is “additive” in the sense that each trainable param is updated by adding learning_rate×gradient (actually subtracting, because of the sign of the slope). And it is “overwriting” in the sense that the trainable params are updated (i.e. overwritten) during every training epoch. So you could say “stage 2” is “additive” in that it builds on the params and model state left after “stage 1” training; at the same time, you could say it is an “entirely new copy” in that all the trainable params have been updated/overwritten by the training epochs…!
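The update rule above can be sketched in a few lines. This is a toy example with made-up gradient values, not what fastai literally runs (the real optimizer adds momentum, one-cycle scheduling, etc.), but it shows how “stage 2” keeps overwriting the weights it inherited rather than starting from a fresh copy:

```python
# Sketch of the update rule described above:
#   new_weight = weight - learning_rate * gradient
# Each epoch overwrites the weight with a value built on the previous one.

def sgd_step(weight, gradient, lr):
    return weight - lr * gradient

w = 1.0                            # pretend this came out of "stage 1"
for gradient in [0.4, 0.2, 0.1]:   # made-up gradients, one per epoch
    w = sgd_step(w, gradient, lr=0.1)

print(w)   # ≈ 0.93: the stage-1 value, nudged three times, never reset
```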
With the ‘top down’ teaching style of the fastai course, if you note down all these questions of yours now and just soldier on through the next lessons, these details will be explained in more depth later on – discriminative layer training with different learning rates, stochastic gradient descent (SGD), etc. Then you can refer back to your questions/notes, and if some of them are still unclear, feel free to hop back onto the forum and ask again : )
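As a small preview of the discriminative-learning-rates idea that slice(1e-5, 1e-3) expresses: earlier layer groups get smaller learning rates and later groups get larger ones. Here is a hedged sketch (the helper name is made up, and fastai’s exact spacing of rates may differ) of spreading rates across groups on a log scale:

```python
# Hypothetical sketch of discriminative learning rates: spread rates from
# lo (earliest layer group) to hi (the head) evenly on a log scale.

def discriminative_lrs(n_groups, lo, hi):
    if n_groups == 1:
        return [hi]
    ratio = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * ratio**i for i in range(n_groups)]

lrs = discriminative_lrs(3, 1e-5, 1e-3)
print(lrs)   # earliest group gets ~1e-5, middle ~1e-4, head ~1e-3
```

The intuition: early layers of a pretrained ResNet already encode general features (edges, textures), so they only need gentle nudging, while the freshly attached head needs to learn fast.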