Can someone explain precompute = True and freeze or unfreeze?

I am not able to get my head around what happens when you set precompute=True, and what happens when you freeze or unfreeze layers. (These are two separate questions.)

1 Like

Freezing and unfreezing is relatively straightforward. If a net is frozen, only the weights for the last layer(s) are actually updated (learned); the “core” is not changed. So if you use an ImageNet-pretrained model as a base and learn cats/dogs on top of it, only the cat/dog-relevant layer(s) get updated.
If you unfreeze, you can update all the weights, so the pretrained ImageNet part also gets updated. Usually you only want to update the early layers a very tiny bit (those find corners and other general things), the middle layers a little more, and the final layers the most (the task-specific stuff).
I think this is a little imprecise, but hopefully it gets the basic idea across.
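
To make that concrete, here is a minimal plain-PyTorch sketch of what freezing amounts to under the hood (this is the general mechanism, not fastai’s exact API): a frozen parameter just has requires_grad = False, so the optimizer never updates it.

```python
import torch.nn as nn
from torchvision import models

# Minimal sketch (plain PyTorch, not fastai's API): freezing just means
# the optimizer never updates those weights.
model = models.resnet34(pretrained=True)

# "Freeze": stop gradient updates for the pretrained body.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new cat/dog head; its weights are
# freshly created and trainable, so only they get learned.
model.fc = nn.Linear(model.fc.in_features, 2)

# "Unfreeze": allow every layer to be updated again.
for param in model.parameters():
    param.requires_grad = True
```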

Precompute is basically a form of caching. If it confuses you, you can always turn it off completely (and take the performance hit).
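
For reference, here is roughly what the old fastai (v0.7) usage looked like, from memory; the path and image size are placeholders and the names may differ in your version:

```python
from fastai.conv_learner import *  # fastai v0.7-era import

# Sketch from memory of the v0.7 API; 'data/dogscats/' and 224 are placeholders.
arch = resnet34
tfms = tfms_from_model(arch, 224)
data = ImageClassifierData.from_paths('data/dogscats/', tfms=tfms)

# precompute=True caches the frozen body's activations on disk, so each
# epoch only has to train the small custom head on the cached values.
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(1e-2, 3)

# Turning the cache off gives the same results, just slower:
learn.precompute = False
```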

8 Likes

Thank you for explaining freezing and unfreezing. Seems much clearer now.

A follow-up.

Does freezing apply to the fully connected (MLP) part of the CNN, and does precompute store the activations from the convolutional layers?

No. AFAIK:

precompute=True does store/use precomputed activations for/from the entire network.

Indeed, if you unfreeze the early layers, you can’t use precompute=True.
Also, precomputed activations are not used during prediction.

If you train the network with just the last layers unfrozen, then precompute (whether True or False) affects just those layers.

Be aware that I wrote AFAIK above.

Thank you for this answer. I am also struggling with the freeze_to(layer) command. How do I get the number of the layer from the model?

I reckon there are better ways, but try model.summary().

I did not think about that. Thanks

1 Like

@nchukaobah just to add on to @balnazzar’s answer: you could use learn.model to see the model. From what I understand, freeze_to works on layer groups, so it might be worthwhile to look at learn.get_layer_groups as well.
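
Something like this, if I remember the v0.7 names right (treat the exact method names as approximate):

```python
# From memory of fastai v0.7; method names may differ in your version.
print(learn.model)                 # full layer-by-layer listing of the model
groups = learn.get_layer_groups()  # the groups that freeze_to indexes into
print(len(groups))

learn.freeze_to(-1)  # freeze every group except the last one (the head)
learn.unfreeze()     # make all groups trainable again
```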

1 Like

I’ll try that too. Thank you

So, does unfreezing allow updating “a very tiny bit in the early layers (those find corners and general things), a little more towards the middle and most towards the end (the task-specific stuff)”? Or does it retrain all of the ImageNet weights?

Unfreezing is what allows the network to change the weights of those layers at all. By default the pretrained part (the body or “backbone”) is frozen and we only train (change the weights of) the last layers (the custom head) that we added.
Then we unfreeze the body, which enables any training of it at all.
How large the changes to the weights are (tiny, medium, large, as you put it) is controlled by the learning rate. So the learning rate is set very small for the early layers (no need to change much), medium for the middle, and to the full learning rate for the head - by choice.
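
In fastai v0.7 that per-group choice was made by passing an array of learning rates, one per layer group. A sketch from memory (the values are just the course’s usual placeholders):

```python
import numpy as np

# Discriminative learning rates, fastai v0.7 style (from memory):
# one rate per layer group - tiny for early layers, larger toward the head.
learn.unfreeze()
lrs = np.array([1e-4, 1e-3, 1e-2])  # early / middle / head
learn.fit(lrs, 3, cycle_len=1)
```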

2 Likes

Thank you! Much clearer!

I struggled with precompute for a while. Here is my understanding:

When doing transfer learning we typically freeze the first N layers and leave the last layer(s) unfrozen so their weights can be updated.

If the first N layers are frozen, then putting an image through them during the first epoch and putting the same image through again during the second epoch produces exactly the same output from those layers.

Consider a network with two layers, where the first layer is frozen and the second is not. If we run 100 epochs, we do an identical computation through the first layer in each of the 100 epochs: we run the same images through the same layer without updating its weights. So for every epoch the inputs to the first layer are the same (the images), the weights of the first layer are the same, and therefore its outputs are the same (images * weights + bias).

So instead of running that same calculation for each epoch, we run it only once and feed the cached output into layer 2. That output is also called the activation.
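
Here is a toy PyTorch illustration of that caching idea (not fastai’s actual implementation; the layer sizes and data are made up):

```python
import torch
import torch.nn as nn

# Toy illustration of the caching idea (not fastai's real code).
frozen = nn.Linear(10, 8)   # stand-in for the frozen layer 1
head = nn.Linear(8, 2)      # layer 2, the part we actually train
images = torch.randn(100, 10)
targets = torch.randint(0, 2, (100,))

# Compute the frozen layer's activations exactly once...
with torch.no_grad():
    cached = frozen(images)

# ...then every epoch trains the head on the cached activations,
# skipping the (identical) layer-1 computation entirely.
opt = torch.optim.SGD(head.parameters(), lr=0.01)
for epoch in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(head(cached), targets)
    loss.backward()
    opt.step()
```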

Hope that helps.

2 Likes

This description of precompute=True/False makes sense to me and explains the behavior.
A question though: where are these precomputed values stored?

In one or more files under /your_directory/tmp.

Soon it will no longer be relevant though: in fastai v1, they got rid of precompute.

Correct me if I’m wrong. From your explanation I understand that precompute calculates the activations of the frozen layers once in order to speed up the tuning of the unfrozen layers?