precompute=True

This is incorrect. Only unfreeze or freeze_to do that.

If we’re precomputing, we can’t use data augmentation (since the precomputed activations are for some specific input, whereas augmentation changes it every time).
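The conflict between caching and augmentation can be sketched as a toy cache keyed on the exact input (purely illustrative — fastai does not work this way internally; it simply disables augmentation when `precompute=True`):

```python
import hashlib

# Toy cache of "activations" keyed by the exact input bytes (illustrative only).
cache = {}

def cached_body(image_bytes):
    key = hashlib.sha1(image_bytes).hexdigest()
    if key not in cache:
        cache[key] = len(image_bytes)  # stand-in for the expensive body computation
    return cache[key]

cached_body(b"cat.jpg pixels")   # computed once
cached_body(b"cat.jpg pixels")   # cache hit: identical input, identical activations
cached_body(b"cat.jpg flipped")  # augmented input -> new key -> must recompute
```

Augmentation produces a different input every epoch, so almost every lookup misses and the cache buys nothing.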

16 Likes

ok…looks like I got it totally wrong. Let me try again.
Precompute = True/False - Precomputed activations that we feed into the network except for the last layer.
freeze/unfreeze - weights changes or not during the training. Is this correct?

Yes that’s reasonably correct, although some details are missing… E.g. freeze/unfreeze refer to all but the last layer, but there’s also freeze_to(idx) which freeze layers up to (but not including) layer number idx.

2 Likes
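The freeze / unfreeze / freeze_to(idx) semantics described above can be sketched in plain PyTorch (a toy three-layer model, not fastai's actual implementation or layer groups):

```python
import torch.nn as nn

# Hypothetical stand-in for a model's layer groups (not fastai internals).
model = nn.Sequential(
    nn.Linear(8, 16),   # layer 0
    nn.Linear(16, 16),  # layer 1
    nn.Linear(16, 2),   # layer 2 (the newly added "head")
)

def freeze_to(model, idx):
    """Freeze layers [0, idx); layers from idx onward stay trainable."""
    for i, layer in enumerate(model):
        trainable = i >= idx
        for p in layer.parameters():
            p.requires_grad = trainable

freeze_to(model, 2)  # freeze layers 0 and 1, train only the head
# unfreeze is then just freeze_to(model, 0), and freeze is freeze_to(model, 2)
```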

Yes, I noticed that.

Hi @jeremy, you will be covering these things in detail in the upcoming sessions, right? I am not able to follow everything in this thread because I have no prior knowledge.

Did I understand it correctly?:

  • if we use precompute = True, then freeze / unfreeze have no effect, as we operate only on the already-calculated outputs of the conv layers.
  • also, data augmentation does not work
  • in order to use data augmentation and fine-tune the conv layers, we need to start with precompute=False
  • in this case, by default, all conv layers are freezed
6 Likes

Yes, these discussions are amongst people who have read ahead in the notebooks :slight_smile:

All perfectly correct! (well… except in English we say ‘frozen’, not ‘freezed’ :wink: )

3 Likes

I guess I understood the logic behind turning the precompute and freeze parameters on and off. But I don’t understand what exactly the “precomputed activations” are and how they are generated. Are they generated based on the new data I’m training my model on (for example, catsvsdogs) or based on the data that the resnet34 model was trained on (imagenet)?

Also, what’s the difference between precomputed activations and precomputed weights? Are they the same?

Really appreciate if someone could help me understand that.

Thanks

5 Likes

@fsantos Precomputed activations are the outputs of the activation function (ReLU, in your case) in each of the frozen layers (the layers that you don’t intend to train). They are calculated for the images in your model’s training set – in your words, the new data that you are training on. This speeds up the training of the newly added fully connected layer at the end.
Also, the ResNet model was trained on a different dataset – Imagenet – and we only leverage the weights that were computed during that training. We refer to these weights as precomputed weights.

3 Likes

> I guess I understood the logic behind turning the precompute and freeze parameters on and off. But I don’t understand what exactly the “precomputed activations” are and how they are generated. Are they generated based on the new data I’m training my model on (for example, catsvsdogs) or based on the data that the resnet34 model was trained on (imagenet)?

Precomputed activations are generated based on your data. They are the outputs of the layers of your model whose weights are frozen and whose outputs therefore won’t change, so they can be precomputed to gain efficiency. Jeremy mentioned almost a 10x improvement in training the last unfrozen layer(s) through this technique.

> Also, what’s the difference between precomputed activations and precomputed weights? Are they the same?

Precomputed weights are the weights of the various layers of a given standard model that you start with. These have been published by the authors of the various models over the years, trained on large datasets such as Imagenet. You can choose to keep all of them completely unchanged (by keeping those layers frozen), or you can retrain them on your data but with a much lower learning rate. Jeremy mentioned using 1/10th of the LR for each successive group of earlier layers if your dataset is similar to the original dataset the model was trained on (e.g. cats/dogs), or 1/3rd if your dataset is different (e.g. satellite or radiology pictures).

13 Likes
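The "lower learning rate per earlier layer group" idea above can be sketched with PyTorch optimizer parameter groups (a hypothetical three-group model, not fastai's actual layer groups; the 1/10th ratio follows the "similar dataset" case described above):

```python
import torch
import torch.nn as nn

# Hypothetical layer groups: early layers, later layers, and the new head.
early, middle, head = nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 2)
model = nn.Sequential(early, middle, head)

head_lr = 1e-2
# "Similar" dataset: divide the learning rate by 10 for each earlier group.
optimizer = torch.optim.SGD([
    {"params": early.parameters(),  "lr": head_lr / 100},
    {"params": middle.parameters(), "lr": head_lr / 10},
    {"params": head.parameters(),   "lr": head_lr},
])
```

Each group then takes steps scaled by its own learning rate, so the pretrained early layers change only slightly while the new head adapts quickly.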

@fsantos Just to be a little clearer – we precompute the activations because we know that we have frozen the earlier layers and don’t intend to train them, so calculating their outputs in every epoch would be a significant waste of time.

2 Likes

Thanks @rishubhkhurana and @sanjeev.b!

I understand it perfectly now. precompute keeps the model from recalculating the activation values of all the previous frozen layers for every image (it calculates them the first time and saves them in a temporary directory). This way, we can iterate on the last (unfrozen) layers much faster. Such a smart trick…
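The caching trick described above can be sketched with NumPy: a fixed random ReLU projection stands in for the frozen pretrained body, its outputs are computed once, and only a simple linear head is trained on the cached features (all names here are illustrative, none come from fastai):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "body": fixed weights, so its outputs never change across epochs.
W_body = rng.standard_normal((64, 10)) / np.sqrt(64)
def body(x):
    return np.maximum(x @ W_body, 0.0)  # frozen layers + ReLU

X = rng.standard_normal((100, 64))  # training inputs (e.g. flattened images)
y = rng.standard_normal(100)        # targets

feats = body(X)                     # precompute activations ONCE, up front

# Train only the head; every epoch reuses `feats` instead of re-running
# the frozen body over the whole dataset.
w_head = np.zeros(10)
for epoch in range(100):
    preds = feats @ w_head
    grad = feats.T @ (preds - y) / len(y)  # gradient of MSE w.r.t. the head
    w_head -= 0.1 * grad
```

Each epoch now costs one small matrix product instead of a full forward pass through the body, which is where the reported ~10x speedup comes from.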

6 Likes

Minor correction: we only precompute the activations of the penultimate layer of the network. Unfortunately doing all the layers is generally prohibitively storage intensive.

Sorry one more minor issue - we say “pretrained weights”, not “precomputed weights”.

6 Likes

@jeremy Yeah got it. Thanks for the clarity
So, we just store the activations of the penultimate layer; but to arrive at that layer, we still have to compute the earlier layers’ activations at least once. And I guess that’s what we observe when we instantiate the learner for the first time – for example, the 360/360 progress bar after instantiating the learner in the lesson1 notebook.

And yeah, I meant to say pretrained weights :slight_smile:

Wouldn’t we need to compute them at every epoch given that they are not stored (whereas we only compute the penultimate layer’s activations once)?

@runze We don’t need to recompute the earlier layers, since we just need the output from the penultimate layer as the input to our fully connected layer at the end.

Oh yeah, you are right.

Is there an equivalent to the keras model.summary() command in the fast.ai library? So we can know the right index to choose for freeze_to?

try running:

learn.model

Not quite as robust as keras’ model.summary().
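Since `learn.model` is just a PyTorch module, enumerating its children is one way to see which index `freeze_to` would refer to. A toy sketch (the model here is hypothetical, not a real fastai learner):

```python
import torch.nn as nn

# Hypothetical model standing in for `learn.model` (not a real learner).
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Linear(8, 2))

# Printing the model (as with `learn.model`) lists its modules; enumerating
# the children gives the indices that a freeze_to-style call refers to.
for idx, layer in enumerate(model.children()):
    print(idx, layer)
```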

2 Likes