precompute=True

Thank you for the amazing previous explanations.

I’ve two remaining questions:

  1. When it comes to “weights” vs “activations” are they one and the same? Do weights create activations?

  2. When you say “This transformation by activation of your data is done only one time and now the new values of your data can be used as inputs of the last layers of your new NN that you are about to train” – what transformations are you referring to?

Hey, curious why data augmentation has to be impacted by whether or not we use precomputed activations. From what I’ve gathered, the point of data augmentation is to create more data (by cropping, zooming in, flipping, etc.) so that our network is less biased. So it seems like augmented data should be treated the same as the original images.

Why, when precompute=True, can’t we use the precomputed activations to train on our augmented images the same way we do with the originals? You say it’s because the activation of our data is computed only one time, so why can’t that one time be with the precomputed activations?


[Update] I just realized that this question has already been asked and answered. I also now realize that the precomputed activations would not apply to the augmented images the way they do to the original images, so we would have to compute new activations for the augmented images, which is why this cannot work with precompute=True.
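To make that concrete for anyone else, here is a rough sketch of the idea (the names frozen_body and augment are hypothetical placeholders, not the actual fastai code) showing why activations cached from the original images can’t stand in for the augmented ones:

import numpy as np

def frozen_body(x):
    # stands in for the frozen pretrained convolutional layers
    return x.mean()  # placeholder "feature"

def augment(x, rng):
    # stands in for random flips/crops/zooms - different every epoch
    return x + rng.normal(scale=0.1, size=x.shape)

rng = np.random.default_rng(0)
image = np.ones((224, 224, 3))

# precompute=True: run the frozen body once and cache the result
cached_feature = frozen_body(image)

for epoch in range(2):
    x = augment(image, rng)          # a *new* version of the image each epoch
    fresh_feature = frozen_body(x)   # what the new head layers should actually see
    # the cache was built from the un-augmented image, so it no longer matches
    print(epoch, cached_feature, fresh_feature)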


Hey, I just want to make sure my understanding is correct here. If we use lr_find, we don’t need to call fit, because they both call fit_gen under the hood (see https://github.com/fastai/fastai/blob/74bfb1d906c449beb5f25dbad31523e6d4b4c83d/fastai/learner.py#L101 and https://github.com/fastai/fastai/blob/74bfb1d906c449beb5f25dbad31523e6d4b4c83d/fastai/learner.py#L96). So doing something like this back-to-back makes little sense to me:

learn.lr_find()
learn.fit(1e-2, 6)

Unfortunately lr_find doesn’t have an epoch param and has the number of epochs hard-coded as 1. So we can only do one epoch with lr_find. What if we want to use the optimal learning rate for more than one epoch? Should we call lr_find in a loop for the number of epochs we want?

Nope! lr_find, as you noted, only does one epoch - and further, it doesn’t end at a good loss, because it intentionally increases the LR until it is too high. What’s more, it saves the weights before it starts, and resets them after it’s done.

So you absolutely need to call fit().
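In other words, the usual pattern looks something like this (assuming the fastai 0.7 API used in the course notebooks, where learn.sched.plot() shows the loss-vs-LR curve):

learn.lr_find()       # ~1 epoch with an increasing LR; weights are restored afterwards
learn.sched.plot()    # pick the highest LR where the loss is still clearly falling
learn.fit(1e-2, 6)    # then actually train, for as many epochs as you want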


So I think the weight of a CNN is called an activation. I think as far as CNNs are concerned weight is the same thing as kernel, convolution filter, feature matrix, and activation.

Nearly, but one key issue: an ‘activation’ is the result of applying a function (such as a convolution). You may want to re-watch the videos where we go through the Excel spreadsheet, and have the spreadsheet in front of you as you watch - that’s where I define the term ‘activation’.


Aha! I wanted to link others to the part where you clarify the relationship between kernels, filters, convolutions, activations, layers, and weights:

The excel spreadsheet from the video is located here: https://github.com/fastai/fastai/blob/master/courses/dl1/excel/conv-example.xlsx

Notes:

  • activation = max(0, convolution) ← this is specifically the activation of a ReLU layer
  • convolution = sum(input pixel matrix * filter)
  • filter, aka kernel, is a slice of a 3D tensor and is the result of our training
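A tiny numeric version of those two formulas, assuming a single 3x3 input patch and one 3x3 filter (the values are made up purely for illustration):

import numpy as np

patch = np.array([[1, 2, 0],
                  [0, 1, 3],
                  [2, 0, 1]])        # one 3x3 slice of the input image
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])      # one 3x3 filter (learned weights)

convolution = (patch * kernel).sum()  # sum(input pixel matrix * filter) = -1
activation = max(0, convolution)      # ReLU clips the negative value to 0
print(convolution, activation)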

Open questions:

  • Is it correct to call a ReLU layer a matrix of activations?
  • Is a fully-connected layer a matrix of the sum products of weights and activations?
  • Are weights only used in fully-connected layers?

All layers can be tensors of any rank, so replace ‘matrix’ with ‘tensor’ above. Note that a fully connected layer is a particular type of sum product: specifically, a matrix product. Every layer consists of a set of input activations, a function, and a set of output activations. (Unless you include the input itself as a layer, I guess…)

Weights are used in convolutional layers too, and they are the individual parameters in each kernel (/ filter).
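You can see this directly in plain PyTorch (a small sketch, not fastai-specific): the weights of a conv layer form a 4D tensor, and each kernel/filter is one slice of it.

import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
print(conv.weight.shape)       # torch.Size([16, 3, 3, 3]): 16 filters, each 3x3x3
first_filter = conv.weight[0]  # one kernel/filter - a 3x3x3 slice of the weight tensor
print(first_filter.shape)      # torch.Size([3, 3, 3])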

Got it, the output activations feed into the next layer as input, and the function is the activation function. And a filter, aka kernel, is a slice of a 3D tensor of weights. Thanks!


Hello @sabzo, I guess the questions/answers between @creviera and @jeremy answered your questions about weights, activations, and precompute?

They did. However, re-watching Lesson 3 helped the most. Thanks.


Correct me if I’m wrong, but what I understand from this is that setting precompute=False and passing the augmented data to the model takes the effect of augmentation into account, since activations will be computed from scratch for the new images. So can training with augmentation be done without unfreezing?
If that’s the case, how does learn.unfreeze help here, other than for fine-tuning?


That’s exactly right. You unfreeze layers in order to fine-tune them.
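For intuition, “frozen” just means the early layers’ parameters don’t receive gradient updates; a bare-PyTorch sketch of the idea (a hypothetical tiny model, not the fastai internals):

import torch.nn as nn

body = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU())  # the pretrained layers
head = nn.Linear(16, 2)                               # the new classifier layers

for p in body.parameters():
    p.requires_grad = False   # frozen: only the head gets gradient updates

# ...train the head...

for p in body.parameters():
    p.requires_grad = True    # unfrozen: the body can now be fine-tuned as well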


Thanks!

I’ve been trying out augmentation on the dogs vs cats dataset, and it seems that not setting a seed affects how augmentation changes the prediction accuracy. With lr=.01 and 1 epoch, at times there is an improvement in loss, training accuracy and validation accuracy, but a lot of the time the model does much worse. (All this is without unfreeze; on unfreezing there is, of course, an improvement.) Is there any other explanation for why augmentation isn’t a surefire way to improve the model here?

I’d guess you’d need to decrease your LR or use SGDR to allow the model to learn to handle the augmented images.
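With the course library’s fit signature that might look something like this (cycle_len=1 enables the SGDR restarts; the exact LR is just illustrative):

learn.fit(1e-3, 3, cycle_len=1)   # lower LR, with the LR restarted every cycle (SGDR)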

Out of curiosity, can you explain why FreezeOut is a waste of time?

Reading through the whole thread has cleared up most of my confusion regarding precompute. Thanks!
I am just left with the following question:
Why does data augmentation need to be a dynamic process where the data changes slightly between epochs? Is it also possible to expand the dataset by including, for example, 5 augmented versions of each picture? That way the dataset becomes 5 times bigger, has more variety, and we could turn precompute back on.
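If you wanted to try that fixed-copies approach, it could be done offline before training; here is a rough sketch using PIL (the folder paths and the flip/rotate choices are purely illustrative):

from pathlib import Path
from PIL import Image
import random

src = Path('data/dogscats/train/cats')       # hypothetical source folder
dst = Path('data/dogscats/train_aug/cats')   # hypothetical output folder
dst.mkdir(parents=True, exist_ok=True)

for img_path in src.glob('*.jpg'):
    img = Image.open(img_path)
    for i in range(5):                        # 5 fixed augmented copies per image
        aug = img.transpose(Image.FLIP_LEFT_RIGHT) if random.random() < 0.5 else img
        aug = aug.rotate(random.uniform(-10, 10))
        aug.save(dst / f'{img_path.stem}_aug{i}.jpg')

The trade-off is that the variety is fixed up front and the dataset takes 5x the disk space, whereas the dynamic version re-samples new variations every epoch.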


@sermakarevich seems to be saying that if precompute=True, data augmentation won’t work. Logically, this sounds right.

However in the lesson1.ipynb notebook the “Review” section says:

Review: easy steps to train a world-class image classifier

  1. Enable data augmentation, and precompute=True
  2. Use lr_find() to find highest learning rate where loss is still clearly improving

Is the first step above incorrect?