When it comes to “weights” vs “activations” are they one and the same? Do weights create activations?
When you say “This transformation by activation of your data is done only one time and now the new values of your data can be used as inputs of the last layers of your new NN that you are about to train” – what transformations are you referring to?
Hey, curious why data augmentation has to be impacted by whether or not we use precomputed activations. From what I’ve gathered, the point of data augmentation is to create more data (by cropping, zooming in, flipping, etc.) so that our network is less biased. So it seems like augmented data should be treated the same as the original images.
Why, when precompute=True, can’t we use the precomputed activations to train our augmented images the same way we do with the originals? You say it’s because the activation of our data is computed only one time, so why not that one time be with the precomputed activations?
[Update] I just realized that this question has already been asked and answered. And I also now realize that the precomputed activations would not apply to the augmented images, since they were computed from the originals. We would have to compute new activations for the augmented images, which is why this cannot work with precompute=True
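To make that concrete, here's a toy sketch of the caching idea (all names are my own, not fastai's): with precomputation, a cache keyed by the original image is reused, so an augmented view never actually passes through the network body.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(img):
    """Stand-in for a frozen conv body: any deterministic function of the pixels."""
    return np.tanh(img).sum(axis=0)

def augment(img):
    """A simple augmentation: horizontal flip."""
    return img[:, ::-1]

img = rng.normal(size=(8, 8))

# "precompute=True": activations are cached once per original image and reused,
# so the augmented view never reaches the conv body.
cache = {"img_0": features(img)}
act_cached = cache["img_0"]          # augmentation silently ignored

# "precompute=False": activations are computed fresh, so the flip matters.
act_fresh = features(augment(img))

print(np.allclose(act_cached, act_fresh))  # False - the two differ
```

The cached activation and the fresh one disagree, which is exactly why augmentation is a no-op while precompute=True.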
Unfortunately lr_find doesn’t have an epochs param and has the number of epochs hard-coded as 1, so we can only do one epoch with lr_find. What if we want to use the optimal learning rate for more than one epoch? Should we call lr_find in a loop for the number of epochs we want?
Nope! lr_find, as you noted, only does one epoch - and further, it doesn’t end at a good loss, because it intentionally increases the LR until it is too high. What’s more, it saves the weights before it starts, and resets them after it’s done.
So I think the weight of a CNN is called an activation. I think as far as CNNs are concerned weight is the same thing as kernel, convolution filter, feature matrix, and activation.
Nearly, but one key issue: an ‘activation’ is the result of applying a function (such as a convolution). You may want to re-watch the videos where we go through the Excel spreadsheet, and have the spreadsheet in front of you as you watch - that’s where I define the term ‘activation’.
Notes:

- activation = max(0, convolution) ← this is specifically the activation for a ReLU layer
- convolution = sum(input pixel matrix * filter)
- filter, aka kernel, is a slice of a 3D tensor and is the result of our training
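The notes above can be sketched numerically for a single 3x3 patch and one filter (the values here are made up for illustration):

```python
import numpy as np

# One 3x3 input patch and one 3x3 learned filter (aka kernel).
patch  = np.array([[1., -2., 0.],
                   [3.,  1., -1.],
                   [0.,  2.,  1.]])
kernel = np.array([[0.,  1., 0.],
                   [1., -4., 1.],
                   [0.,  1., 0.]])   # the weights learned during training

convolution = (patch * kernel).sum()   # sum(input pixel matrix * filter)
activation  = max(0.0, convolution)    # ReLU: activation = max(0, convolution)

print(convolution, activation)  # -2.0 0.0
```

Here the convolution comes out negative, so the ReLU zeroes it: the activation is the *result* of applying the function, not the weights themselves.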
Open questions:
Is it correct to call a ReLU layer a matrix of activations?
Is a fully-connected layer a matrix of the sum products of weights and activations?
All layers can be tensors of any rank, so replace ‘matrix’ with ‘tensor’ above. Note that a fully connected layer is a particular type of sum product: specifically, a matrix product. Every layer consists of a set of input activations, a function, and a set of output activations. (Unless you include the input itself as a layer, I guess…)
Weights are used in convolutional layers too, and they are the individual parameters in each kernel (/ filter).
Got it, the output activations feed into the next layer as input, and the function is the activation function. And a filter, aka kernel, is a slice of a 3D tensor of weights. Thanks!
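For anyone following along, the fully-connected case above (a matrix product of weights and input activations, then the activation function) looks like this in a minimal numpy sketch (bias omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)

x = rng.normal(size=(5,))        # input activations from the previous layer
W = rng.normal(size=(3, 5))      # weights: 3 output units, 5 inputs each

pre_activation = W @ x                     # the matrix (sum) product
output = np.maximum(0.0, pre_activation)   # output activations (ReLU)

print(output.shape)  # (3,)
```

Each output activation is a sum product of one row of weights with all the input activations, which is exactly the "particular type of sum product" described above.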
Correct me if I’m wrong, but what I understand from this is that setting precompute = False and passing the augmented data to the model takes into account the effect of augmentation. Thus activations will be computed from scratch for the new images. So can training using augmentation be done without unfreezing?
If this is the case, how does learn.unfreeze help here except for fine tuning?
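My mental model of freeze vs unfreeze, sketched in raw PyTorch rather than fastai's API (the layer choices here are stand-ins, not the real architecture): frozen layers still run forward on every batch (so augmentation works), they just receive no gradient updates.

```python
import torch
from torch import nn

body = nn.Linear(4, 4)   # stand-in for the pretrained conv layers
head = nn.Linear(4, 2)   # the new layers added for our task

for p in body.parameters():
    p.requires_grad = False          # "freeze": body stays fixed

x = torch.randn(8, 4)                # augmented batches still flow through body
loss = head(body(x)).sum()
loss.backward()

print(body.weight.grad)              # None - frozen layers get no gradient
print(head.weight.grad is not None)  # True - the head still learns
```

learn.unfreeze() corresponds to setting requires_grad back to True, so fine-tuning can adjust the pretrained layers as well; that's all it adds on top of training the head with augmentation.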
I’ve been trying out augmentation on the dogs vs cats dataset, and it seems that not setting a seed affects how augmentation changes the prediction accuracy. With lr=0.01 and 1 epoch, sometimes there is an improvement in loss and in training and validation accuracy, but a lot of the time the model does much worse. (All this is without unfreeze; after unfreezing there is of course an improvement.) Is there any other explanation for why augmentation isn’t a sure-fire way to improve the model here?
Reading through the whole thread has cleared up most of my confusions regarding precompute. Thanks!
I am just left with the following question:
Why does data augmentation need to be a dynamic process where the data changes slightly between epochs? Would it also be possible to expand the dataset by including, for example, 5 augmented versions of each picture? That way the dataset becomes 5 times bigger and has more variety, and we could turn precompute back on.
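The static scheme described above is certainly possible; a minimal sketch (augmentation choices are arbitrary, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_augment(img, rng):
    """One simple augmentation: random horizontal flip plus a small brightness shift."""
    out = img[:, ::-1] if rng.random() < 0.5 else img
    return out + rng.normal(scale=0.05)

images = [rng.normal(size=(8, 8)) for _ in range(10)]

# Static augmentation: fix 5 variants per image up front. The dataset is now
# 5x bigger and never changes, so activations could be precomputed once.
static_dataset = [random_augment(img, rng) for img in images for _ in range(5)]

print(len(static_dataset))  # 50 examples from 10 originals
```

The trade-off is that the model sees the same 5 fixed variants every epoch, whereas dynamic augmentation shows a fresh variant each time, which behaves more like an unbounded dataset (at the cost of recomputing activations every batch).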