When it comes to “weights” vs “activations” are they one and the same? Do weights create activations?
When you say “This transformation by activation of your data is done only one time and now the new values of your data can be used as inputs of the last layers of your new NN that you are about to train” – what transformations are you referring to?
Hey, curious why data augmentation has to be impacted by whether or not we use precomputed activations. From what I’ve gathered, the point of data augmentation is to create more data (by cropping, zooming in, flipping, etc.) so that our network is less biased. So it seems like augmented data should be treated the same as the original images.
Why, when precompute=True, can’t we use the precomputed activations to train our augmented images the same way we do with the originals? You say it’s because the activation of our data is computed only one time, so why not that one time be with the precomputed activations?
[Update] I just realized that this question has already been asked and answered. And I also now realize that the precomputed activations would not apply to the augmented images, since they were computed from the originals. We would have to compute new activations for the augmented images, which is why this cannot work with precompute=True
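To make that concrete, here's a toy sketch of the caching idea (all names are my own, not fastai's): with precomputation, a cache keyed by the original image is reused, so an augmented view never actually passes through the network body.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(img):
    """Stand-in for a frozen conv body: any deterministic function of the pixels."""
    return np.tanh(img).sum(axis=0)

def augment(img):
    """A simple augmentation: horizontal flip."""
    return img[:, ::-1]

img = rng.normal(size=(8, 8))

# "precompute=True": activations are cached once per original image and reused,
# so the augmented view never reaches the conv body.
cache = {"img_0": features(img)}
act_cached = cache["img_0"]          # augmentation silently ignored

# "precompute=False": activations are computed fresh, so the flip matters.
act_fresh = features(augment(img))

print(np.allclose(act_cached, act_fresh))  # False - the two differ
```

The cached activation and the fresh one disagree, which is exactly why augmentation is a no-op while precompute=True.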
Unfortunately lr_find doesn’t have an epochs param and has the number of epochs hard-coded as 1, so we can only do one epoch with lr_find. What if we want to use the optimal learning rate for more than one epoch? Should we call lr_find in a loop for the number of epochs we want?
Nope! lr_find, as you noted, only does one epoch - and further, it doesn’t end at a good loss, because it intentionally increases the LR until it is too high. What’s more, it saves the weights before it starts, and resets them after it’s done.
So I think the weight of a CNN is called an activation. I think as far as CNNs are concerned weight is the same thing as kernel, convolution filter, feature matrix, and activation.
Nearly, but one key issue: an ‘activation’ is the result of applying a function (such as a convolution). You may want to re-watch the videos where we go through the Excel spreadsheet, and have the spreadsheet in front of you as you watch - that’s where I define the term ‘activation’.
Notes:

- activation = max(0, convolution) ← this is specifically the activation for a ReLU layer
- convolution = sum(input pixel matrix * filter)
- filter, aka kernel, is a slice of a 3D tensor and is the result of our training
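The notes above can be sketched numerically for a single 3x3 patch and one filter (the values here are made up for illustration):

```python
import numpy as np

# One 3x3 input patch and one 3x3 learned filter (aka kernel).
patch  = np.array([[1., -2., 0.],
                   [3.,  1., -1.],
                   [0.,  2.,  1.]])
kernel = np.array([[0.,  1., 0.],
                   [1., -4., 1.],
                   [0.,  1., 0.]])   # the weights learned during training

convolution = (patch * kernel).sum()   # sum(input pixel matrix * filter)
activation  = max(0.0, convolution)    # ReLU: activation = max(0, convolution)

print(convolution, activation)  # -2.0 0.0
```

Here the convolution comes out negative, so the ReLU zeroes it: the activation is the *result* of applying the function, not the weights themselves.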
Open questions:
Is it correct to call a ReLU layer a matrix of activations?
Is a fully-connected layer a matrix of the sum products of weights and activations?
All layers can be tensors of any rank, so replace ‘matrix’ with ‘tensor’ above. Note that a fully connected layer is a particular type of sum product: specifically, a matrix product. Every layer consists of a set of input activations, a function, and a set of output activations. (Unless you include the input itself as a layer, I guess…)
Weights are used in convolutional layers too, and they are the individual parameters in each kernel (/ filter).
Got it, the output activations feed into the next layer as input, and the function is the activation function. And a filter, aka kernel, is a slice of a 3D tensor of weights. Thanks!
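For anyone following along, the fully-connected case above (a matrix product of weights and input activations, then the activation function) looks like this in a minimal numpy sketch (bias omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)

x = rng.normal(size=(5,))        # input activations from the previous layer
W = rng.normal(size=(3, 5))      # weights: 3 output units, 5 inputs each

pre_activation = W @ x                     # the matrix (sum) product
output = np.maximum(0.0, pre_activation)   # output activations (ReLU)

print(output.shape)  # (3,)
```

Each output activation is a sum product of one row of weights with all the input activations, which is exactly the "particular type of sum product" described above.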
Correct me if I’m wrong, but what I understand from this is that setting precompute = False and passing the augmented data to the model takes into account the effect of augmentation. Thus activations will be computed from scratch for the new images. So can training using augmentation be done without unfreezing?
If this is the case, how does learn.unfreeze help here except for fine tuning?
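My mental model of freeze vs unfreeze, sketched in raw PyTorch rather than fastai's API (the layer choices here are stand-ins, not the real architecture): frozen layers still run forward on every batch (so augmentation works), they just receive no gradient updates.

```python
import torch
from torch import nn

body = nn.Linear(4, 4)   # stand-in for the pretrained conv layers
head = nn.Linear(4, 2)   # the new layers added for our task

for p in body.parameters():
    p.requires_grad = False          # "freeze": body stays fixed

x = torch.randn(8, 4)                # augmented batches still flow through body
loss = head(body(x)).sum()
loss.backward()

print(body.weight.grad)              # None - frozen layers get no gradient
print(head.weight.grad is not None)  # True - the head still learns
```

learn.unfreeze() corresponds to setting requires_grad back to True, so fine-tuning can adjust the pretrained layers as well; that's all it adds on top of training the head with augmentation.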
I’ve been trying out augmentation on the dogs vs cats dataset, and it seems that not setting a seed affects how augmentation changes the prediction accuracy. With lr=0.01 and 1 epoch, sometimes there is an improvement in loss and in training and validation accuracy, but a lot of the time the model does much worse. (All this is without unfreeze; after unfreezing there is of course an improvement.) Is there any other explanation for why augmentation isn’t a sure-fire way to improve the model here?
Reading through the whole thread has cleared up most of my confusions regarding precompute. Thanks!
I am just left with the following question:
Why does data augmentation need to be a dynamic process where the data changes slightly between epochs? Would it also be possible to expand the dataset by including, for example, 5 augmented versions of each picture? That way the dataset becomes 5 times bigger and has more variety, and we could turn precompute back on.
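The static scheme described above is certainly possible; a minimal sketch (augmentation choices are arbitrary, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_augment(img, rng):
    """One simple augmentation: random horizontal flip plus a small brightness shift."""
    out = img[:, ::-1] if rng.random() < 0.5 else img
    return out + rng.normal(scale=0.05)

images = [rng.normal(size=(8, 8)) for _ in range(10)]

# Static augmentation: fix 5 variants per image up front. The dataset is now
# 5x bigger and never changes, so activations could be precomputed once.
static_dataset = [random_augment(img, rng) for img in images for _ in range(5)]

print(len(static_dataset))  # 50 examples from 10 originals
```

The trade-off is that the model sees the same 5 fixed variants every epoch, whereas dynamic augmentation shows a fresh variant each time, which behaves more like an unbounded dataset (at the cost of recomputing activations every batch).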