Pretrain vs. Precompute

Dear Fast AI fellows,

I am a little bit confused about the distinction between pretrained and precompute in the argument list of the ConvLearner class.

Signature: ConvLearner.pretrained(f, data, ps=None, xtra_fc=None, xtra_cut=0, custom_head=None, precompute=False, pretrained=True, **kwargs)

What are the differences between these two arguments? I can't clearly conceptualize the difference between them. It gets even more confusing when you think about learn.freeze() in relation to them. I would appreciate it if someone could shed some light on this, at a DL 101 level. :)

Best regards,
Shahin

All of this has been thoroughly discussed and explained before; please check out this thread:

Dear Marc,
Thanks for your reply. I have read that thread before. There are great explanations about precompute and freeze, but in that thread nobody has spoken about 'pretrained'. My assumption is that pretrained = True is equivalent to learn.freeze(). If my assumption is correct, my question is why we essentially need two commands to do the exact same thing. Or, if they are doing different things, I would like someone to explain to me what the exact differences are.

Cheers!
Shahin

pretrained = True is not equal to learn.freeze(), but there is a connection.
Basically, when you create a ConvLearner object, under the hood the fastai library creates a neural network with a particular architecture (such as Resnet50).

The question now is: How should the weights of this neural network be initialized? For “general use”, you’d initialize the weights randomly, and then train the network on your training data. But for image recognition, people have already spent lots of time training architectures such as Resnet on huge training sets. When you use pretrained = True, you tell fastai that you would like to use the weights that people have already found to work well for image recognition. This will, out of the box and without doing any training yourself, give you a neural network that will do a pretty good job at classifying the images from the ImageNet dataset, which contains dogs, cats, cars, trees, airplanes and a whole bunch of other stuff.
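
For concreteness, here is a rough sketch of what that looks like with the old fastai 0.7-style API from the signature above (assuming data is an ImageClassifierData object you built elsewhere, and that the usual course star-import is in place):

    from fastai.conv_learner import *  # old fastai 0.7-style imports

    arch = resnet50
    # pretrained=True (the default): start from the ImageNet weights
    learn = ConvLearner.pretrained(arch, data, pretrained=True)
    # pretrained=False: start from randomly initialized weights instead
    learn_scratch = ConvLearner.pretrained(arch, data, pretrained=False)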

You then only have to fine-tune the network for your particular image classification task. Calling learn.freeze() tells the learner that you don’t want to touch the “deeper” layers of the network, because you believe that they’re already pretty good (because they are pretrained). Instead, you only want to train the final layer that you are using to turn the general image classifier into one specific to your own dataset (say, dogs vs. cats, or healthy trees vs. sick trees).
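
In plain PyTorch terms, freezing just means excluding parameters from gradient updates. Here is a minimal sketch with a hypothetical two-part model, a stand-in “body” of pretrained conv layers plus a new “head”:

    import torch.nn as nn

    # Hypothetical stand-ins for "pretrained conv layers" and "new final layer":
    body = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
    head = nn.Linear(16, 2)  # e.g. dogs vs. cats

    for p in body.parameters():
        p.requires_grad = False  # frozen: the optimizer never updates these
    # head's parameters keep requires_grad=True, so only the head gets trained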

Thank-you Clemens for your great explanation.
Just to make sure I have confidently understood everything, here is my final question:
Is precompute = True then only related to the fully connected layers? If it is True, it means that the network will use pretrained weights for the FC layers as well, and if we set it to False, we obtain those weights via epochs of learning.

Sincerely,
Shahin

It’s the opposite. precompute = True is related to the convolutional layers “higher up” in the network: If you do fine-tuning on a pre-trained neural network, you are only ever modifying (i.e. learning) the weights of the final FC layer. Now, if you do multiple epochs of learning, the neural network will be given the same images over and over, but from one epoch to the next ONLY the weights of the FC layer have changed. That means that whatever the neural network does with those images in the previous convolutional layers hasn’t changed from one epoch to the next.
So, we can speed up the whole thing by computing (and remembering) for each image of the training and validation sets what its activations in the convolutional layers were, and then we just need to feed those activations into the final FC layer for training, instead of running the image through the whole neural network.
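
A rough sketch of that caching idea in plain PyTorch, reusing the hypothetical body and head from the freezing sketch above (train_loader is assumed to yield image/label batches):

    import torch

    # Pass 1: run every image through the frozen conv layers exactly once
    feats, labels = [], []
    with torch.no_grad():  # no gradients needed, body never changes
        for x, y in train_loader:
            feats.append(body(x))
            labels.append(y)
    feats, labels = torch.cat(feats), torch.cat(labels)

    # Every epoch afterwards touches only the cached activations and the head
    opt = torch.optim.SGD(head.parameters(), lr=1e-2)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(3):
        opt.zero_grad()
        loss = loss_fn(head(feats), labels)
        loss.backward()
        opt.step()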

So, considering both of your explanations, we can conclude that pretrained = True and precompute = True are both doing the same thing, which is loading weights from an already trained model and letting the FC weights change during the learning epochs. Am I right?

Best regards,
Shahin

No. pretrained = True will load weights from an already trained model; learn.freeze() tells the learner to change only the FC weights during learning epochs; and precompute = True tells the learner to take each image of the training set, compute what the output of the frozen layers is for that image, and store and re-use it during training.

Maybe I can give you a simpler example using math notation. Let’s say you have a model that chains together two functions: y = f(g(x)). This means we first apply g to x, and then f to the output of g.

Now imagine that, in general, we have to learn both f and g. But someone already figured out a good way to do g. So we only need to train f. That means during learning, we change around what f actually does. But because someone already trained g, we never change what g does. So now our training set has LOTS of values for x. Instead of taking them every time and plugging them into g to compute g(x), we realize that we only need to do that once. So we have values x_1, x_2, x_3, ... and we use those to compute activations g(x_1), g(x_2), g(x_3), ... and then during training, whenever our data loader says "Okay, time to see what f(g(x_2)) is", we don’t have to compute g(x_2), we just plug in the precomputed value.

This only works because during learning only f gets changed, not g.
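
Here is the same toy idea as runnable Python, with f and g as plain functions and a single learnable weight w (all names hypothetical):

    def g(x):        # the pretrained, frozen part: never changes
        return 2 * x

    def f(z, w):     # the trainable part: w is the weight we learn
        return w * z

    xs = [1.0, 2.0, 3.0]
    cache = [g(x) for x in xs]   # compute each g(x_i) exactly once

    w = 0.5
    for epoch in range(3):
        for z in cache:          # reuse the precomputed g(x_i) every epoch
            y = f(z, w)          # only w would be updated by training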

These things are interrelated in some way: First of all, if you don’t use pretrained AND freeze, it doesn’t make sense to use precompute, because in the example above, if we use precompute but then g changes because of training, then we can’t use the precomputed values anymore.

However, what can make sense is to first start with pretrained = True etc., but then, once you’ve trained the model that way, you can unfreeze the layers and allow the learner to tweak those pretrained layers a little bit as well, to get a bit of extra performance. Because maybe those pretrained weights were good, but not perfect, for your particular image task.
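
In the old API that two-stage workflow might look like the following sketch (exact learning rates and epoch counts are placeholders; np comes from the star-import above):

    # Stage 1: pretrained, frozen, precomputed -- train only the FC head
    learn = ConvLearner.pretrained(arch, data, precompute=True)
    learn.fit(1e-2, 3)

    # Stage 2: turn precompute off and unfreeze, so the pretrained conv
    # layers can now be tweaked a little as well (with smaller learning
    # rates for the earlier, already-pretty-good layers)
    learn.precompute = False
    learn.unfreeze()
    learn.fit(np.array([1e-4, 1e-3, 1e-2]), 3, cycle_len=1)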

Thanks a lot it is very clear now.
Best regards,
Shahin

So, AFAIK, trying to restate this for my own better understanding:

By default:

  1. All layers except the final one are frozen, and pretrained = True. Whether precompute is True or False does not matter.

If we instead choose to modify the weights of all layers, then:

  1. unfreeze()
  2. Set pretrained = False.

Now, if we want to reuse the same weights calculated for all layers every time fit is called or the model is trained, we set precompute = True, else False. But precompute makes sense only when pretrained = False and we unfreeze all layers.

Is my understanding correct?

If it is correct, why is the precompute option not available in fastai 1.0? I could not spot it in the docs.