What's the difference between Pre-Trained and Pre-Computed?

Renga · January 9, 2018, 4:26pm

I am quite confused with the ‘Pre-computed’ thingy.

Here we are training the ‘Pre-Trained’ Resnet.

So what is ‘Pre-computed’ then?

Result on Cats vs Dogs with precompute=True: 99.03%
Result with precompute=False: 99.121%

I ran it both without doing rm {PATH}tmp and after doing rm {PATH}tmp. The results are all above 99%.

ecdrid · January 9, 2018, 4:39pm

Pre-computed means doing all the various computations once and for all…

Pre-trained means that the model has already been trained on some data and we have the layer weights with us… We can use them directly or fine-tune the last couple of layers according to our needs…

Searching the forum for more insights on pre computation will help…

cynosure · January 9, 2018, 7:31pm

Isn't it the other way round?

ecdrid · January 10, 2018, 12:33am

Actually, they are kind of interrelated and I find it hard to write it down…

cynosure · January 10, 2018, 6:51am

My understanding was that pre-trained means the layers are designed optimally for a particular setting, with all the parameters etc. But when using this model on a new dataset, the previously computed weights are not used; they are calculated anew.

In pre-computed, the network design is pre-designed (e.g. VGG or ResNet) and the weights are also pre-computed. The weights can then be updated by further training.

Is that wrong?

ecdrid · January 10, 2018, 7:06am

Using pre-trained models, we would not have to train the entire architecture but only a few layers (the last ones)…

A pre-trained model is a model created by someone else to solve a similar problem. Instead of building a model from scratch, we can use the model trained on the other problem as a starting point (a baseline model)…

By using pre-trained models which have previously been trained on large datasets, we can directly use the weights and architecture obtained and apply what was learned to our own problem. This is what is popularly known as transfer learning.

So what we do might be the following…

Train some layers while freezing others: we can use a pre-trained model by training it partially. We keep the weights of the initial layers of the model frozen while we retrain only the higher layers. We can experiment with how many layers to freeze and how many to train…
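To make the freezing idea concrete, here is a minimal pure-Python sketch, not taken from any library. The two-weight toy model and the names `params`, `trainable` and `train_step` are made up for illustration; `body_w` plays the role of a pre-trained early layer and `head_w` the layer we retrain.

```python
# Toy two-"layer" model: prediction = head_w * (body_w * x)
params = {"body_w": 2.0, "head_w": 0.1}        # body_w stands in for a pre-trained weight
trainable = {"body_w": False, "head_w": True}  # False means the weight is frozen

def train_step(x, y, lr=0.01):
    """One gradient-descent step on squared error, skipping frozen weights."""
    feature = params["body_w"] * x
    pred = params["head_w"] * feature
    err = pred - y
    grads = {
        "body_w": 2 * err * params["head_w"] * x,
        "head_w": 2 * err * feature,
    }
    for name, grad in grads.items():
        if trainable[name]:  # frozen weights are never updated
            params[name] -= lr * grad

data = [(1.0, 6.0), (2.0, 12.0), (3.0, 18.0)]  # made-up data following y = 6x
for _ in range(200):
    for x, y in data:
        train_step(x, y)
```

After training, `body_w` still has its original value and only `head_w` has moved. Flipping an entry in `trainable` to `True` is the toy equivalent of unfreezing that layer.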

I might be wrong…

gokkulnath · January 10, 2018, 7:15am

Pre-Computed: Just an accelerated way to fine-tune your last layer.

When precompute=True (first run):
You feed all your data through the network and save the output just before the last layer as a Bcolz array. Because you are running it for the first time, it takes longer, as it is saving these intermediate results.
(After the first run, assuming you don't delete the data/tmp files:)
When you run the same thing again, instead of computing these intermediate activations again (starting from the first layer), it uses the stored values directly and computes the results from them. (This avoids the redundant calculation.)
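The caching behaviour described above can be sketched in plain Python. This is not the fastai implementation (which stores Bcolz arrays under the tmp folder); `frozen_body`, `get_activations` and the pickle cache file are hypothetical stand-ins chosen to keep the example self-contained.

```python
import os
import pickle
import tempfile

calls = {"body": 0}  # counts how often the expensive frozen layers run

def frozen_body(image):
    """Hypothetical stand-in for every layer before the last one."""
    calls["body"] += 1
    return [sum(image), max(image) - min(image)]

def get_activations(images, cache_path):
    """Load cached activations if present; otherwise compute and save them."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    activations = [frozen_body(img) for img in images]
    with open(cache_path, "wb") as f:
        pickle.dump(activations, f)
    return activations

images = [[0.1, 0.5, 0.9], [0.4, 0.4, 0.2]]  # made-up "images"
cache_path = os.path.join(tempfile.mkdtemp(), "activations.pkl")

first = get_activations(images, cache_path)   # first run: computes and saves
second = get_activations(images, cache_path)  # later run: reads the cache
```

Deleting the cache file plays the role of removing the tmp folder: the next call has to run the frozen layers over every image again.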

Pre-Trained: This refers to the weights. Normally, when you start solving a problem, you either initialize the weights randomly or use an initialization technique like Xavier initialization. Instead of going through the pain of tuning the weights to good values on your own (which requires more compute time), we perform transfer learning, i.e. we reuse the weights of a model that has already been trained.
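As a rough illustration of the two starting points, here is a small pure-Python sketch. It assumes the uniform variant of Xavier initialization, where weights are drawn from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). The helper `init_layer` and the `saved` weights are made up for the example.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Draw a fan_out x fan_in weight matrix from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

def init_layer(fan_in, fan_out, rng, pretrained=None):
    """Reuse pre-trained weights when given, otherwise fall back to Xavier."""
    if pretrained is not None:
        return [row[:] for row in pretrained]  # copy so the source stays untouched
    return xavier_uniform(fan_in, fan_out, rng)

rng = random.Random(0)
scratch = init_layer(3, 2, rng)                   # random starting point
saved = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]      # made-up "pre-trained" weights
reused = init_layer(3, 2, rng, pretrained=saved)  # transfer-learning starting point
```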

Check out this forum to get more clarity on Pre-Computed: Link

Hope it Helps!

~Gokkul

harveyslash · January 10, 2018, 7:20am

Pre-trained Network:

Usually a well-established network that performs well on a dataset. The dataset that this network was trained on is usually much larger than the dataset you are currently working on. The first N layers (N is a hyperparameter) are used to extract ‘features’ from the data. Then these features are used to train other models.

Pre-computed (Activations):

In some tasks, the activations from the initial layers of a model don't change. Usually this is used along with pre-trained networks. Since these activations don't change, the final activations are computed just once and stored. Then these activations are used as input data. This is done to avoid computing the same data over and over again.

They are used to complement each other. Usually, the activations from a pre-trained network (to increase accuracy) are precomputed (to increase speed).

Renga · January 10, 2018, 10:12am

I think if you watch lesson 2 it becomes abundantly clear. precompute is only the process of storing the activations of your training dataset for re-use. It just makes subsequent training runs (beyond the first) faster. It affects nothing but the training speed of the classification layer. In my view, it's just a hack and we must not get fixated on it (I learnt that the hard way). In fact, for more involved training such as data augmentation, we must set ‘precompute’ to ‘False’, since if you set it to True, it ignores the augmented data. IMHO, we can ignore this altogether for now in lesson 1.
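Here is a toy sketch of why stored activations and augmentation don't mix. It is plain Python, not fastai code. `augment`, `frozen_body` and `epoch_inputs` are hypothetical names, and the "augmentation" is just a random brightness shift.

```python
import random

def augment(image, rng):
    """Toy augmentation: shift every pixel by the same small random amount."""
    shift = rng.uniform(-0.1, 0.1)
    return [px + shift for px in image]

def frozen_body(image):
    """Hypothetical stand-in for the frozen layers."""
    return round(sum(image), 6)

def epoch_inputs(images, rng, precompute, cache):
    """Return what the classifier layer sees during one epoch."""
    if precompute:
        # Activations are looked up by image index; augment() never runs.
        return [cache[i] for i in range(len(images))]
    return [frozen_body(augment(img, rng)) for img in images]

images = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # made-up "images"
cache = [frozen_body(img) for img in images]  # stored from the un-augmented images
rng = random.Random(0)

cached_epochs = [epoch_inputs(images, rng, True, cache) for _ in range(3)]
fresh_epochs = [epoch_inputs(images, rng, False, cache) for _ in range(3)]
```

With the cache, every epoch feeds the classifier exactly the same activations. Without it, each epoch sees activations of freshly augmented images, at the cost of running the frozen layers again.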

machinethink · January 10, 2018, 4:58pm

Another word you may see for precomputed activations is “bottleneck features”. In other words, it is the output of the last layer before the classifier.

Since only the classifier layers change during training but not the earlier layers, we can convert the images to feature vectors just once and then train the classifier on these feature vectors instead of on the images.
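A minimal sketch of that workflow in plain Python, with a made-up feature extractor and a tiny logistic-regression classifier (`frozen_body` and `train_classifier` are illustrative names, not from any library):

```python
import math

body_calls = 0

def frozen_body(image):
    """Hypothetical fixed feature extractor (the layers before the classifier)."""
    global body_calls
    body_calls += 1
    return [sum(image) / len(image), max(image)]

def train_classifier(features, labels, epochs=50, lr=0.5):
    """Logistic regression trained on feature vectors, not on images."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [w[0] - lr * g * f[0], w[1] - lr * g * f[1]]
            b -= lr * g
    return w, b

# Made-up data: bright "images" are class 1, dark ones are class 0.
images = [[0.9, 0.8, 1.0], [0.7, 0.9, 0.8], [0.1, 0.2, 0.0], [0.2, 0.1, 0.3]]
labels = [1, 1, 0, 0]

features = [frozen_body(img) for img in images]  # images -> feature vectors, once
w, b = train_classifier(features, labels)        # 50 epochs, no further body calls
```

The feature extractor runs once per image, however many epochs the classifier trains for.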

Renga · January 10, 2018, 5:44pm

Nicely put.

parthi2929 · December 22, 2018, 10:24am

@Renga There is no precompute option in fastai v1.0, so what do we do for augmentation?