Not incorrect, just a little confusing. Once you set precompute=False, the data augmentation will kick in.
Ah I was also very confused by the interplay between DA, freezing layers, and precomputing.
I’ll make a PR on github: https://github.com/fastai/fastai/pull/474
Does “Precompute=True” need more memory than “Precompute=False”?
Now I'm confused again…
I think I understand what precompute does after reading on the different topics. I do have some questions on the finer details:
Am I right to say that precompute only saves run time if you are running more than 1 epoch?
When does the precomputation happen? Is it when we create the learn object, or when we call learn.fit? Assuming the latter, does that mean we are precomputing the activations each time we call learn.fit (e.g. when trying multiple learning rates, or when using lr_find())?
Do I understand correctly that there is some hidden logic where, if precompute=True, the specified data augmentation gets turned off? Is there an explicit way to turn off data augmentation?
What happens if I leave precompute=True and unfreeze part or all of the layers? I am guessing that we will precompute activations at the last frozen layer and deactivate data augmentation?
I will try to explain what exactly happens inside these two lines, based on what I learned from digging into the library code.
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
Hope that will clarify some confusion about precompute and augmentation.
So, tfms_from_model(arch, sz) returns a tuple of transforms for train and validation data.
Transforms for the training data contain additional augmentations, random cropping etc.
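To make the train/validation split of transforms concrete, here is a minimal sketch of the idea (the function and pipeline names are illustrative, not fastai's real internals): the validation pipeline is deterministic, while the training pipeline adds random augmentation on top.

```python
import random

def make_transforms(size):
    """Sketch of the idea behind tfms_from_model: a (train, validation)
    pair of transform pipelines. Names here are illustrative only."""
    def val_tfms(img):
        # validation: deterministic resize/center-crop only
        return ("resize", size, img)

    def trn_tfms(img):
        # training: random augmentation (here, a coin-flip horizontal flip)
        flipped = random.random() < 0.5
        return ("resize", size, "flip" if flipped else "noflip", img)

    return trn_tfms, val_tfms

trn_tfms, val_tfms = make_transforms(224)
# val_tfms always produces the same output for the same input;
# trn_tfms may differ between calls because of the random flip.
```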
Next, when we call ImageClassifierData.from_paths, under the hood it creates 6 separate datasets.
- Training data + training transforms
- Validation data + validation transforms
- Training data + validation transforms
- Validation data + training transforms
- Test data + training transforms
- Test data + validation transforms
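The six combinations above are just the cross product of the three data splits with the two transform pipelines, which can be sketched like this (an illustration of the pairing, not fastai's actual code):

```python
from itertools import product

# Each data split is paired with both transform pipelines,
# yielding the 6 datasets listed above.
splits = ["train", "valid", "test"]
tfms = ["trn_tfms", "val_tfms"]

datasets = [(split, t) for split, t in product(splits, tfms)]
for ds in datasets:
    print(ds)
```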
ConvLearner.pretrained(arch, data, precompute=True)
This is where the precomputation happens.
After creating the model from the provided architecture, it precomputes the activations for the training, validation and test data, using the datasets that were created earlier.
For the training activations it uses (Training data + validation transforms); for validation, (Validation data + validation transforms); for test, (Test data + validation transforms).
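In other words, the frozen convolutional body runs once over every image, always with the deterministic validation transforms (so no random augmentation), and the resulting activations are cached. A hedged sketch of that step, with a stand-in function instead of the real pretrained network:

```python
# `conv_body` is a deterministic stand-in for the frozen convolutional
# layers of the pretrained network (illustrative only).
def conv_body(img):
    # pretend forward pass: map an image name to a small feature vector
    return [len(img), sum(ord(c) for c in img) % 100]

def precompute_activations(images):
    # run the frozen body once per image and cache the results
    return [conv_body(img) for img in images]

trn_act = precompute_activations(["dog1.jpg", "dog2.jpg"])
val_act = precompute_activations(["cat1.jpg"])
# From here on, only the small fully connected head trains on trn_act,
# so every subsequent epoch skips the expensive convolutional forward pass.
```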
In addition to that, it creates a new instance of ImageClassifierData.
self.fc_data = ImageClassifierData.from_arrays(self.data.path, (act, self.data.trn_y), (val_act, self.data.val_y), self.data.bs, classes=self.data.classes, test = test_act if self.data.test_dl else None, num_workers=8)
This new data object does not contain any training transforms. Now comes the important part, which I think creates the confusion.
@property
def data(self): return self.fc_data if self.precompute else self.data_
As you can see, if the precompute flag is turned on, it will use fc_data instead of the initial data object (the one with augmentations).
That is why Jeremy said: "Once you set precompute=False, the data augmentation will kick in."
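The switching logic can be boiled down to a few lines (class and attribute names mirror the snippet above, but this is an illustration, not the real ConvLearner):

```python
class Learner:
    """Minimal sketch of the precompute switch, not fastai's real class."""
    def __init__(self, data, fc_data, precompute=True):
        self.data_ = data          # full dataset, with augmentations
        self.fc_data = fc_data     # precomputed activations, no augmentation
        self.precompute = precompute

    @property
    def data(self):
        # the same property pattern as in the library snippet above
        return self.fc_data if self.precompute else self.data_

    def unfreeze(self):
        # unfreezing the conv layers invalidates the cached activations,
        # so precompute is turned off here as well
        self.precompute = False

learn = Learner(data="augmented images", fc_data="cached activations")
print(learn.data)     # prints "cached activations"
learn.precompute = False
print(learn.data)     # prints "augmented images" -> augmentation kicks in
```

This also explains the unfreeze behavior: flipping precompute back off is all it takes for training to go through the augmented dataset again.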
With respect to your questions:
- It happens once, when the learner object is created. It can also recalculate activations inside the set_data method.
- I guess you can see it now.
- The unfreeze method sets precompute to False internally.
Hope that helps.
Please correct me if I got something wrong.
Thank you so much, this is very helpful!
If anyone has any doubt as to the usefulness of precompute=True; I suggest training a model with a deeper architecture (e.g. resnext101_64) on a slower GPU (for me it was a k80)!
On the dog breeds dataset, you can get decent accuracy without data augmentation (and with precompute=True) in a few seconds. However, running epochs with data augmentation takes a lot longer, so it's nice to have that head start!
Given that GPU limitation, I ended up spending more cycles with precompute=True (even using SGDR) and got 93% accuracy before heading to data augmentation.
I also reran the whole precompute=True stage multiple times to try different things out (e.g. different dropout values), which I wouldn't have been able to do without precomputation (too slow).
Overfitting is a risk, but it helped that I used dropout=0.7.
This thread is amazing. I was reading Chapter 5 in Chollet's book on using the pretrained VGG16 model and was puzzled over the same issue: why can't I use data augmentation if I don't train a pre-trained model end-to-end?
Searched Stack Overflow and finally found a user asking the same question, but they were given a confusing, most likely ill-informed answer, which kept me skeptical. Found nothing else on the internet.
So I joined fastai forum, searched through a few threads, and am so glad I found this.