Why not precompute augmented data?

When we want to use data augmentation, we have to set learn.precompute=False and then run learn.fit(1e-2, 3, cycle_len=1). That means over those 3 epochs we have to recompute the activations in every layer every time. However, the activations in all but the last layer are the same, so why don't we just calculate them once, then use SGD to train the last layer 3 times? For example, replace

learn.precompute=False
learn.fit(1e-2, 3, cycle_len=1)

with

learn.precompute=False
learn.fit(1e-2, 1)
learn.precompute=True
learn.fit(1e-2, 3, cycle_len=1)

They are not: augmentations are applied randomly to an image, meaning that on one pass over the dataset an image might be rotated by 5 degrees, on another by 2 degrees, etc. (up to the max amount of rotation you specify). This way the model sees a slightly different image each time, so we artificially make our training set bigger. Unfortunately, this means we need to recalculate all the activations starting from the bottom-most layer.
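
Here is a quick sketch of what happens (plain numpy/scipy standing in for fastai's transform pipeline, so the rotation call is illustrative, not fastai's actual API): the same image comes out slightly different on every pass, so any activations computed from it would be stale by the next epoch.

import numpy as np
from scipy.ndimage import rotate

# Stand-ins for one training image and a max-rotation setting
rng = np.random.default_rng(0)
image = rng.random((64, 64))
max_deg = 10

for epoch in range(3):
    # A fresh random angle is drawn on every pass over the data, so the
    # augmented input (and every activation computed from it) differs
    # from epoch to epoch.
    angle = rng.uniform(-max_deg, max_deg)
    augmented = rotate(image, angle, reshape=False)
    print(f"epoch {epoch}: angle={angle:.2f}, pixel sum={augmented.sum():.2f}")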

I guess we could reuse the same augmentations for multiple epochs, and in that case precomputing activations would help. This might be a nice thing to experiment with, and I wonder if it is possible with the current version of fastai.
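
If someone wants to try that, here is a rough sketch of the idea (again just numpy/scipy, not fastai code): fix the random seed and pre-generate a static set of augmented copies once. Because that set never changes between epochs, activations computed on it could in principle be cached.

import numpy as np
from scipy.ndimage import rotate

# Fixed seed, so the augmentations come out the same on every run
rng = np.random.default_rng(42)
images = rng.random((5, 64, 64))          # stand-in for 5 training images

# Pre-generate 4 fixed augmented variants per image, once, up front.
# The resulting set is static, so the frozen layers' activations would
# not need to be recomputed every epoch.
augmented_set = np.stack([
    rotate(img, rng.uniform(-10, 10), reshape=False)
    for img in images
    for _ in range(4)
])
print(augmented_set.shape)                # (20, 64, 64): a larger, fixed train set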

You mean that we actually generate a larger training set for tuning the weights and biases in each epoch, and in every epoch this training data is different due to data augmentation, right?

Can my last few lines of code do the experiment you are talking about? If not, can you tell me the reason?

learn.precompute=False
learn.fit(1e-2, 1)

Maybe the reason is that by running these two lines we only train the model rather than save the activations, so there are no precomputed activations (of the data-augmented versions) available for further training?

Not actually a larger training set, but transformations of the same one. For example: at epoch 1 the image is transformed by rotating it 5 degrees; at epoch 2 it is rotated by 30 degrees. The activations for these two images are different, so you cannot set precompute=True. And if we fix the transformation for every epoch, we cannot leverage the advantage of augmentation: we want more varied images to avoid overfitting. Hope that helps.
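
To make that concrete, here is a tiny illustration (numpy only, with a random matrix W standing in for a frozen layer's weights, nothing fastai-specific): the two rotations of the same image produce different activations, so a cache keyed by the original image would be wrong.

import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(1)
image = rng.random((64, 64))
W = rng.random((64 * 64, 8))      # stand-in for a frozen layer's weights

# Epoch 1 sees a 5-degree rotation, epoch 2 a 30-degree rotation;
# the frozen layer's "activations" differ between the two passes.
act1 = rotate(image, 5, reshape=False).ravel() @ W
act2 = rotate(image, 30, reshape=False).ravel() @ W
print(np.allclose(act1, act2))    # False: cached activations would be stale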

Very clear, I understand it now. Let me recap: what data augmentation does is randomly deform the original images in each epoch.

Thank you very much! :wink:
