Does number of samples increase during Data Augmentation?

Ashik_Shafi · September 7, 2020, 11:53pm

When we use RandomResizedCrop as item_tfms, it randomly crops and resizes a different part of each image every epoch. And when we set unique=True in dls.train.show_batch(), we see the same image repeated in different versions. So does that mean the number of samples increases? For example, will 1 image be turned into 10 different versions of itself?

butchland · September 8, 2020, 12:32am

Hi @Ashik_Shafi,
The number of samples does not increase. What happens is that the augmentations are applied every time a sample is loaded. In the case of RandomResizedCrop, passing it as item_tfms means that each sample is randomly cropped and then resized each time it is loaded, i.e. once per epoch. This means the model doesn’t see the exact same image each epoch, which improves generalization and helps prevent overfitting.
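
To make the idea concrete, here is a minimal pure-Python sketch. It is not fastai's implementation; `ToyDataset` and `random_crop` are made-up names for illustration. The crop happens inside `__getitem__`, so the dataset length never changes, but loading the same index again can return a different crop.

```python
import random

def random_crop(img, size, rng):
    """Return a random size x size window from a 2D list `img`."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

class ToyDataset:
    """Hypothetical dataset: augmentation is applied lazily on every access."""
    def __init__(self, images, crop_size, seed=0):
        self.images = images
        self.crop_size = crop_size
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.images)  # augmentation never changes this

    def __getitem__(self, i):
        return random_crop(self.images[i], self.crop_size, self.rng)

# One 8x8 "image" whose pixels are all distinct.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
ds = ToyDataset([image], crop_size=4)

# Load the same sample once per "epoch" for 10 epochs.
views = [ds[0] for _ in range(10)]
n_samples = len(ds)
n_distinct_views = len({tuple(map(tuple, v)) for v in views})
print(n_samples, n_distinct_views)
```

The dataset still reports one sample, while the ten loads give several different 4x4 crops of it.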

Ashik_Shafi · September 8, 2020, 12:53am

Thanks a lot @butchland, so the model just sees a different version of the image each epoch and the number of samples remains the same. That helps a lot! Kudos!

Sanitha · September 15, 2020, 3:02am

So the number of augmentations of a single image will be equal to the number of epochs? Just to clarify.

Ashik_Shafi · September 15, 2020, 3:17am

Nope, the model sees a different augmented version in each epoch, for instance 10 augmented versions of a single image in one epoch.

It doesn’t affect the number of samples in any way. The augmentation just creates augmented versions of the image so that the model learns to generalize better.

Hope that answers it!

yitao94 · September 15, 2020, 3:55am

Hi, sorry, I don’t understand. How can the model learn better if there are no more samples after the augmentation process?

Ashik_Shafi · September 15, 2020, 4:16am

What do you mean by no more samples after the process? Well, in simple terms:

When we perform augmentation tfms, the transformation is applied to every image.
This doesn’t increase or decrease the number of samples.
The augmentation doesn’t take place when no images/samples exist.
The augmentation tfms show different versions of the image to the model.

Hope it’s a bit clearer now. Don’t overcomplicate it; it’s just a simple transformation done to show different versions of the image.

yitao94 · September 15, 2020, 4:25am

Thanks for your timely reply. for my original thought, e.g there were 1

Ashik_Shafi · September 15, 2020, 4:41am

When the transformation is applied to one image, different versions of that image are shown. For example:
[image: download, 275x183]

Sanitha · September 15, 2020, 6:28am

Thanks @Ashik_Shafi. Well explained.

Sanitha · September 16, 2020, 2:50am

Something more I’d like clarified about image classification.
Please help:

1. With “aug_transforms”, is test data augmentation also happening along with train data augmentation?
2. Since fastai gives this much flexibility for augmenting data, do we really need data augmentation as a pre-processing step? That is, do we need to do data augmentation beforehand?
3. If we do image augmentation as a pre-processing step to increase the number of samples in order to balance the dataset, and then apply ‘aug_transforms’ as well, will it cause overfitting? It is likely that the same type of image would get generated during train data augmentation.
4. When applying aug_transforms, is there any specific count of the total number of different versions of a single image that will be shown to the model at each epoch during learning?

Thanks in advance.

muellerzr · September 16, 2020, 3:32am

For test (and validation) data, anything like warping, flipping, etc. is not performed. And if a resize is done, it is a center crop.
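
As a rough illustration of that train/test difference, here is a small pure-Python sketch. It is not fastai code; `crop_offsets` and its `is_train` flag are hypothetical names. Training picks a random crop position, while evaluation always uses the center.

```python
import random

def crop_offsets(h, w, size, is_train, rng):
    """Pick the top-left corner of a size x size crop.

    Training: random position. Validation/test: deterministic center crop.
    """
    if is_train:
        return rng.randint(0, h - size), rng.randint(0, w - size)
    return (h - size) // 2, (w - size) // 2

rng = random.Random(0)
valid_offsets = {crop_offsets(8, 8, 4, False, rng) for _ in range(20)}
train_offsets = {crop_offsets(8, 8, 4, True, rng) for _ in range(20)}
print(valid_offsets)       # {(2, 2)}: always the same center position
print(len(train_offsets))  # several distinct positions
```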

No, you don’t need to augment the data beforehand.

Highly unlikely, unless you’re doing aug_transforms during the preprocessing with the exact same parameters, and even then it is unlikely. Each transform has an innate probability of being run, and if you modify its hyperparameters to slightly alter what’s being done beforehand, then the two are entirely different.

No “copies” of images are ever made, as has been stated before. An epoch is defined as going through all the samples of a dataset once. This should answer that pretty clearly.
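
A toy sketch of that definition, in plain Python rather than fastai (`maybe_flip` is a made-up helper, and the probability 0.5 is just an example value): each epoch visits every sample exactly once, and each visit applies the transform with some probability. Over `n_epochs` epochs the model therefore sees each image `n_epochs` times, with whatever augmentation happened to be drawn on each visit.

```python
import random

def maybe_flip(row, p, rng):
    """Flip a 1D 'image' horizontally with probability p, else return it as is."""
    return row[::-1] if rng.random() < p else row

samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
n_epochs = 5
rng = random.Random(0)

times_seen = [0] * len(samples)
views = []
for epoch in range(n_epochs):
    for i, sample in enumerate(samples):  # one epoch = one pass over every sample
        views.append((i, maybe_flip(sample, p=0.5, rng=rng)))
        times_seen[i] += 1

print(len(samples), times_seen)  # 3 [5, 5, 5]
```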

Sanitha · September 16, 2020, 6:14am

Thanks [muellerzr](https://forums.fast.ai/u/muellerzr) (Zachary Mueller).
That helps a lot.