Image Data Transforms Overview

I do not understand the big picture of image data augmentation and transforms. I dug into the code on GitHub, but I have not been able to trace it well enough to discover what is going on. First, I will document my current understanding and then ask questions. Please correct my understanding if it is wrong, and then answer the questions.

My current understanding is that ImageClassifierData must have transforms applied. The code on GitHub shows a rich selection of options (squish, tilt, warp, zoom, etc.), but the basic transformations are a slight rotation and change in lighting. Regardless, I assume that you can specify whatever mix of transforms you want for training a network. (Am I correct so far?)

Questions. Once you have specified a list of transformation options and run a single training epoch, what happens? Is each image presented to the network only once, or are multiple transformed versions presented during that epoch? If so, what transformations are applied? Are all transformations always applied, or a random subset, or only one at a time? Does the original image ever get presented to the network in its original form? What happens on subsequent epochs? Are the images transformed in different ways than in the previous epoch? Would it make sense (and is it possible) to run data augmentation schemes prior to training the network and create dozens of transformed versions of the same image, thereby increasing the training set size?

Either these questions were not addressed in the lectures or I was not attentive enough. I hope that someone can reveal the ‘big picture’ on this and provide an accessible description of the flow of code execution.


My first topic posted to the forum (about categorical variable embedding) generated many responses and was very positive.

This is my second topic. Two days no responses. Any feedback on why my question is so horrible would be appreciated. That way I can improve in the future.

The question seems fine. It probably just hasn’t been seen by someone who thinks they have a clear answer.

The docs are pretty good in this area and link directly to code.
Although they are written for v1, the general intent also applies to v0. Each time an image is presented to the training network it is subject to transformation. You can control which transformations may be applied (e.g. brightness, rotation), the likelihood of each being used (e.g. 100%, 4%), and the parameters to apply (e.g. rotation between -12 and +6 degrees). And you can write or include your own transforms.
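As a rough illustration of those three controls (which transforms, how likely, and with what parameters), here is a minimal sketch in plain Python. It is not fastai's API: the `SPECS` list, the `sample_transforms` function, and the brightness range are all assumptions made up for this example. It only samples which transforms would fire for one presentation of one image; it does not touch any pixels.

```python
import random

# Hypothetical spec format (not fastai's): name, probability, (low, high) parameter range.
SPECS = [
    ("brightness", 1.00, (0.8, 1.2)),    # always applied; the factor range is an assumed example
    ("rotation",   0.04, (-12.0, 6.0)),  # applied about 4% of the time; angle in degrees
]

def sample_transforms(specs, rng):
    """Decide which transforms fire for ONE presentation of ONE image."""
    chosen = []
    for name, p, (low, high) in specs:
        if rng.random() < p:                       # fire with probability p
            chosen.append((name, rng.uniform(low, high)))  # draw a fresh parameter
    return chosen

rng = random.Random(0)
plans = [sample_transforms(SPECS, rng) for _ in range(10_000)]
rotated = sum(1 for plan in plans if any(name == "rotation" for name, _ in plan))
print(f"rotation applied in {rotated / len(plans):.1%} of presentations")
```

Each call draws again, so the same image can get a different mix of transforms and a different parameter value every time it is presented.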

I did find and read that documentation earlier, but was unable to make sense of what happens during an epoch. If I understand you correctly, you are saying that if I have 1000 images in my training set, when I apply transforms I will still only have 1000 images in the epoch? If so, the documentation was confusing because I was always confronted with 8 images of kittens when get_transforms was called.

I can't confirm for sure how fastai does it internally, but the usual concept of data augmentation is to randomly apply a set of affine or non-affine transformations to the images before sending them to the network.
Most of the time, if you have N training images, N augmented images will be presented to the model during one epoch. In the next epoch, different random transforms will be applied to the images. This adds variability, which helps prevent overfitting.
Setting optimal boundaries for data augmentation transforms is part of the hyper-parameter tuning for a specific problem.
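The "N images in, N augmented images out, different every epoch" idea can be sketched with a toy dataset class. This is only an assumption-laden illustration, not fastai's or PyTorch's implementation: `AugmentedDataset` is a hypothetical name, the "images" are short lists of floats, and the flip and brightness ranges are invented. It follows the same `__len__` / `__getitem__` shape that PyTorch map-style datasets use, with the random transform applied at the moment an item is fetched.

```python
import random

class AugmentedDataset:
    """Toy stand-in for a map-style dataset; transforms run on the fly."""

    def __init__(self, images, rng):
        self.images = images
        self.rng = rng

    def __len__(self):
        return len(self.images)              # augmentation does not change the size

    def __getitem__(self, idx):
        row = list(self.images[idx])         # copy, so the stored original is untouched
        if self.rng.random() < 0.5:          # random horizontal flip (assumed p = 0.5)
            row.reverse()
        factor = self.rng.uniform(0.8, 1.2)  # random brightness factor (assumed range)
        return [pixel * factor for pixel in row]

# 1000 tiny fake "images" of three pixels each.
images = [[float(i), float(i + 1), float(i + 2)] for i in range(1000)]
dataset = AugmentedDataset(images, random.Random(42))

epochs = []
for epoch in range(2):
    order = list(range(len(dataset)))
    random.Random(epoch).shuffle(order)          # visit every image once, in random order
    seen = {idx: dataset[idx] for idx in order}  # one augmented version per image
    epochs.append(seen)
    print(f"epoch {epoch}: {len(seen)} samples presented")
```

In this sketch each epoch still presents exactly 1000 samples, the stored originals are never overwritten, and the version of a given image seen in the second epoch differs from the one seen in the first because the random draws are repeated on every fetch.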

I think it’s a great question. You may want to check out how PyTorch handles this. After all, fastai is built on PyTorch.