I do not understand the big picture of image data augmentation and transforms. I dug into the code on GitHub, but I have not been able to trace it to discover what is going on. First, I will document my current understanding and then ask questions. Please correct my understanding if it is wrong, and then answer the questions.
My current understanding is that ImageClassifierData must have transforms applied. The code on GitHub shows a rich selection of options (squish, tilt, warp, zoom, etc.), but the basic transformations are a slight rotation and a change in lighting. Regardless, I assume that you can specify whatever mix of transforms you want when training a network. (Am I correct so far?)
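For what it's worth, here is how I picture "a mix of transforms" working under the hood: each transform is a function from image to image, and a list of them is chained into one pipeline. This is only a minimal sketch of that general pattern, not the actual fastai code; the function names (`rotate`, `adjust_lighting`, `compose`) are made up for illustration, and the "images" are plain strings so the example stays self-contained.

```python
# Hypothetical transforms; each maps an image to a modified image.
# Strings stand in for real image arrays to keep the sketch runnable.
def rotate(img):
    return img + "+rot"

def adjust_lighting(img):
    return img + "+light"

def compose(transforms):
    """Chain a list of transforms into a single transform."""
    def apply(img):
        for t in transforms:
            img = t(img)
        return img
    return apply

# Specify whatever mix of transforms you want:
augment = compose([rotate, adjust_lighting])
print(augment("photo"))  # photo+rot+light
```

If something like this is what happens, then "specifying a mix" just means building one composed pipeline that every image passes through.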
Questions. Once you have specified a list of transformation options and run a single training epoch, what happens?

- Is each image presented to the network only once, or are multiple transformed versions presented during that epoch?
- Which transformations are applied? Are all of them always applied, a random subset, or only one at a time?
- Does the original image ever get presented to the network in its unmodified form?
- What happens on subsequent epochs? Are the images transformed differently than in the previous epoch?
- Would it make sense (and is it possible) to run the data augmentation scheme before training and create dozens of transformed versions of each image, thereby increasing the size of the training set?
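My working hypothesis, which I would like confirmed or corrected, is that many libraries apply augmentation "on the fly": the dataset size never grows, but every time an image is fetched, fresh random transform parameters are drawn, so each epoch sees a slightly different variant of each image. A minimal sketch of that pattern (all names here are illustrative, not the fastai API; angles stand in for a real rotation):

```python
import random

def random_rotate(img, rng):
    # Draw a small random angle on every call, so repeated
    # fetches of the same image yield different variants.
    angle = rng.uniform(-10, 10)
    return (img, round(angle, 2))

class AugmentedDataset:
    """Wraps a list of images; re-randomizes the transform on each access."""
    def __init__(self, items, seed=0):
        self.items = items
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return random_rotate(self.items[i], self.rng)

ds = AugmentedDataset(["img0", "img1"])
epoch1 = [ds[i] for i in range(len(ds))]
epoch2 = [ds[i] for i in range(len(ds))]
# Same two underlying images both epochs, but different random
# angles each time -- the dataset size itself never changes.
print(epoch1)
print(epoch2)
```

If this hypothesis is right, it would also answer my last question: pre-generating dozens of fixed copies is possible but unnecessary, because on-the-fly augmentation effectively gives the network a new variant every epoch without storing anything extra.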
Either these questions were not addressed in the lectures or I was not attentive enough. I hope that someone can reveal the ‘big picture’ here and provide an accessible description of the flow of code execution.