Quick question to make sure I understand what is going on here:
Suppose, as a premise, I can generate an effectively infinite supply of image batches one at a time (say 32-64 RGB images each), each with float labels for a regression task. After tuning the network hyperparameters during basic training, would it be fine to just feed a never-ending stream of 100% freshly generated training data into the model?
As far as I can see, the validation set is only there for human comprehension: it checks that the model isn't overfitting and produces unbiased metrics.
There would be no epochs in this setup, since each training batch is generated live, one at a time. Each batch would be pseudo-unique, so it should be almost impossible to overfit (provided enough care goes into the generation side).
The data would have no transforms applied apart from per-batch normalization, since random crops, skews, rotations, flips, etc. are all accounted for at generation time. Generation is not prohibitively costly, but generating extra data that doesn't contribute to training the model would definitely add unwanted overhead.
My gut instinct says this should be fine. However, would it be worth generating a fresh validation batch once every 20-30 batches, just to confirm everything is running smoothly?
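To make the idea concrete, here is a minimal sketch of the loop I have in mind, with a toy linear model and a synthetic `generate_batch` standing in for the real CNN and image generator (both are hypothetical placeholders, as are all the names and numbers below). Every ~25 steps one extra batch is generated purely for monitoring; it is never trained on, so the metric stays unbiased, and the only overhead is that one extra generation call.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = rng.normal(size=8)  # hidden relationship the generator encodes

def generate_batch(batch_size=32):
    """Stand-in for the live generator: every call yields fresh labelled data."""
    x = rng.normal(size=(batch_size, 8))
    y = x @ true_w + rng.normal(scale=0.1, size=batch_size)
    return x, y

def normalize(x):
    """Per-batch normalization, as described in the question."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

w = np.zeros(8)   # toy model weights
lr = 0.02
val_history = []
for step in range(1, 501):
    x, y = generate_batch()
    x = normalize(x)
    grad = x.T @ (x @ w - y) / len(y)  # gradient of mean squared error
    w -= lr * grad

    if step % 25 == 0:
        # Fresh batch used only for monitoring, never trained on.
        vx, vy = generate_batch()
        vx = normalize(vx)
        val_history.append(float(np.mean((vx @ w - vy) ** 2)))
```

Because each monitoring batch is drawn fresh and discarded, `val_history` plays the role of a validation curve without ever reserving a fixed held-out set.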
Side question for bonus points

If I generate an image of a face and, for example, want to regress a float representing the face's width, could I (after training is completed) input 5-10 renders of the same face and simply take the mean of the outputs, giving me the best chance of getting the number right? Again, intuition tells me this is fine.
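For what it's worth, averaging should help whenever the per-image errors are roughly independent and zero-mean (the standard deviation of the mean of n independent estimates shrinks by a factor of sqrt(n)); a systematic bias in the model will not average away. A toy sketch, with a hypothetical `predict_width` standing in for the trained regressor:

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_WIDTH = 14.2  # hypothetical ground-truth face width

def predict_width(face_render):
    """Stand-in for the trained regressor: unbiased but noisy prediction."""
    return TRUE_WIDTH + rng.normal(scale=0.5)

# Mean over 10 renders of the same face.
preds = np.array([predict_width(f"render_{i}") for i in range(10)])
averaged = float(preds.mean())

# Empirically compare error spread: single prediction vs. mean of 10.
single_errs = [predict_width(None) - TRUE_WIDTH for _ in range(2000)]
mean_errs = [np.mean([predict_width(None) for _ in range(10)]) - TRUE_WIDTH
             for _ in range(2000)]
ratio = float(np.std(single_errs) / np.std(mean_errs))  # expect ~sqrt(10)
```

The `ratio` coming out near 3.16 illustrates the sqrt(n) error reduction from averaging 10 independent estimates.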