Dear all,
I'd like to check whether my understanding is correct with those who have been in this field much longer.
What I have observed is that the model's validation accuracy is ~95% when the training set is manually augmented ahead of time, versus 50-52% when data augmentation is applied on the fly, with the same augmentation schemes used in both scenarios.
TL;DR:
Let's consider the following:
(1) Training set size: X (associated labels y are available), with X ~ 1000.
(2) Two data augmentation functions are defined, say A1 and A2.
Scenario 1:
Use A1 and A2 to create 5 variants of each training sample ahead of time, giving a total training set size of (5+1)X = 6X.
Validation accuracy: 96% after 150 epochs
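To make Scenario 1 concrete, here is a minimal sketch of the offline expansion step. A1 and A2 are hypothetical stand-ins (jitter and scaling on scalar samples) for the actual augmentation schemes, which are not specified in the post:

```python
import random

random.seed(0)

# Hypothetical augmentations standing in for the real A1 and A2.
def A1(x):
    return x + random.uniform(-0.1, 0.1)   # e.g. additive jitter

def A2(x):
    return x * random.uniform(0.9, 1.1)    # e.g. random scaling

X = [float(i) for i in range(1000)]        # original training set, |X| = 1000

# Scenario 1: expand the dataset once, before training begins.
expanded = []
for x in X:
    expanded.append(x)                     # keep the original sample
    for k in range(5):                     # 5 fixed variants per sample
        aug = A1 if k % 2 == 0 else A2
        expanded.append(aug(x))

print(len(expanded))                       # (5+1)X = 6X = 6000
```

Note that the variants are generated once and then seen identically every epoch.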
Scenario 2:
Deploy A1 and A2 on the fly during training, keeping the training set size at X.
Validation accuracy: 52% after 150 epochs
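Scenario 2 can be sketched the same way; again A1 and A2 are hypothetical placeholders. Here each epoch draws every sample once and applies a freshly randomized augmentation, so the model sees different variants every pass:

```python
import random

random.seed(0)

# Hypothetical augmentations standing in for the real A1 and A2.
def A1(x):
    return x + random.uniform(-0.1, 0.1)   # e.g. additive jitter

def A2(x):
    return x * random.uniform(0.9, 1.1)    # e.g. random scaling

X = [float(i) for i in range(1000)]        # training set stays at size X

def epoch_samples(data):
    # Scenario 2: per epoch, shuffle and apply a random augmentation
    # (or none) to each sample as it is drawn.
    for x in random.sample(data, len(data)):
        aug = random.choice([A1, A2, lambda v: v])
        yield aug(x)

epochs = 150
seen = 0
for _ in range(epochs):
    for sample in epoch_samples(X):
        seen += 1

print(seen)                                # 150 * 1000 = 150000 samples seen
```

Over 150 epochs the model is exposed to far more distinct variants than in Scenario 1, which is why the two setups are expected to be broadly equivalent, if not favoring Scenario 2.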
Question: Is this always to be expected? Functionally, the two scenarios seem similar (in fact, Scenario 2 presents many more variants to the model, since the number of epochs is high). Why does training not take off in Scenario 2?
Thanks and regards,
~anoop
Anoop Kulkarni, PhD
Innotomy Consulting