Theory Question about Data Augmentation


I have a theoretical question about data augmentation.

I have studied machine learning before, but this is the first time I've seen data augmentation – it's cool!

However, should I worry that it might violate the i.i.d. assumption? My understanding is that gradient descent (or rather, the statistical learning setup it optimizes under) assumes all data points are independent of each other and identically distributed. When we augment the data, each "new" data point is derived from an original one, so the two are clearly not independent. Is this a problem?
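To make the question concrete, here is the kind of augmentation I mean – a toy NumPy sketch (not any specific library) where each augmented image is just a horizontal flip of an original, so the pair (original, flipped copy) is deterministically related rather than independent:

```python
import numpy as np

# Toy "dataset": 3 grayscale images of shape 2x2.
rng = np.random.default_rng(0)
images = rng.random((3, 2, 2))
labels = np.array([0, 1, 0])

# Augment by horizontal flipping: each flipped image is a
# deterministic function of an original image, so the pair
# (original, flipped) is clearly not independent.
flipped = images[:, :, ::-1]

# The training set doubles, but half of it is correlated
# with the other half.
augmented_images = np.concatenate([images, flipped], axis=0)
augmented_labels = np.concatenate([labels, labels], axis=0)

print(augmented_images.shape)  # (6, 2, 2)
```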