I’ve read that when you enlarge your dataset with augmentations, you should limit them to transformations that are realistic, i.e. that could actually occur in the test set. However, someone I know often gets significantly better performance on object detection problems by using drastic augmentations such as warping and color effects that are very unrealistic.
I’m trying to understand why this might be, and whether there are any theoretical justifications for these sorts of augmentations. If anyone has experience with this topic or papers to share, please weigh in!
Maybe it could be explained by the following:
If you learn only from ‘realistic’ data, you are more likely to overfit to your training/validation set.
If you use more ‘unrealistic’ augmentations, the model has to learn more of the intrinsic features and structure of the data instead of memorizing the training examples.
That makes sense intuitively, I think, but it seems to be the same argument as the one for augmentation in general. So I don’t know why people often claim you should not use unrealistic augmentations. Do you think very small datasets might warrant more drastic augmentation schemes?
But the bigger the dataset, the harder it is to find an augmentation that is truly ‘unrealistic’, isn’t it? Theoretically, with an infinitely large dataset you’d already have every possible augmentation included.
I guess in the limit all augmentations would be included, but say, for example, you’re working on an object detection problem that picks out cars. You will probably never get a video with completely inverted colors, or with only a single RGB color channel, yet in practice those augmentations sometimes seem to help.
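For concreteness, here is a minimal sketch of the two ‘unrealistic’ color augmentations mentioned above, assuming images are H×W×3 uint8 NumPy arrays. The function names and the probabilities are hypothetical, just for illustration. Because these transforms only change pixel values and never move pixels, the bounding boxes of an object detection sample stay valid unchanged.

```python
import numpy as np

def invert_colors(img: np.ndarray) -> np.ndarray:
    """Invert every channel of a uint8 image (0 <-> 255)."""
    return 255 - img

def single_channel(img: np.ndarray, channel: int) -> np.ndarray:
    """Keep only one RGB channel (0=R, 1=G, 2=B) and zero out the other two."""
    out = np.zeros_like(img)
    out[..., channel] = img[..., channel]
    return out

def random_color_aug(img, rng, p_invert=0.2, p_single=0.2):
    """Randomly apply the unrealistic color transforms during training.

    p_invert / p_single are illustrative values, not tuned recommendations.
    Boxes need no adjustment since no pixel changes position.
    """
    if rng.random() < p_invert:
        img = invert_colors(img)
    if rng.random() < p_single:
        img = single_channel(img, int(rng.integers(0, 3)))
    return img

# Usage on a random toy image
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
aug = random_color_aug(img, rng)
```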
I don’t have much experience with this yet, but I see your point. For me, the explanation could be that the model somehow learns to generalize from those ‘weird’ examples. But I guess there is always a fine line between teaching a model and confusing it.
There’s definitely a fine line there that I’d like to understand better. For example, flipping text backwards would clearly cross that line and degrade performance. Maybe for now it’s something we just need to build intuition for.
In the mixup paper at ICLR this year, the authors show that a variety of tasks can be augmented by taking linear combinations of pairs of inputs and their associated labels: e.g. an image that is a 50/50 blend of a cat and a dog should be labelled 50% cat, 50% dog.
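A minimal NumPy sketch of the idea, assuming one-hot labels and a batch of float inputs: draw a mixing weight lam from Beta(alpha, alpha), pair each example with a randomly chosen partner from the same batch, and take the same convex combination of inputs and labels. The function names and the alpha=0.2 default are my own illustrative choices, not the paper’s code.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    return np.eye(num_classes)[labels]

def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b (used for inputs and labels alike)."""
    return lam * a + (1 - lam) * b

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Mix each example with a random partner from the same batch.

    x:        (N, ...) float inputs
    y_onehot: (N, C) one-hot labels
    One lam ~ Beta(alpha, alpha) is drawn for the whole batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))  # partner index for each example
    return mix(x, x[perm], lam), mix(y_onehot, y_onehot[perm], lam), lam

# The 50% cat / 50% dog example, with lam fixed at 0.5
cat_img, dog_img = np.zeros((2, 2, 3)), np.ones((2, 2, 3))  # toy "images"
y = one_hot(np.array([0, 1]), num_classes=2)                # 0 = cat, 1 = dog
x_mix = mix(cat_img, dog_img, 0.5)  # every pixel is 0.5: an overlay
y_mix = mix(y[0], y[1], 0.5)        # soft label [0.5, 0.5]
```

The model is then trained with cross-entropy against the soft labels instead of hard class indices. Drawing one lam per batch and pairing via a permutation keeps it cheap; treat this as a sketch rather than the paper’s reference implementation.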
This is a simple augmentation technique that works, even though the mixed images look nothing like training-set examples; they look like overlays. The technique also works for semantic segmentation, as shown here.
This strongly suggests that we shouldn’t necessarily restrict ourselves to ‘realistic’ augmentations.