CutMix > Mixup (?)

If you look at the “meaningfulness” of individual images, I would totally agree with you. But, if you are looking at it from a very competitive standpoint, it starts to make sense even for humans.
In professional sports, they use quite fancy training techniques. One of them is strobe glasses, which darken the view at regular, quick intervals. As far as I know, they are used to improve reflexes and to better anticipate movements. Athletes repeat their training routines very often, and they are constantly looking for ways to make them more challenging to improve their abilities.

Just like MixUp, this looks really crazy when you are just thinking in terms of basketball. It looks quite different if you think of it as getting better at a task by making it more difficult with extra hurdles.

Let’s assume there was a professional ImageNet classification league. After some time, it wouldn’t be unreasonable to assume they would come up with MixUp and the like to make their training sessions more challenging.

Thanks a lot for your question! I never thought of training a neural network from this perspective!

5 Likes

Nice, I love the sports analogy!

What’s exciting to me is how much space this opens up for new data augmentation techniques. Essentially we’ve gone from
“Let’s try to artificially generate more examples that look like our training data”
to
“Let’s poke our model in the eye and see if it helps”
…which suggests an entirely different set of strategies!

1 Like

It’s a great paper that was shared, and great work along these lines…
Just a quick question: is it also useful for regression on images?
Say you have categories 0, 1, 2, 3, 4 that are ordinal in nature, and we are building a regression model rather than a classification model, with an image as input…

Absolutely not! The way CNNs learn representations is vastly different from the way humans do. Here’s my understanding of why mixup works:

Image space is really massive, far more than 1000-dimensional (a 224×224 RGB image already has 150,528 dimensions).
We only map a finite number of points in this space by telling our CNN how its output vector should behave there. So there are vast regions of the space we have not mapped, where our NN function is unconstrained and can behave essentially arbitrarily. Think about the case where I train a very simple model only on x > 5 and x < -5. In -5 < x < 5 the model can behave in any manner, depending on initialization. If our validation data comes from this range, we are in trouble!
Thus mixup smooths the representation landscape by mapping more points in the intermediate space between the original data points.
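
To make the "mapping more points in the intermediate space" part concrete, here is a minimal NumPy sketch of mixup on one batch. It is only an illustration under a few assumptions: the function name `mixup_batch` is made up for this post, `alpha=0.4` is just an example value, and labels are assumed to be one-hot. The mixing weight is drawn from a Beta(alpha, alpha) distribution as in the mixup paper, and the same weight is applied to the inputs and the labels.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.4, rng=None):
    """Blend each example with a randomly chosen partner from the same batch.

    Hypothetical helper for illustration; alpha=0.4 is an assumed example value.
    x:        array of shape (batch, ...), e.g. images
    y_onehot: array of shape (batch, n_classes)
    """
    rng = np.random.default_rng() if rng is None else rng
    # One mixing weight for the whole batch, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    # A random permutation pairs every example with a partner.
    perm = rng.permutation(len(x))
    # Convex combination of inputs: a new point "between" two training points.
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    # The same convex combination of labels tells the model how to behave there.
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed, lam
```

Each mixed example lies on the line segment between two original images, and its soft label says how the output should interpolate along that segment, which is exactly the region that was previously unmapped.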

Let me know if you have any questions

5 Likes

(months later)…the dynamic LR is a really interesting idea. Do you know if anyone has implemented it in fastai?

1 Like