Does it make sense to apply MixUp augmentation in TTA?

Intuitively, doing so doesn't make sense to me. If we interpolate a test instance with another random test instance, it might just end up confusing the model and make the predictions worse. But I am not completely sure.

Any intuition, explanation, or pointers to learning resources on this?

@sirgarfield I could be wrong, but the way I think about it is:
Mixup is used to improve training by exposing the model to more data than it would otherwise have had access to. We're increasing the number of times we can do our backward pass to modify the weights, without as much risk of overfitting.

Like you mentioned, at test time we are looking to understand how well our model can parse the data, and by adding mixup to the validation or test set we would be hindering our model's ability to interpret the sample. Additionally, at test time we are evaluating our model, and since we are not trying to improve it at this point (by modifying the weights), we have no need for the additional samples.

@ali_baba explains it very well. Here’s my two cents:

Presume your categories are cats, dogs, airplanes, and pianos. MixUp, well, mixes these, so one picture would be 30% cat, 70% dog, another would be 10% airplane, 90% dog, and so on. Thus, for the former, your model would ideally predict 30% cat and 70% dog, rather than 100% cat or 100% dog.
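
To make the 30% cat / 70% dog case concrete, here is a minimal sketch of how a single MixUp pair could be built. It uses plain NumPy with random arrays standing in for images; `mixup_pair`, `one_hot`, and the 8x8 image size are names and values I made up for illustration, not any library's API.

```python
import numpy as np

CLASSES = ["cat", "dog", "airplane", "piano"]

def one_hot(name):
    """Return a one-hot label vector for a class name."""
    vec = np.zeros(len(CLASSES))
    vec[CLASSES.index(name)] = 1.0
    return vec

def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Blend two samples and their labels, with weight lam on sample a."""
    x_mix = lam * x_a + (1.0 - lam) * x_b
    y_mix = lam * y_a + (1.0 - lam) * y_b
    return x_mix, y_mix

# Random arrays stand in for real images (assumption for this sketch).
rng = np.random.default_rng(0)
cat_img = rng.random((8, 8, 3))
dog_img = rng.random((8, 8, 3))

# 30% cat, 70% dog: the target is a soft label, not a single class.
x_mix, y_mix = mixup_pair(cat_img, one_hot("cat"), dog_img, one_hot("dog"), lam=0.3)
print(dict(zip(CLASSES, y_mix.round(2).tolist())))
```

During training the mixing weight is usually drawn at random for each pair (the MixUp paper samples it from a Beta distribution); it is fixed at 0.3 here only to match the example above.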

Fast-forward to test time: You have a picture of a dog, but are not sure whether it’s actually a cat, dog, airplane, or piano (common scenario, I’m told). With no MixUp, your model should predict 100% dog and 0% cat, airplane, and piano.

But if you mix 90% of your picture with 10% of a picture you know to be, say, a piano, your model should predict 90% dog, 10% piano, and 0% cat and airplane. Given that you know the ratio for each picture, I suppose you could infer that your original picture is a dog. In practice, though, your model is usually not that accurate (so far we've assumed it is more or less perfect), and it is much easier for it to classify a picture as just a dog or just a piano than as part dog, part piano.
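
Here is a rough sketch of that "infer it back" step, under a strong assumption: the model's output on a mixed picture is exactly the same mix of its outputs on the two originals. Real networks are not that linear, and `unmix_prediction` and the `error` vector are hypothetical, made up just to show the arithmetic.

```python
import numpy as np

CLASSES = ["cat", "dog", "airplane", "piano"]

def unmix_prediction(p_mix, y_known, lam):
    """Solve p_mix = lam * p_orig + (1 - lam) * y_known for p_orig."""
    return (p_mix - (1.0 - lam) * y_known) / lam

lam = 0.9
y_known = np.array([0.0, 0.0, 0.0, 1.0])  # the picture we know is a piano
p_true = np.array([0.0, 1.0, 0.0, 0.0])   # the unknown picture is really a dog

# Idealized model: predicts exactly 90% dog, 10% piano on the mix.
p_mix_ideal = lam * p_true + (1.0 - lam) * y_known
p_rec = unmix_prediction(p_mix_ideal, y_known, lam)

# Imperfect model: a made-up prediction error on the mixed picture.
error = np.array([0.05, -0.08, 0.0, 0.03])
p_rec_noisy = unmix_prediction(p_mix_ideal + error, y_known, lam)

print("recovered (ideal):", p_rec.round(3))
print("recovered (noisy):", p_rec_noisy.round(3))
print("error after unmixing:", (p_rec_noisy - p_true).round(3))
```

Even in this best case, unmixing only gives back what the model would have said about the plain picture, and any error made on the mix gets divided by `lam`, so it grows as the share of the original picture shrinks.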

Hope that helps!