Fastbook - clarification on MixUp (chap 7)

Hi everybody,
I am going through chapter 7 of fastbook, and I don’t quite get the following:

One issue with this, however, is that Mixup is “accidentally” making the labels bigger than 0, or smaller than 1. That is to say, we’re not explicitly telling our model that we want to change the labels in this way. So, if we want to change to make the labels closer to, or further away from 0 and 1, we have to change the amount of Mixup—which also changes the amount of data augmentation, which might not be what we want.

What is this supposed to mean exactly?
I thought I had gotten MixUp until I stumbled upon the closing sentences of the paragraph :smiley:.
Thanks a lot!


This means that instead of our mixed-up labels being between 0 and 1 (binary classification), they could be something like 1.5 (once all the probabilities are combined). This is an issue because our models have an intrinsic y range we use, so it can lead to trouble in our loss function. Does this help some?

Thanks for the fast reply!

Really? How is this possible?
We are multiplying 1s and 0s by a random factor (between 0 and 1) sampled from a beta distribution, to obtain the below.
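For example, here is a minimal sketch of that label mixing (hypothetical numbers; NumPy used just for illustration, not fastai's actual implementation), showing the mixed labels can never leave [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)

# One-hot targets for two images in a binary classification task
y1 = np.array([1.0, 0.0])
y2 = np.array([0.0, 1.0])

# MixUp draws the mixing weight lam from a Beta(alpha, alpha) distribution
alpha = 0.4
lam = rng.beta(alpha, alpha)

# The same convex combination is applied to the inputs and the targets
y_mixed = lam * y1 + (1 - lam) * y2

# Because 0 <= lam <= 1, every mixed label also stays in [0, 1]
assert (0.0 <= y_mixed).all() and (y_mixed <= 1.0).all()
print(y_mixed)
```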


It seems to me that what we do is far from being accidental or not explicit.
We are actually mixing everything up on purpose, making sure to apply the same transformation to both inputs and outputs. The model is perfectly aware of it, as the mixed-up outputs are what it sees during training.
No?

Hmm… good point. (Sorry, I blame not having had my coffee :wink: ) Even with a mix factor of 0.99, the labels would still be 0.99 and 0.01, so they stay within [0, 1]. (Just reading briefly now) it looks like label smoothing helps (given it brings everything down a bit), so a general rule of thumb would be MixUp + label smoothing (until we figure out why that's needed). But on the theory behind labels going above 1, I am as lost as you (now that I think about it).
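For reference, label smoothing just shrinks the hard 0/1 targets toward the uniform distribution. A minimal sketch (the helper name and eps value are my own choices, not fastai's API):

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: the true class gets 1 - eps + eps/N and
    every other class gets eps/N, where N is the number of classes."""
    n = one_hot.shape[-1]
    return one_hot * (1 - eps) + eps / n

# With two classes and eps=0.1, a hard [1, 0] target becomes [0.95, 0.05]
print(smooth_labels(np.array([1.0, 0.0])))
```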


I just read this paragraph and was also confused. My understanding of the “problem” is that MixUp will make the predictions of the model smaller, or less “confident”. This happens because without MixUp the average target value for the correct class would be 1.0, whereas with MixUp it would be around 0.5.
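That intuition is easy to check numerically: a symmetric Beta(alpha, alpha) has mean 0.5, so if the true class's target is the sampled weight lam, its average drops from 1.0 to about 0.5. A quick sketch (alpha is an assumed value; I believe fastai's callback additionally takes max(lam, 1 - lam), which would make the drop milder in practice):

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw many MixUp weights from a symmetric Beta(alpha, alpha)
alpha = 0.4
lams = rng.beta(alpha, alpha, size=100_000)

# Without MixUp the true class's target is always 1.0;
# with MixUp it is lam, whose mean is 0.5 for a symmetric Beta
print(lams.mean())  # close to 0.5
```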