Can't make MixUp work

At the end of Chapter 14 it is suggested to try training the ResNet with MixUp.

l = Learner(dls, rn, loss_func=nn.CrossEntropyLoss(), metrics=accuracy, cbs=MixUp()).to_fp16()
l.fit_one_cycle(100, 3e-3)

AFAIK, when using the library, all that is needed is to add the cbs=MixUp() part. However, this error occurs:

RuntimeError: Exception occured in MixUp when calling event before_batch:
expected dtype long int for weight but got dtype float

I found these 2 issues, but couldn’t solve my problem:
mixup crashing lr_find · Issue #2233 · fastai/fastai · GitHub
Mixup Errors · Issue #3183 · fastai/fastai · GitHub


The issue you linked (Mixup Errors · Issue #3183 · fastai/fastai · GitHub) actually does give the solution, albeit not very clearly. In the issue, the author says:

# The line that causes the error:
self.learn.yb = tuple(L(self.yb1,self.yb).map_zip(torch.lerp,weight=unsqueeze(self.lam, n=ny_dims-1)))

They then solved it by setting y_int = True on their custom loss function.
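
For a custom loss, that workaround might look roughly like the sketch below (MyCrossEntropy is just an illustrative name; the reduction attribute is included because fastai's NoneReduce helper temporarily switches it to 'none' during MixUp):

import torch.nn.functional as F

class MyCrossEntropy:
    y_int = True         # tell MixUp the targets are integer class indices
    reduction = 'mean'   # fastai toggles this to 'none' while computing the mixed loss

    def __call__(self, pred, targ):
        return F.cross_entropy(pred, targ, reduction=self.reduction)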

Inspecting the code for MixUp, we only fall into the error-causing branch when self.stack_y is False. The value of self.stack_y is set by MixHandler (the parent class of MixUp) here:

def before_train(self):
    "Determine whether to stack y"
    self.stack_y = getattr(self.learn.loss_func, 'y_int', False)

i.e., stack_y ends up False if your loss_func doesn't have the y_int attribute, or if that attribute is set to False.
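
If you want to see the difference yourself, you can run the same getattr lookup that before_train does:

from fastai.vision.all import *
import torch.nn as nn

getattr(nn.CrossEntropyLoss(), 'y_int', False)    # False -> stack_y=False -> error path
getattr(CrossEntropyLossFlat(), 'y_int', False)   # True  -> stack_y=True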

y_int is an attribute fastai sets on its own loss functions, which is why the author of the issue had to set it manually for their custom loss function. PyTorch does not set it on its losses, which is why MixUp isn't working for you. So basically, all you have to do is change your loss function from nn.CrossEntropyLoss to fastai's CrossEntropyLossFlat.
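
With the same dls and rn from your snippet, the fixed call should look something like this:

from fastai.vision.all import *

# dls and rn are the DataLoaders and ResNet you already built for the chapter
l = Learner(dls, rn, loss_func=CrossEntropyLossFlat(), metrics=accuracy,
            cbs=MixUp()).to_fp16()
l.fit_one_cycle(100, 3e-3)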
