When and how to use mixup

After reading Jeremy Howard's post about mixup, I've been interested in using it for some of my problems. So far I've only seen it used on image classification, but I was wondering whether it can be used for tabular data (classification or regression) or NLP data. Can mixup be used for these types of DL problems, and how can I use it effectively?

You can try (though you may need to rewrite parts of the callbacks or the models, because you have to apply mixup after the embeddings), but it's untested.
On tabular data, first experiments didn't seem to help. On text, I got a small bump from mixup in classification.
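To make the "mixup after the embeddings" idea concrete, here is a minimal sketch of the core mixup operation on a batch of continuous features, as you would get after the embedding layers of a tabular or text model. The function name `mixup_batch` and the numpy formulation are my own illustration, not fastai's actual implementation:

```python
import numpy as np

def mixup_batch(x, y, alpha=0.4, rng=None):
    """Mix each row of x (and its target) with a randomly chosen partner row.

    For tabular/NLP models, x should be the float output of the embedding
    layers, not the raw categorical codes or token ids.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)       # mixing coefficient in (0, 1)
    perm = rng.permutation(len(x))     # partner example for each row
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix
```

For regression targets the same convex combination of `y` works directly; for classification you would either one-hot encode `y` first or combine the two losses with the same `lam`.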

@sgugger @maxmatical Since the topic is "When and how to use mixup", I thought I'd ask my question here. I'm fairly new to fastai and to the idea behind mixup data augmentation. I'm dealing with a large dataset of images, and I'm pretty sure my current CNN approach is overfitting. So I'm really eager to try out the mixup implementation. My question is: is it necessary to add mixup right from the beginning of training? Or would it be possible to do some training without mixup and, when the model seems to be overfitting, change the learner accordingly? If so, how would this be implemented?
(I apologize in advance if the question is stupid - it’s actually my first post, so there might be room to improve).

Best wishes.


It's not stupid at all! There is no direct way to do this, but you can change the MixupCallback to skip its work until you're ready, either by testing whether training has reached a given iteration or whether the losses meet a certain criterion.
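A minimal sketch of that gating idea, independent of fastai's actual callback API: a small wrapper that counts batches and passes them through unchanged until a chosen iteration, then starts mixing. The class name `DelayedMixup` and the `start_iter` parameter are hypothetical names for this illustration:

```python
import numpy as np

class DelayedMixup:
    """Pass batches through unchanged until `start_iter`, then apply mixup."""

    def __init__(self, alpha=0.4, start_iter=1000, rng=None):
        self.alpha = alpha
        self.start_iter = start_iter
        self.seen = 0                       # batches processed so far
        self.rng = rng or np.random.default_rng()

    def __call__(self, x, y):
        self.seen += 1
        if self.seen <= self.start_iter:
            return x, y                     # early training: no mixup
        lam = self.rng.beta(self.alpha, self.alpha)
        perm = self.rng.permutation(len(x))
        return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]
```

Instead of an iteration count, the same gate could compare training and validation loss and switch mixup on once the gap starts growing.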


@sgugger Thank you so much for your fast reply. I just tried an implementation very similar to your suggestion and it seems to work surprisingly well. Of course I will have to verify this, but I really appreciate your help.

Hey, so did you use mixup right from the start, or did you use it only when you thought your model was starting to overfit? Can you share code samples and results?

@dipam7 We actually tried both, but the results so far are not yet conclusive. I’ll give you an update as soon as we are sure about our observations. Cool?


@sfoersch We are still waiting on that reply 🙂


Actually, even the authors of the original paper used mixup on more than image data (section 3.6, https://arxiv.org/pdf/1710.09412.pdf). I tried it on tabular data in one of Kaggle's competitions and got a significant score improvement. I described my experience, with code, on my blog: https://perfstories.wordpress.com/2020/12/04/mixup-on-tabular-data/.
Hope that'll be useful!
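For classification, the mixup paper combines the two losses with the mixing coefficient rather than mixing one-hot labels explicitly; the two are equivalent for cross-entropy. A small numpy sketch of that loss-side formulation (the function name `mixup_loss` is my own; the formula `lam * CE(y_a) + (1 - lam) * CE(y_b)` is from the paper):

```python
import numpy as np

def mixup_loss(log_probs, y_a, y_b, lam):
    """Cross-entropy for a mixed batch: lam * CE(y_a) + (1 - lam) * CE(y_b).

    log_probs: (batch, classes) log-probabilities from the model.
    y_a, y_b:  integer class labels of the two mixed examples per row.
    """
    idx = np.arange(len(y_a))
    ce_a = -log_probs[idx, y_a]        # cross-entropy against first label
    ce_b = -log_probs[idx, y_b]        # cross-entropy against second label
    return (lam * ce_a + (1 - lam) * ce_b).mean()
```

This avoids materializing soft labels, which is convenient when the loss function only accepts integer targets.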


This is seriously cool! Great work!


Thank you so much!