Mixup data augmentation

@LessW2020 It is strange that it was triggered by setting alpha, but the bug should be fixed now.

I also made some minor modifications that might impact the benchmarks: on the tiny-mnist example (which is admittedly a toy benchmark) I now see the same trend as before (output mixup is better than manifold mixup, which is better than input mixup).

@nestorDemeure - trying to test alpha with the latest drop, but it looks like you changed the names (i.e. OutputMixUp -> output_mixup) and that's crashing:

~/anaconda3/lib/python3.7/site-packages/fastcore/foundation.py in __call__(self, *args, **kwargs)
206 if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
207 fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 208 return self.fn(*fargs, **kwargs)
209
210 # Cell

~/fastai2/fastai2/learner.py in add_cb(self, cb)
201 def remove_cbs(self, cbs): L(cbs).map(self.remove_cb)
202 def add_cb(self, cb):
--> 203 old = getattr(self, cb.name, None)
204 assert not old or isinstance(old, type(cb)), f"self.{cb.name} already registered"
205 cb.learn = self

AttributeError: 'function' object has no attribute 'name'

I have to run to church, but please take a look and let me know.
I took a quick look at the code (for reference, I am passing it as a cb via the learner's cbs argument rather than appending it at the end), so maybe I'll change to a partial and manually append it to the learner as a workaround.
I did test appending it at the end, but that raises a separate error stating there is no such method.
Anyway, if you can take a look and let me know the preferred calling convention, I'll retest with alpha=1 soon.
Thanks!

OutputMixUp is the name of the callback (which should be passed via the cbs argument), while output_mixup is the name of the method (which can be used as it was in V1).

You can use either one; both exist and should be working (cf. demo.py, which has both options for all callbacks). If you cannot get it working with OutputMixUp, could you send me a minimal failure case in a GitHub issue?
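
For example, a minimal sketch of both call styles (dls and the model are placeholders here, and the import path is an assumption, not something from the repo's docs):

```python
from fastai.vision.all import *
from manifold_mixup import OutputMixUp  # assumed import path

# Option 1: pass the callback via the cbs argument
learn = cnn_learner(dls, resnet18, cbs=OutputMixUp())

# Option 2: the method form, as in V1 (assumed to register the same callback)
learn = cnn_learner(dls, resnet18)
learn.output_mixup()
```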

(I might have solved the "this method does not exist" problem, but I am unsure, as it does not happen in my tests.)

So the issue was that I jumped to your GitHub but was in the v1 version… hence the issue :slight_smile:
If you could name the file and/or your GitHub repo with the version number, that would help avoid confusion. The v2 version is nearly identical, but you have to follow a link from the v1 readme (at least I did) to find it.
Since there was an update 8 hours ago in v1, I assumed that was the right one and copied the new code in.
Anyway, sorry for the false alarm, but I think it would be super helpful to put versioning in the GitHub project and/or the file name itself (e.g. manifold_mixup2.py or similar) :slight_smile:
Anyway, testing now!

Spoke too soon - new error here:

~/anaconda3/lib/python3.7/site-packages/torch/distributions/distribution.py in sample(self, sample_shape)
117 """
118 with torch.no_grad():
--> 119 return self.rsample(sample_shape)
120
121 def rsample(self, sample_shape=torch.Size()):

~/anaconda3/lib/python3.7/site-packages/torch/distributions/beta.py in rsample(self, sample_shape)
56
57 def rsample(self, sample_shape=()):
--> 58 return self._dirichlet.rsample(sample_shape).select(-1, 0)
59
60 def log_prob(self, value):

~/anaconda3/lib/python3.7/site-packages/torch/distributions/dirichlet.py in rsample(self, sample_shape)
63 shape = self._extended_shape(sample_shape)
64 concentration = self.concentration.expand(shape)
--> 65 return _Dirichlet.apply(concentration)
66
67 def log_prob(self, value):

~/anaconda3/lib/python3.7/site-packages/torch/distributions/dirichlet.py in forward(ctx, concentration)
16 @staticmethod
17 def forward(ctx, concentration):
--> 18 x = torch._sample_dirichlet(concentration)
19 ctx.save_for_backward(x, concentration)
20 return x

RuntimeError: "dirichlet" not implemented for 'Long'
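
For context, this particular PyTorch error shows up whenever the Beta distribution's concentration parameters are integer (Long) tensors, which is what happens if an integer alpha is passed straight through. A minimal reproduction and the usual fix (independent of the repo's code):

```python
import torch
from torch.distributions import Beta

alpha = 1  # an integer alpha, as in the test above

# Fails: torch.tensor(1) is a Long tensor, and Beta/Dirichlet sampling
# is not implemented for integer dtypes:
# Beta(torch.tensor(alpha), torch.tensor(alpha)).sample()
# -> RuntimeError: "dirichlet" not implemented for 'Long'

# Works: cast the concentration parameters to float before sampling
beta = Beta(torch.tensor(float(alpha)), torch.tensor(float(alpha)))
lam = beta.sample()  # mixing coefficient in [0, 1]
```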

Damn! Let’s debug it over messages to avoid cluttering this thread.

Thanks for this. It would be awesome if you could share the code that modifies the mixup callback to work with text embeddings.

Note that it should be quite easy to do with manifold_mixup by wrapping the embeddings in a ManifoldMixupModule (though raw manifold mixup, or output mixup if you are doing classification, might give better results).
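
As a rough illustration of that wrapping (ManifoldMixupModule is the wrapper named above; the import path and the rest of the model are made-up placeholders):

```python
import torch.nn as nn
from manifold_mixup import ManifoldMixupModule  # assumed import path

class TextClassifier(nn.Module):
    def __init__(self, vocab_sz, emb_sz, n_classes):
        super().__init__()
        # Wrapping the embedding layer marks its output as a candidate
        # mixing point for manifold mixup
        self.emb = ManifoldMixupModule(nn.Embedding(vocab_sz, emb_sz))
        self.encoder = nn.LSTM(emb_sz, 128, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):
        out, _ = self.encoder(self.emb(x))
        return self.head(out[:, -1])  # classify from the last hidden state
```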

Ok, @LessW2020 is still benchmarking the new implementation to measure the accuracy boost, but all the bugs and corner cases spotted have been corrected in both the V1 and V2 versions.

I believe the result is worth exploring for anybody wanting to use a form of data augmentation for any kind of input (and not just pictures).

Amazing, thanks for the work on this, you two!

Is it advisable to use both mixup augmentation and standard data augmentations like flips and rotations? The experiment I did didn't yield a good result.

It's advisable to try it. Sometimes it works, other times not. It depends on your dataset and your task. You can also try changing the alpha parameter in mixup. I think it's fair to say that with mixup you might want to be less aggressive in augmentation, e.g. fewer affine transformations and narrower lighting thresholds.
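
A hedged sketch of that idea with the stock fastai v2 MixUp callback (the dataset path and parameter values are illustrative, not tuned):

```python
from fastai.vision.all import *

# Milder-than-default augmentation: small rotations, a narrow lighting
# range, and no warping
dls = ImageDataLoaders.from_folder(
    path, valid_pct=0.2,
    batch_tfms=aug_transforms(max_rotate=5., max_lighting=0.1, max_warp=0.))

# A smaller alpha also makes the mixing itself less aggressive
learn = cnn_learner(dls, resnet34, metrics=accuracy, cbs=MixUp(0.2))
learn.fit_one_cycle(5)
```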

@nestorDemeure thanks for the great work! Will this work for segmentation models via unet_learner?

I have not tested it properly on that use case, but it should work out of the box :slight_smile:
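
If you want a starting point, an untested sketch (OutputMixUp imported from the repo as above, dls being your segmentation DataLoaders):

```python
from fastai.vision.all import *
from manifold_mixup import OutputMixUp  # assumed import path

learn = unet_learner(dls, resnet34, cbs=OutputMixUp())  # untested on segmentation
learn.fit_one_cycle(5)
```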

I understand that input mixup works great when the input images are from very different classes, like distinguishing between Butterfly and Airplane, or Apple and Human.

But does it work well on datasets with only minor differences between classes, like classifying dog breeds or identifying skin cancer in images? Or am I on the wrong track?

Thanks in advance!

It does, yep! I haven't seen anyone quantify whether it's more effective when the classes are very different, though.

Thanks @morgan! I will give it a try.

Quick post to let you know that I have just updated the Manifold Mixup repo (V2) to the current version of fastai V2, and that everything should work as expected out of the box :slight_smile:

Hi everyone, I just made the pivot to fastai2 today and noticed that mixup is only available in the vision library. Has anyone managed to make mixup work for text applications in fastai2?

Concerning mixup, does anyone know of workarounds for the open bug described here on GitHub? I'm encountering it when trying to run Learner.fit_one_cycle and Learner.lr_find. The gist is, I initialize mixup = MixUp(1.) and then pass mixup to the cbs kwarg in the Learner call, and get RuntimeError: expected dtype long int for `weights` but got dtype float. The GitHub issue reports that the error does not occur in v2.2.5; thus far, I've found that it occurs in v2.4.0, v2.3.1, and v2.2.7 of the library…
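
In case it helps others reproduce, a hedged reconstruction of the failing setup (dls is a placeholder), with the version pin the issue suggests as a workaround:

```python
from fastai.vision.all import *

mixup = MixUp(1.)
learn = cnn_learner(dls, resnet18, cbs=mixup)
learn.fit_one_cycle(1)
# Reportedly raises:
#   RuntimeError: expected dtype long int for `weights` but got dtype float
# Per the linked issue, the error does not occur in fastai v2.2.5, so pinning
# the library (e.g. `pip install fastai==2.2.5`) is one possible workaround.
```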