Mixup data augmentation

@LessW2020 It is strange that it was triggered by setting alpha, but the bug should be fixed now.

I also made some minor modifications that might impact the benchmarks: on the tiny-mnist example (which is admittedly a toy benchmark) I now see the same trend as before (output mixup is better than manifold mixup, which is better than input mixup).

@nestorDemeure - trying to test alpha with the latest drop, but it looks like you changed the names (i.e. OutputMixUp -> output_mixup) and that's crashing:

~/anaconda3/lib/python3.7/site-packages/fastcore/foundation.py in __call__(self, *args, **kwargs)
206 if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
207 fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 208 return self.fn(*fargs, **kwargs)
209
210 # Cell

~/fastai2/fastai2/learner.py in add_cb(self, cb)
201 def remove_cbs(self, cbs): L(cbs).map(self.remove_cb)
202 def add_cb(self, cb):
--> 203 old = getattr(self, cb.name, None)
204 assert not old or isinstance(old, type(cb)), f"self.{cb.name} already registered"
205 cb.learn = self

AttributeError: 'function' object has no attribute 'name'

I have to run to church, but please take a look and let me know.
I took a quick look at the code (for reference, I am passing it as a cb via the learner's cbs argument rather than appending it at the end), so maybe I'll change to a partial and manually append it to the learner as a workaround.
I did test appending it at the end, but that raises a separate error stating there is no such method.
Anyway, if you can take a look and let me know the preferred calling convention, I'll retest with alpha=1 soon.
Thanks!

OutputMixUp is the name of the callback (which should be passed via the cbs argument), while output_mixup is the name of the method (which can be used as it was in V1).

You can use either one; both exist and should be working (cf. demo.py, which has both options for all callbacks). If you cannot get it working with OutputMixUp, could you send me a minimal failure case in a GitHub issue?
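
For example, a minimal sketch of both call styles (dls and the model are placeholders here, and the import path is an assumption, not something from the repo's docs):

```python
from fastai.vision.all import *
from manifold_mixup import OutputMixUp  # assumed import path

# Option 1: pass the callback via the cbs argument
learn = cnn_learner(dls, resnet18, cbs=OutputMixUp())

# Option 2: the method form, as in V1 (assumed to register the same callback)
learn = cnn_learner(dls, resnet18)
learn.output_mixup()
```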

(I might have solved the "this method does not exist" problem, but I am unsure, as it does not happen in my tests.)

So the issue was that I jumped to your GitHub but was in the v1 version… hence the issue :slight_smile:
If you could name the file and/or your GitHub repo with the version number, that would help avoid confusion. The v2 version is nearly identical, but you have to follow a link from the v1 readme (at least I did) to find it.
Since there was an update 8 hours ago in v1, I assumed that was the right one and copied the new code in.
Anyway, sorry for the false alarm, but I think it would be super helpful to put versioning in the GitHub project and/or the file name itself (e.g. manifold_mixup2.py or similar) :slight_smile:
Anyway, testing now!

Spoke too soon - new error here:

~/anaconda3/lib/python3.7/site-packages/torch/distributions/distribution.py in sample(self, sample_shape)
117 """
118 with torch.no_grad():
--> 119 return self.rsample(sample_shape)
120
121 def rsample(self, sample_shape=torch.Size()):

~/anaconda3/lib/python3.7/site-packages/torch/distributions/beta.py in rsample(self, sample_shape)
56
57 def rsample(self, sample_shape=()):
--> 58 return self._dirichlet.rsample(sample_shape).select(-1, 0)
59
60 def log_prob(self, value):

~/anaconda3/lib/python3.7/site-packages/torch/distributions/dirichlet.py in rsample(self, sample_shape)
63 shape = self._extended_shape(sample_shape)
64 concentration = self.concentration.expand(shape)
--> 65 return _Dirichlet.apply(concentration)
66
67 def log_prob(self, value):

~/anaconda3/lib/python3.7/site-packages/torch/distributions/dirichlet.py in forward(ctx, concentration)
16 @staticmethod
17 def forward(ctx, concentration):
--> 18 x = torch._sample_dirichlet(concentration)
19 ctx.save_for_backward(x, concentration)
20 return x

RuntimeError: "dirichlet" not implemented for 'Long'
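
For context, this particular PyTorch error shows up whenever the Beta distribution's concentration parameters are integer (Long) tensors, which is what happens if an integer alpha is passed straight through. A minimal reproduction and the usual fix (independent of the repo's code):

```python
import torch
from torch.distributions import Beta

alpha = 1  # an integer alpha, as in the test above

# Fails: torch.tensor(1) is a Long tensor, and Beta/Dirichlet sampling
# is not implemented for integer dtypes:
# Beta(torch.tensor(alpha), torch.tensor(alpha)).sample()
# -> RuntimeError: "dirichlet" not implemented for 'Long'

# Works: cast the concentration parameters to float before sampling
beta = Beta(torch.tensor(float(alpha)), torch.tensor(float(alpha)))
lam = beta.sample()  # mixing coefficient in [0, 1]
```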

Damn! Let’s debug it over messages to avoid cluttering this thread.

Thanks for this. It would be awesome if you could share the code that modifies the mixup callback to work with text embeddings.

Note that it should be quite easy to do with manifold_mixup by wrapping the embeddings in a ManifoldMixupModule (though raw manifold mixup, or output mixup if you are doing classification, might give better results).
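
As a rough illustration of that wrapping (ManifoldMixupModule is the wrapper named above; the import path and the rest of the model are made-up placeholders):

```python
import torch.nn as nn
from manifold_mixup import ManifoldMixupModule  # assumed import path

class TextClassifier(nn.Module):
    def __init__(self, vocab_sz, emb_sz, n_classes):
        super().__init__()
        # Wrapping the embedding layer marks its output as a candidate
        # mixing point for manifold mixup
        self.emb = ManifoldMixupModule(nn.Embedding(vocab_sz, emb_sz))
        self.encoder = nn.LSTM(emb_sz, 128, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):
        out, _ = self.encoder(self.emb(x))
        return self.head(out[:, -1])  # classify from the last hidden state
```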

Ok, @LessW2020 is still benchmarking the new implementation to measure the accuracy boost, but all the bugs and corner cases spotted have been corrected in both the V1 and V2 versions.

I believe the result is worth exploring for anybody wanting to use a form of data augmentation for any kind of input (and not just pictures).

Amazing, thanks for the work on this, you two!

Is it advisable to use both mixup augmentation and standard data augmentations like flips and rotations? The experiment I did didn't yield a good result.

It's advisable to try it. Sometimes it works, other times not. It depends on your dataset and your task. You can also try changing the alpha parameter in mixup. I think it's fair to say that with mixup you might want to be less aggressive in augmentation, e.g. fewer affine transformations and narrower lighting thresholds.
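
A hedged sketch of that idea with the stock fastai v2 MixUp callback (the dataset path and parameter values are illustrative, not tuned):

```python
from fastai.vision.all import *

# Milder-than-default augmentation: small rotations, a narrow lighting
# range, and no warping
dls = ImageDataLoaders.from_folder(
    path, valid_pct=0.2,
    batch_tfms=aug_transforms(max_rotate=5., max_lighting=0.1, max_warp=0.))

# A smaller alpha also makes the mixing itself less aggressive
learn = cnn_learner(dls, resnet34, metrics=accuracy, cbs=MixUp(0.2))
learn.fit_one_cycle(5)
```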

@nestorDemeure thanks for the great work! Will this work for segmentation models via unet_learner?

I have not tested it properly on that use case, but it should work out of the box :slight_smile:
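
If you want a starting point, an untested sketch (OutputMixUp imported from the repo as above, dls being your segmentation DataLoaders):

```python
from fastai.vision.all import *
from manifold_mixup import OutputMixUp  # assumed import path

learn = unet_learner(dls, resnet34, cbs=OutputMixUp())  # untested on segmentation
learn.fit_one_cycle(5)
```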

I understand that input mixup works great when the input images are from very different classes, like distinguishing between Butterfly and Airplane, or Apple and Human.

But does it work well on datasets with only minor differences between classes, like classifying dog breeds or identifying skin cancer in images? Or am I on the wrong track?

Thanks in advance!

It does, yep! I haven't seen anyone quantify whether it's more effective when the classes are very different, though.

Thanks @morgan! I will give it a try.

Quick post to let you know that I have just updated the Manifold Mixup repo (V2) to the current version of fastai V2, and that everything should work as expected out of the box :slight_smile:

Hi everyone, I just made the pivot to fastai2 today and noticed that mixup is only available in the vision library. Has anyone managed to make mixup work for text applications in fastai2?

Concerning mixup, does anyone know of workarounds for the open bug described here on GitHub? I'm encountering it when trying to run Learner.fit_one_cycle and Learner.lr_find. The gist is, I initialize mixup = MixUp(1.) and then pass mixup to the cbs kwarg in the Learner call, and get RuntimeError: expected dtype long int for `weights` but got dtype float. The GitHub issue reports that the error does not occur in v2.2.5; thus far, I've found that it occurs in v2.4.0, v2.3.1, and v2.2.7 of the library…
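
In case it helps others reproduce, a hedged reconstruction of the failing setup (dls is a placeholder), with the version pin the issue suggests as a workaround:

```python
from fastai.vision.all import *

mixup = MixUp(1.)
learn = cnn_learner(dls, resnet18, cbs=mixup)
learn.fit_one_cycle(1)
# Reportedly raises:
#   RuntimeError: expected dtype long int for `weights` but got dtype float
# Per the linked issue, the error does not occur in fastai v2.2.5, so pinning
# the library (e.g. `pip install fastai==2.2.5`) is one possible workaround.
```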