Mixup data augmentation

btw - if they weren’t mixing the last output in the paper, I think you should write your own paper on it :slight_smile:

It is a logical incremental improvement so I am quite sure that there is already a paper somewhere implementing it (plus I found a blogpost proposing the same idea).

The authors of manifold mixup explained that injecting mixup within a ResBlock can cause problems and make the method fail. Thus they instrumented only the input of the network and the outputs of ResBlocks.
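
For reference, here is a rough sketch (my own illustration, not the repository's actual code; the helper name and signature are made up for the example) of what instrumenting the output of a single block with a forward hook looks like:

import torch
from torch.distributions import Beta

def manifold_mixup_step(model, block, x, y, alpha=0.4):
    "Mix the output of `block` between the batch and a shuffled copy of itself."
    lam = Beta(alpha, alpha).sample().item()
    shuffle = torch.randperm(x.size(0), device=x.device)
    # a forward hook that returns a value replaces the module's output
    handle = block.register_forward_hook(
        lambda module, inputs, output: lam * output + (1. - lam) * output[shuffle])
    try:
        pred = model(x)
    finally:
        handle.remove()  # always detach the hook, even if the forward pass fails
    # the loss must be mixed the same way: lam * loss(pred, y) + (1 - lam) * loss(pred, y[shuffle])
    return pred, y, y[shuffle], lam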

This might explain the disappointing results we are getting. I will update the code in about 10 hours (done!) to do the following (a rough sketch of the resulting selection logic is given after the list):

  • use a module list if the user provides one
  • otherwise use only ManifoldMixupModules if there are some in the network
  • otherwise use only ResBlocks if there are some in the network
  • otherwise use all non-recurrent layers
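
In code, the selection will look roughly like this (illustration only, not the exact implementation; the class-name matching is just to keep the sketch self-contained):

import torch.nn as nn

def select_mixup_modules(model, module_list=None):
    "Pick the modules whose outputs will be instrumented, following the fallbacks above."
    if module_list is not None:                                   # 1. a user-provided list wins
        return list(module_list)
    modules = list(model.modules())
    marked = [m for m in modules if type(m).__name__ == 'ManifoldMixupModule']
    if marked: return marked                                      # 2. explicit mixup markers
    blocks = [m for m in modules if type(m).__name__ == 'ResBlock']
    if blocks: return blocks                                      # 3. otherwise ResBlocks
    recurrent = (nn.RNNBase, nn.RNNCellBase)
    return [m for m in modules if not isinstance(m, recurrent)]   # 4. otherwise all non-recurrent layers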

The sad thing is that it makes the automatic application of manifold mixup to networks that contain no ResBlocks potentially complicated (here output_mixup is clearly advantageous).


The code has been updated. @LessW2020, do you have time to rerun a benchmark on manifold_mixup to see if things improved? (no need to retest output_mixup as it was not modified)


Thanks @nestorDemeure! Yes, I can benchmark again. My only request is please instrument for EfficientNet automatically :slight_smile:
For my work, EffNets are performing far better than ResNets, so that’s all I’m working with now on a daily basis.
Anyway, let me redo the benchmarking with XResNets for this and I will update!


I checked, everything should work out of the box for EfficientNet :slight_smile:

(by the way, what practical advantages are you seeing compared to a ResNet?)


First results!
1 - The ‘improved internal’ version consistently outperforms the original one.

2 - In general, it now outperforms ‘regular/input’ mixup (whereas before it did not).

3 - Output mixup still performs best so far (Woof, Nette, private medical dataset).

4 - It works great on EfficientNet :slight_smile:

Big thank you for your work @nestorDemeure!


Also, I really like the printout you added showing the blocks detected at the start of training…
that kind of preview info is really helpful for making sure things are working up front instead of finding out the hard way later.


Great, thanks for the benchmark! I believe we now have a proper reproduction of what they did in the paper :partying_face: (I even added some refinements for U-Net, found in the literature)

Do you think I should focus on output_mixup for V2 or keep both versions?


I would definitely focus on output_mixup for v2 first and get that working as that’s the clear winner on all fronts.
If you then have time, I do think keeping internal mixup is worthwhile as it’s now outperforming standard ‘input mixup’. And input mixup is a mainstay, so beating it means internal mixup clearly has merit and may perform better than output mixup in some cases (segmentation?).

Thanks again @nestorDemeure - great work!

@LessW2020, another thing you might want to try is an alpha of 1 (instead of 0.4, the default value), as it is the value used in the paper (while 0.4 is the value used in fastai’s implementation of input mixup).
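
For reference, here is a quick way to see what alpha changes, namely the Beta(alpha, alpha) distribution the mixing coefficient lambda is drawn from (plain PyTorch, nothing specific to the repository):

import torch
from torch.distributions import Beta

torch.manual_seed(0)
print(Beta(1.0, 1.0).sample((5,)))  # alpha=1: lambda is uniform on [0, 1] (paper value)
print(Beta(0.4, 0.4).sample((5,)))  # alpha=0.4: lambda is pushed toward 0 or 1 (fastai default)

With alpha=1 the mixes are, on average, much more aggressive.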


There is now a new repository for the V2 port: manifold mixup V2

I did no proper benchmark as I have no V2 code, so, @LessW2020, I am counting on you to confirm that things work properly :slight_smile:

(by the way, any help getting the demo notebook closer to the V1 one is welcome; the V2 equivalent of simple_cnn seems to not like single-channel, black and white pictures)


Awesome, thanks @nestorDemeure!
I will do some testing now and then more testing later today/tomorrow and update here!

I’ve tested Woof 128 and am now about to test Nette 128 - however, I thought I’d also test with alpha=1, but it’s blowing up when I set alpha, with:

~/fastai2/fastai2/learner.py in __call__(self, event_name)
     23         _run = (event_name not in _inner_loop or (self.run_train and getattr(self, 'training', True)) or
     24                (self.run_valid and not getattr(self, 'training', False)))
---> 25         if self.run and _run: getattr(self, event_name, noop)()
     26
     27     @property

~/fastai2/nbs/manifold_mixup.py in after_batch(self)
    164     def after_batch(self):
    165         "Removes hook if needed"
--> 166         if self._is_input_mixup: return
    167         self._mixup_hook.remove()
    168

~/anaconda3/lib/python3.7/site-packages/fastcore/foundation.py in __getattr__(self, k)
    221         attr = getattr(self,self._default,None)
    222         if attr is not None: return getattr(attr, k)
--> 223         raise AttributeError(k)
    224     def __dir__(self): return custom_dir(self, self._dir() if self._xtra is None else self._dir())
    225 #     def __getstate__(self): return self.__dict__

AttributeError: _is_input_mixup

Here are the results with v2:
a little more mixed, though in all cases mixup was better than just regular augmentation (the defaults were flip and RandomResizedCrop).

On Woof, all 3 mixups were almost tied.

I’ll try on my private dataset tomorrow.

Thanks for the great work @nestorDemeure! (and let me know if the alpha issue is fixed as I can try that as a variable).


@LessW2020 It is strange that it was triggered by setting alpha but the bug should be fixed now.

I also did some minor modifications which might impact the benchmarks: on the tiny-mnist example (which is admittedly a toy benchmark) I now see the same trend as before (output mixup is better than manifold mixup which is better than input mixup).


@nestorDemeure - trying to test alpha with the latest drop, but it looks like you changed the names (i.e. OutputMixUp -> output_mixup) and that’s crashing:

~/anaconda3/lib/python3.7/site-packages/fastcore/foundation.py in __call__(self, *args, **kwargs)
    206         if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
    207         fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 208         return self.fn(*fargs, **kwargs)
    209
    210 # Cell

~/fastai2/fastai2/learner.py in add_cb(self, cb)
    201     def remove_cbs(self, cbs): L(cbs).map(self.remove_cb)
    202     def add_cb(self, cb):
--> 203         old = getattr(self, cb.name, None)
    204         assert not old or isinstance(old, type(cb)), f"self.{cb.name} already registered"
    205         cb.learn = self

AttributeError: 'function' object has no attribute 'name'

I have to run to church, but please take a look and let me know.
I quickly looked at the code (for reference, I am passing it in as a cb via the cbs argument of the learner rather than appending it at the end), so maybe I’ll change to a partial and manually append it to the learner as a workaround.
I did test appending to the end, but that gives a separate error stating there is no such method.
Anyway, if you can take a look and let me know the preferred calling convention, I’ll retest with alpha=1 soon.
Thanks!

OutputMixUp is the name of the callback (which should be passed via the cbs argument), while output_mixup is the name of the method (which can be used as it was in V1).

You can use either one; both exist and should be working (cf. demo.py, which has both options for all callbacks). If you cannot get it working with OutputMixUp, could you send me a minimal failure case in a GitHub issue?
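
Something like this should work (a rough sketch only; `dls` and `model` are assumed to be defined as usual, and the exact signatures are in demo.py):

from fastai2.learner import Learner
from manifold_mixup import *  # provides the OutputMixUp callback and the output_mixup method

learn = Learner(dls, model, cbs=OutputMixUp())  # callback form: pass an instance via cbs
# or
learn = Learner(dls, model)
learn.output_mixup()                            # method form, as it was used in V1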

(I might have solved the ‘this method does not exist’ problem, but I am unsure as it does not happen in my tests)


So the issue was that I jumped to your GitHub but was in the v1 version… hence the problem :slight_smile:
If you could name the file and/or your GitHub repository with the version number, that would help avoid confusion. The v2 version is nearly identical, but you have to follow a hyperlink from the v1 readme (at least I did) to find it.
Since there was an update 8 hours ago in the v1, I assumed that was the correct one and cut and pasted the new code in.
Anyway, sorry for the false alarm, but I think it would be super helpful to put versioning on the GitHub project and/or the file name itself (i.e. manifold_mixup2.py or similar) :slight_smile:
Anyway, testing now!


spoke too soon - new error here:

~/anaconda3/lib/python3.7/site-packages/torch/distributions/distribution.py in sample(self, sample_shape)
    117         """
    118         with torch.no_grad():
--> 119             return self.rsample(sample_shape)
    120
    121     def rsample(self, sample_shape=torch.Size()):

~/anaconda3/lib/python3.7/site-packages/torch/distributions/beta.py in rsample(self, sample_shape)
     56
     57     def rsample(self, sample_shape=()):
---> 58         return self._dirichlet.rsample(sample_shape).select(-1, 0)
     59
     60     def log_prob(self, value):

~/anaconda3/lib/python3.7/site-packages/torch/distributions/dirichlet.py in rsample(self, sample_shape)
     63         shape = self._extended_shape(sample_shape)
     64         concentration = self.concentration.expand(shape)
---> 65         return _Dirichlet.apply(concentration)
     66
     67     def log_prob(self, value):

~/anaconda3/lib/python3.7/site-packages/torch/distributions/dirichlet.py in forward(ctx, concentration)
     16     @staticmethod
     17     def forward(ctx, concentration):
---> 18         x = torch._sample_dirichlet(concentration)
     19         ctx.save_for_backward(x, concentration)
     20         return x

RuntimeError: "dirichlet" not implemented for 'Long'

Damn! Let’s debug it over messages to avoid cluttering this thread.
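
(My first guess, to be confirmed once we dig in: the Beta distribution’s parameters probably end up in a Long tensor when alpha is given as a Python int, which is enough to trigger that error in plain PyTorch:

import torch
from torch.distributions import Beta

Beta(torch.tensor(1.), torch.tensor(1.)).sample()  # float concentrations: works
Beta(torch.tensor(1), torch.tensor(1)).sample()    # Long concentrations: raises the error above

If that is the cause, passing alpha=1. or casting alpha to float should sidestep it.)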