Hello, all. I am new to programming, and I found an issue with fastai that I might be able to help with.
There is an issue #3486 on the fastai repo regarding using MixUp and CutMix together.
I think this error is happening because the `MixHandler` callback runs `before_train` twice, which runs

`self.old_lf,self.learn.loss_func = self.learn.loss_func,self.lf`

It runs twice because both `MixUp` and `CutMix` inherit it from `MixHandler`. When it tries to run the second time, `getattr(self.learn.loss_func, 'y_int', False)` returns `False`, because `self.learn.loss_func` is now the first handler's `lf` wrapper, which has no `y_int` attribute, and the second handler never swaps in its own loss function.
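For reference, here is a minimal sketch of the kind of setup that triggers it for me. The dataset and model are just the standard PETS example and are not part of the issue; the important part is passing both callbacks to the same fit call:

```python
from fastai.vision.all import *

# minimal reproduction sketch: dataset/model are placeholders (standard PETS
# example); the point is combining MixUp and CutMix in one training run
path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_name_re(
    path, get_image_files(path/'images'),
    pat=r'(.+)_\d+.jpg', item_tfms=Resize(224))
learn = cnn_learner(dls, resnet18, metrics=accuracy)

# both callbacks subclass MixHandler, so before_train's swap runs twice
learn.fit(1, cbs=[MixUp(), CutMix()])
```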
It can be fixed by changing `MixHandler` like this:
```python
# proposed replacement for MixHandler in fastai/callback/mixup.py
# (Callback, Beta, tensor, NoneReduce, reduce_loss are already in scope there)
class MixHandler(Callback):
    "A handler class for implementing `MixUp` style scheduling"
    run_valid = False
    def __init__(self, alpha=0.5):
        self.swapped = False
        self.distrib = Beta(tensor(alpha), tensor(alpha))

    def before_train(self):
        self.stack_y = getattr(self.learn.loss_func, 'y_int', False)
        if self.stack_y and not self.swapped:
            self.old_lf,self.learn.loss_func = self.learn.loss_func,self.lf
            self.swapped = True

    def after_train(self):
        if self.stack_y and self.swapped:
            self.learn.loss_func = self.old_lf
            self.swapped = False

    def after_cancel_train(self): self.after_train()

    def lf(self, pred, *yb):
        if not self.training: return self.old_lf(pred, *yb)
        with NoneReduce(self.old_lf) as lf:
            loss = torch.lerp(lf(pred,*self.yb1), lf(pred,*yb), self.lam)
        return reduce_loss(loss, getattr(self.old_lf, 'reduction', 'mean'))
    # `lf` only ever replaces losses that have y_int=True, so tag the wrapper
    # itself; this is what makes the second handler's getattr check pass
    lf.y_int = True
```
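To double-check my understanding, here is a tiny standalone illustration (no fastai needed) of why the second `getattr` check fails and why tagging `lf` with `y_int` fixes it. `Loss` and `Handler` are made-up stand-ins, not the real classes:

```python
# Loss plays the user's loss function, Handler plays MixHandler with its lf wrapper
class Loss:
    y_int = True                               # fastai losses expose this flag
    def __call__(self, pred, targ): return 0.0

class Handler:
    def lf(self, pred, *yb): return 0.0

loss_func = Loss()
h1 = Handler()

print(getattr(loss_func, 'y_int', False))      # True  -> first handler swaps
loss_func = h1.lf                              # loss_func is now a bound method
print(getattr(loss_func, 'y_int', False))      # False -> second handler skips its swap

Handler.lf.y_int = True                        # the proposed fix: tag the wrapper
print(getattr(loss_func, 'y_int', False))      # True  -> second handler swaps too
```

Since bound methods forward attribute lookups to the underlying function, setting `y_int` on `lf` inside the class body is enough for the check to pass.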
Can anyone take a look at this and see whether it works for others as well?
And if it does, is the style acceptable by fastai's coding standards?
I am new to Python and fastai, so if anything does not sound right, please let me know.