Performance degradation between fastai 2.2.5 and 2.2.7

I've just upgraded fastai from 2.2.5 to 2.2.7 and noticed a huge change in performance. However, looking at the changelog, nothing important seems to have changed between the two versions.

Here is a notebook showing the difference. But basically, the exact same code goes from 93% accuracy down to 47%.

Has anyone noticed something similar? @muellerzr, any idea what's going on?


Hi,
I noticed the same behaviour.
The issue was easily reproducible for me (in a Colab notebook): I loaded the notebook from the course book's chapter 5 (pet breeds) with the current fastai 2.2.7, then executed the first "fine_tune" in that chapter; the error_rates were far higher than in the book.
Then I tried the same with fastai 2.2.5: in that case the error_rates were similar to the rates in the book.
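
For reference, the cell in question is roughly the following (paraphrased from memory of the book's chapter 5 notebook, so details such as the seed or image sizes may differ slightly):

    from fastai.vision.all import *

    path = untar_data(URLs.PETS)
    pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                     get_items=get_image_files,
                     splitter=RandomSplitter(seed=42),
                     get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                     item_tfms=Resize(460),
                     batch_tfms=aug_transforms(size=224, min_scale=0.75))
    dls = pets.dataloaders(path/"images")

    learn = cnn_learner(dls, resnet34, metrics=error_rate)
    learn.fine_tune(2)   # error_rate is much worse on 2.2.7 than on 2.2.5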


This is indeed an error, thanks! Looking into this now

YES!! I have this issue! My accuracy dropped from 95% to merely 25%… Even if I load a model trained under the previous version into the new version of fastai, it performs badly too.

I read through my code, and it's strange that the same code performs so differently under a different version. I checked the release notes and the files changed, but there isn't much change. I use both Colab and my own GTX 1080 Ti machine, and both show the degradation.

It seems that 2.2.5 is effectively 2.2.3: the 2.2.6 changelog notes that 2.2.5 was not released correctly and was actually 2.2.3. Tracing back from 2.2.3, I don't see any breaking change in the core vision functions.

Quick update here. I ran through the versions from 2.2.2 to the current one and found that everything was fine at 2.2.2, so I believe the error is within 2.2.3.
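
(In Colab the bisect is simply a matter of pinning the version in the install cell and restarting the runtime between runs, e.g.:)

    # Colab cell: pin a specific fastai version, restart the runtime,
    # then re-run the same training notebook. Repeat for 2.2.3, 2.2.5, 2.2.7, ...
    !pip install -Uqq fastai==2.2.2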

The issue is specifically in the DataLoaders: the test below loads DataLoaders generated with v2.2.5 and trains them in 2.2.7.
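
Roughly, that kind of cross-version check looks like this (a sketch of my own, assuming the DataLoaders are pickled with torch.save; the exact code used may differ):

    import torch
    from fastai.vision.all import *

    # In a 2.2.5 environment: build the DataLoaders (e.g. as in the pet-breeds
    # example above) and pickle them to disk. The file name is hypothetical.
    torch.save(dls, 'dls_2_2_5.pkl')

    # In a 2.2.7 environment: load the pickled DataLoaders and train as usual.
    dls = torch.load('dls_2_2_5.pkl')
    learn = cnn_learner(dls, resnet34, metrics=error_rate)
    learn.fine_tune(2)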

It makes me wonder if it has to do with this PR, but that's as far as I got today.


I was able to repro this issue as well in Colab: taking 05_pet_breeds.ipynb as an example, when I call learn.fine_tune(2) the error_rate is about 0.20. Restarting the runtime and installing fastai 2.2.3 (by changing the pip install command to pip install -Uqq fastbook fastai==2.2.3) solved the problem, and the error_rate is back to 0.07, like what we've seen in the course.


Hello!
Thanks for looking into this issue - do you know if there is a fix, or an upcoming upgrade to the package?
Thank you!

I think I am getting a related error with DataLoaders. I found it when attempting to run fastai/dev_nbs/course/lesson7-resnet-mnist.ipynb.
When I run the notebook up to the line

dls = dsets.dataloaders(bs=bs, after_item=tfms, after_batch=[IntToFloatTensor, Normalize])

I think this was introduced with the fix in https://github.com/fastai/fastai/pull/3178.

I am getting the following error:


AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 dls = dsets.dataloaders(bs=bs, after_item=tfms, after_batch=[IntToFloatTensor, Normalize])

/opt/conda/lib/python3.7/site-packages/fastai/data/core.py in dataloaders(self, bs, shuffle_train, shuffle, val_shuffle, n, path, dl_type, dl_kwargs, device, drop_last, val_bs, **kwargs)
    222         dls = [dl] + [dl.new(self.subset(i), **merge(kwargs,def_kwargs,val_kwargs,dl_kwargs[i]))
    223                       for i in range(1, self.n_subsets)]
--> 224         return self._dbunch_type(*dls, path=path, device=device)
    225
    226 FilteredBase.train,FilteredBase.valid = add_props(lambda i,x: x.subset(i))

/opt/conda/lib/python3.7/site-packages/fastai/data/core.py in __init__(self, path, device, *loaders)
    142     def __init__(self, *loaders, path='.', device=None):
    143         self.loaders,self.path = list(loaders),Path(path)
--> 144         if device is not None or hasattr(loaders[0],'to'): self.device = device
    145
    146     def __getitem__(self, i): return self.loaders[i]

/opt/conda/lib/python3.7/site-packages/fastai/data/core.py in device(self, d)
    158     @device.setter
    159     def device(self, d):
--> 160         for dl in self.loaders: dl.to(d)
    161         self._device = d
    162

/opt/conda/lib/python3.7/site-packages/fastai/data/core.py in to(self, device)
    122         self.device = device
    123         for tfm in self.after_batch.fs:
--> 124             for a in L(getattr(tfm, 'parameters', None)): setattr(tfm, a, getattr(tfm, a).to(device))
    125         return self
    126

AttributeError: 'NoneType' object has no attribute 'to'
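
For anyone who wants to hit the same code path without the full notebook, here is a minimal setup of my own (a reconstruction using the small MNIST_TINY sample, not the notebook's exact cells):

    from fastai.vision.all import *

    # Tiny MNIST sample, split by grandparent folder (train/valid).
    path = untar_data(URLs.MNIST_TINY)
    items = get_image_files(path)
    splits = GrandparentSplitter()(items)
    dsets = Datasets(items, [[PILImageBW.create], [parent_label, Categorize]], splits=splits)

    bs = 64
    tfms = [ToTensor(), CropPad(34), RandomCrop(28)]

    # On an affected version this raises the AttributeError above: the validation
    # DataLoader gets its own, never set-up Normalize whose mean/std are still None.
    dls = dsets.dataloaders(bs=bs, after_item=tfms, after_batch=[IntToFloatTensor, Normalize])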


I have been looking at the code and I think it might be the same issue for you too. It seems the addition of kwargs is the problem.

           dls = [dl] + [dl.new(self.subset(i), **merge(kwargs,def_kwargs,val_kwargs,dl_kwargs[i]))

If kwargs is removed from def dataloaders, like below:

           dls = [dl] + [dl.new(self.subset(i), **merge(def_kwargs,val_kwargs,dl_kwargs[i]))

If I am understanding the change correctly, it will then run after_batch=[IntToFloatTensor, Normalize] on both DataLoaders, whereas before it only did so on the first (train), not the second (valid).

I am not sure whether changing the code like this is the correct fix.

Here is a pic to clarify the change.

As you have found, the validation set is normalized (min 0, max 1) but not standardized (mean 0, std 1).
There is no normalization in the valid pipeline: Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}
_add_norm adds a Normalize callback to both train and valid in previous versions, but not to valid in the most recent version. https://github.com/fastai/fastai/blob/57106212a842ec6ecc0a9a4daac950e29ff029ad/fastai/vision/learner.py#L153
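
A possible user-side workaround until a fix is released (my own suggestion, not from the linked code) is to pass an already-configured Normalize explicitly in batch_tfms, so both DataLoaders end up with the same instance and its stats, e.g. with ImageNet stats for a pretrained model:

    from fastai.vision.all import *

    path = untar_data(URLs.PETS)/'images'
    dls = ImageDataLoaders.from_name_re(
        path, get_image_files(path), pat=r'(.+)_\d+.jpg$',
        item_tfms=Resize(224),
        # Normalize.from_stats already has mean/std set, so it does not rely on
        # a setup pass that only the training DataLoader receives.
        batch_tfms=[Normalize.from_stats(*imagenet_stats)])
    learn = cnn_learner(dls, resnet34, metrics=error_rate)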

From fastai 2.2.5
ipdb> id(dls.train.after_batch.fs)
140096334685072
ipdb> id(dls.valid.after_batch.fs)
140096334685072
In the previous version, valid and train pointed to the same list for their transforms.
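
A quick way to check this in your own environment (assuming a dls built as in the examples above):

    # If these two ids differ, train and valid no longer share the same transform
    # list, so stats set up on the train side are not visible to the valid side.
    print(id(dls.train.after_batch.fs), id(dls.valid.after_batch.fs))

    # Also worth a look: does the valid pipeline contain Normalize at all?
    print(dls.valid.after_batch)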


Specifically **merge(kwargs,…): adding in the same kwargs caused a new list of transforms to be created for dls.valid.after_batch.fs, instead of reusing the same list of transforms as dls.train.after_batch.fs.

I "think" the correct thing is to add normalization in cnn_learner for both the train and validation sets, instead of correcting the issue at the root cause: I find it odd behaviour that adding a transform to dls.train.after_batch also adds it to dls.valid.after_batch, so that you cannot have different after_batch transforms for the training and validation sets. My implementation of this is here: https://github.com/fastai/fastai/pull/3268
This is edited from a Discord conversation; if anything is not clear, feel free to ask.

Hi,

Can I ask whether the developers are taking care of this, so that a new version will fix the error?

Sam

The PR was merged, so yes.