Performance degradation between fastai 2.2.5 and 2.2.7

I've just upgraded fastai from 2.2.5 to 2.2.7 and noticed a huge change in performance. However, looking at the changelog, nothing important seems to have changed between the two versions.

Here is a notebook showing the difference. But basically, the exact same code goes from 93% accuracy down to 47%.

Has anyone noticed something similar? @muellerzr, any idea what's going on?


Hi,
I noticed the same behaviour.
The issue was easily reproducible for me (in a Colab notebook): I loaded the notebook from the course book's chapter 5 (pet breeds) with the current fastai 2.2.7, then executed the first "fine_tune" in that chapter; the error_rates were far higher than in the book.
Then I tried the same with fastai 2.2.5: in that case the error_rates were similar to the rates in the book.
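
For reference, the cell in question is roughly the following (paraphrased from memory of the book's chapter 5 notebook, so details such as the seed or image sizes may differ slightly):

    from fastai.vision.all import *

    path = untar_data(URLs.PETS)
    pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                     get_items=get_image_files,
                     splitter=RandomSplitter(seed=42),
                     get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                     item_tfms=Resize(460),
                     batch_tfms=aug_transforms(size=224, min_scale=0.75))
    dls = pets.dataloaders(path/"images")

    learn = cnn_learner(dls, resnet34, metrics=error_rate)
    learn.fine_tune(2)   # error_rate is much worse on 2.2.7 than on 2.2.5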


This is indeed an error, thanks! Looking into this now

YES!! I have this issue! My accuracy dropped from 95% to merely 25%… Even if I load a model trained under the previous version into the new version of fastai, it performs badly too.

I read through my code, and it's strange that the same code performs so differently under a different version. I checked the release notes and the files changed, but there isn't much change. I use both Colab and my own GTX 1080 Ti machine, and both show the degradation.

It seems that 2.2.5 is effectively 2.2.3: the 2.2.6 changelog notes that 2.2.5 was not released correctly and was actually 2.2.3. Tracing back from 2.2.3, I don't see any breaking change in the core vision functions.

Quick update here. I ran through the versions from 2.2.2 to the current one and found that everything was fine at 2.2.2, so I believe the error is within 2.2.3.
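
(In Colab the bisect is simply a matter of pinning the version in the install cell and restarting the runtime between runs, e.g.:)

    # Colab cell: pin a specific fastai version, restart the runtime,
    # then re-run the same training notebook. Repeat for 2.2.3, 2.2.5, 2.2.7, ...
    !pip install -Uqq fastai==2.2.2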

The issue is specifically in the DataLoaders: the test below loads DataLoaders generated with v2.2.5 and trains them in 2.2.7.
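
Roughly, that kind of cross-version check looks like this (a sketch of my own, assuming the DataLoaders are pickled with torch.save; the exact code used may differ):

    import torch
    from fastai.vision.all import *

    # In a 2.2.5 environment: build the DataLoaders (e.g. as in the pet-breeds
    # example above) and pickle them to disk. The file name is hypothetical.
    torch.save(dls, 'dls_2_2_5.pkl')

    # In a 2.2.7 environment: load the pickled DataLoaders and train as usual.
    dls = torch.load('dls_2_2_5.pkl')
    learn = cnn_learner(dls, resnet34, metrics=error_rate)
    learn.fine_tune(2)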

It makes me wonder if it has to do with this PR, but that's as far as I got today.


I was able to repro this issue as well in Colab: taking 05_pet_breeds.ipynb as an example, when I call learn.fine_tune(2) the error_rate is about 0.20. Restarting the runtime and installing fastai 2.2.3 (by changing the pip install command to pip install -Uqq fastbook fastai==2.2.3) solved the problem, and the error_rate is back to 0.07, like what we've seen in the course.


Hello!
Thanks for looking into this issue - do you know if there is a fix, or an upcoming upgrade to the package?
Thank you!

I think I am getting a related error with DataLoaders. I found it when attempting to run fastai/dev_nbs/course/lesson7-resnet-mnist.ipynb.
When I run the notebook up to the line

dls = dsets.dataloaders(bs=bs, after_item=tfms, after_batch=[IntToFloatTensor, Normalize])

I think this was introduced with the fix in https://github.com/fastai/fastai/pull/3178.

I am getting the following error:


AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 dls = dsets.dataloaders(bs=bs, after_item=tfms, after_batch=[IntToFloatTensor, Normalize])

/opt/conda/lib/python3.7/site-packages/fastai/data/core.py in dataloaders(self, bs, shuffle_train, shuffle, val_shuffle, n, path, dl_type, dl_kwargs, device, drop_last, val_bs, **kwargs)
    222         dls = [dl] + [dl.new(self.subset(i), **merge(kwargs,def_kwargs,val_kwargs,dl_kwargs[i]))
    223                       for i in range(1, self.n_subsets)]
--> 224         return self._dbunch_type(*dls, path=path, device=device)
    225
    226 FilteredBase.train,FilteredBase.valid = add_props(lambda i,x: x.subset(i))

/opt/conda/lib/python3.7/site-packages/fastai/data/core.py in __init__(self, path, device, *loaders)
    142     def __init__(self, *loaders, path='.', device=None):
    143         self.loaders,self.path = list(loaders),Path(path)
--> 144         if device is not None or hasattr(loaders[0],'to'): self.device = device
    145
    146     def __getitem__(self, i): return self.loaders[i]

/opt/conda/lib/python3.7/site-packages/fastai/data/core.py in device(self, d)
    158     @device.setter
    159     def device(self, d):
--> 160         for dl in self.loaders: dl.to(d)
    161         self._device = d
    162

/opt/conda/lib/python3.7/site-packages/fastai/data/core.py in to(self, device)
    122         self.device = device
    123         for tfm in self.after_batch.fs:
--> 124             for a in L(getattr(tfm, 'parameters', None)): setattr(tfm, a, getattr(tfm, a).to(device))
    125         return self
    126

AttributeError: 'NoneType' object has no attribute 'to'
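
For anyone who wants to hit the same code path without the full notebook, here is a minimal setup of my own (a reconstruction using the small MNIST_TINY sample, not the notebook's exact cells):

    from fastai.vision.all import *

    # Tiny MNIST sample, split by grandparent folder (train/valid).
    path = untar_data(URLs.MNIST_TINY)
    items = get_image_files(path)
    splits = GrandparentSplitter()(items)
    dsets = Datasets(items, [[PILImageBW.create], [parent_label, Categorize]], splits=splits)

    bs = 64
    tfms = [ToTensor(), CropPad(34), RandomCrop(28)]

    # On an affected version this raises the AttributeError above: the validation
    # DataLoader gets its own, never set-up Normalize whose mean/std are still None.
    dls = dsets.dataloaders(bs=bs, after_item=tfms, after_batch=[IntToFloatTensor, Normalize])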


I have been looking at the code and I think it might be the same issue for you too. It seems the addition of kwargs is the problem.

           dls = [dl] + [dl.new(self.subset(i), **merge(kwargs,def_kwargs,val_kwargs,dl_kwargs[i]))

If kwargs is removed from def dataloaders, like below:

           dls = [dl] + [dl.new(self.subset(i), **merge(def_kwargs,val_kwargs,dl_kwargs[i]))

If I am understanding the change correctly, it will then run after_batch=[IntToFloatTensor, Normalize] on both DataLoaders, whereas before it only did so on the first (train), not the second (valid).

I am not sure whether changing the code like this is the correct fix.

Here is a pic to clarify the change.

As you have found, the validation set is normalized (min 0, max 1) but not standardized (mean 0, std 1).
There is no normalization in the valid pipeline: Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}
_add_norm adds a Normalize callback to both train and valid in previous versions, but not to valid in the most recent version. https://github.com/fastai/fastai/blob/57106212a842ec6ecc0a9a4daac950e29ff029ad/fastai/vision/learner.py#L153
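
A possible user-side workaround until a fix is released (my own suggestion, not from the linked code) is to pass an already-configured Normalize explicitly in batch_tfms, so both DataLoaders end up with the same instance and its stats, e.g. with ImageNet stats for a pretrained model:

    from fastai.vision.all import *

    path = untar_data(URLs.PETS)/'images'
    dls = ImageDataLoaders.from_name_re(
        path, get_image_files(path), pat=r'(.+)_\d+.jpg$',
        item_tfms=Resize(224),
        # Normalize.from_stats already has mean/std set, so it does not rely on
        # a setup pass that only the training DataLoader receives.
        batch_tfms=[Normalize.from_stats(*imagenet_stats)])
    learn = cnn_learner(dls, resnet34, metrics=error_rate)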

From fastai 2.2.5
ipdb> id(dls.train.after_batch.fs)
140096334685072
ipdb> id(dls.valid.after_batch.fs)
140096334685072
In the previous version, valid and train pointed to the same list for their transforms.
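
A quick way to check this in your own environment (assuming a dls built as in the examples above):

    # If these two ids differ, train and valid no longer share the same transform
    # list, so stats set up on the train side are not visible to the valid side.
    print(id(dls.train.after_batch.fs), id(dls.valid.after_batch.fs))

    # Also worth a look: does the valid pipeline contain Normalize at all?
    print(dls.valid.after_batch)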


Specifically **merge(kwargs,…): adding in the same kwargs caused a new list of transforms to be created for dls.valid.after_batch.fs, instead of reusing the same list of transforms as dls.train.after_batch.fs.

I "think" the correct thing is to add normalization in cnn_learner for both the train and validation sets, instead of correcting the issue at the root cause: I find it odd behaviour that adding a transform to dls.train.after_batch also adds it to dls.valid.after_batch, so that you cannot have different after_batch transforms for the training and validation sets. My implementation of this is here: https://github.com/fastai/fastai/pull/3268
This is edited from a Discord conversation; if anything is not clear, feel free to ask.

Hi,

Can I ask whether the developers are taking care of this, so that a new version will fix the error?

Sam

The PR was merged, so yes.