Was this an issue in fastai v1?

Is this an issue that would have affected fastai v1 in the past?

v1 was using the Adam optimizer and doing decoupled weight decay on its own for AdamW, so the issue would be there.
I wouldn’t go as far as saying you need to retrain all the models though.
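
For anyone following along, here is a rough sketch of what "decoupled weight decay on top of Adam" means, compared with plain L2 regularization. This is not the fastai v1 code: `adam_step` and `new_state` are names made up for illustration, it works on a single scalar parameter, and the default hyperparameters are just common values.

```python
import math

def new_state():
    # Hypothetical helper: fresh Adam statistics for one scalar parameter.
    return {"step": 0, "m": 0.0, "v": 0.0}

def adam_step(p, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
              wd=0.0, decoupled=True):
    """One Adam update on a scalar parameter p; returns the new value."""
    if decoupled:
        # AdamW style: shrink the weight directly, outside the Adam statistics.
        p = p * (1 - lr * wd)
    else:
        # L2 style: fold the decay into the gradient, so Adam rescales it too.
        grad = grad + wd * p

    state["step"] += 1
    b1, b2 = betas
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad

    # Bias-corrected moment estimates.
    m_hat = state["m"] / (1 - b1 ** state["step"])
    v_hat = state["v"] / (1 - b2 ** state["step"])
    return p - lr * m_hat / (math.sqrt(v_hat) + eps)

# Zero gradient, so only the weight decay moves the parameter.
p_decoupled = adam_step(1.0, 0.0, new_state(), lr=0.1, wd=0.1, decoupled=True)
p_l2 = adam_step(1.0, 0.0, new_state(), lr=0.1, wd=0.1, decoupled=False)
print(p_decoupled, p_l2)
```

With a zero gradient the two variants already diverge: the decoupled version only shrinks the weight by `lr * wd`, while the L2 version passes the decay term through Adam's normalization and takes a step of roughly `lr`.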

Ok sounds good.

Yes, retraining all our models is probably overkill, but this may matter more when trying to replicate a paper, for example.