Is this an issue that would have affected fastai v1 in the past?
v1 was using the Adam optimizer and doing the decoupled weight decay on its own to get AdamW, so yes, the issue would be there.
I wouldn’t go as far as saying you need to retrain all the models though.
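For anyone following along, here is a rough sketch of what "Adam plus decoupled weight decay done separately" means, compared with plain L2 regularization. This is not fastai's actual code: `adam_step` and `new_state` are hypothetical helpers that work on a single scalar weight, and the hyperparameter defaults are just illustrative.

```python
import math


def new_state():
    """Fresh Adam state for one scalar parameter (hypothetical helper)."""
    return {"step": 0, "m": 0.0, "v": 0.0}


def adam_step(p, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
              wd=0.0, decoupled=True):
    """One Adam update on a scalar weight, for illustration only."""
    if decoupled:
        # AdamW-style: shrink the weight directly, so the decay never
        # enters Adam's running gradient statistics.
        p = p * (1 - lr * wd)
    else:
        # Classic L2 regularization: fold the decay into the gradient,
        # so it gets rescaled by Adam like any other gradient term.
        grad = grad + wd * p

    state["step"] += 1
    b1, b2 = betas
    # Exponential moving averages of the gradient and its square.
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad
    # Bias correction for the zero-initialized averages.
    m_hat = state["m"] / (1 - b1 ** state["step"])
    v_hat = state["v"] / (1 - b2 ** state["step"])
    return p - lr * m_hat / (math.sqrt(v_hat) + eps)
```

With a zero gradient, the decoupled branch only shrinks the weight by a factor of `1 - lr * wd`, while the L2 branch pushes the decay term through Adam's normalization and ends up taking a roughly `lr`-sized step. That difference is why the two are not interchangeable when comparing results.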
OK, sounds good.
Yes, retraining all our models is probably overkill, but this may matter more when trying to replicate a paper, for example.