Revisiting ResNets: Improved Training and Scaling Strategies

Hi all,

It looks like, with minor architectural tweaks and improved training approaches, ResNets are once again pushing the SOTA:

Since many of the training techniques discussed in the paper are already implemented in fastai, I don’t think it would be too difficult to reproduce these results. Would it be worth updating the fastai defaults based on this?

Additionally, has anyone been using fastai for self-supervised learning?