Averaging Weights Leads To Wider Optima - a gem of a paper

I have gone through a relatively bad case of the flu, and after nearly two weeks of not being able to do much I am now getting back into the swing of things :slight_smile:

This comes from Jeremy’s Twitter feed from a couple of days ago:
https://arxiv.org/abs/1803.05407

Such a good read! Not only does the fastai library give us the tools to (nearly) trivially implement this, but more importantly I found the discussion really illuminating.
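For anyone who hasn't read the paper yet, the core of SWA is just a running equal average of the weights, collected at intervals during the later part of training. Here is a minimal PyTorch-style sketch of that update; the `update_swa` helper and the loop around it are my own illustration of the paper's formula, not the fastai API:

```python
import copy

import torch
import torch.nn as nn


def update_swa(swa_model: nn.Module, model: nn.Module, n_averaged: int) -> int:
    """One SWA step: w_swa <- (w_swa * n + w) / (n + 1)."""
    for p_swa, p in zip(swa_model.parameters(), model.parameters()):
        p_swa.data.mul_(n_averaged / (n_averaged + 1.0))
        p_swa.data.add_(p.data / (n_averaged + 1.0))
    return n_averaged + 1


# Hypothetical training loop: fold the weights in once per epoch
# after some warm-up point `swa_start`.
# swa_model = copy.deepcopy(model)
# n = 0
# for epoch in range(num_epochs):
#     train_one_epoch(model)          # your usual SGD training step
#     if epoch >= swa_start:
#         n = update_swa(swa_model, model, n)
```

One detail the paper stresses: the averaged model's batch norm running statistics are stale, so you need one extra forward pass over the training data to recompute them before evaluating.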

I am not sure what Jeremy has in store for us for part 2 (maybe we will touch on this at some point), but reading this paper has definitely broadened my understanding! It also references some other papers, like the ones on Shake-Shake regularization and wide ResNets, which turned out to be very approachable and very interesting. Now the only challenge will be to actually find time to play with all this :slight_smile:

BTW, I feel there are a lot of blog posts hiding in this material if anyone is willing to give it a go.


Thank you for sharing


I’m working on adding it to the fastai library, and will probably write a blog post once I finish writing the code.


Hello,
@wdhorton, in your blog post you said that it’s best to run SWA at the end of a cycle.
But how do you do that when using a variable cycle length?
Is there a way to automatically run SWA at the end of a cycle?

BTW, thank you for your implementation!

@Benoit_c I’m glad that you’re finding SWA useful. My implementation in fastai doesn’t currently work with cyclical learning rates (at least not the way suggested in the paper, where the average is updated at the end of each cycle). But it should be possible to adapt the SWA callback to be cycle-aware. It would involve changing this line (https://github.com/fastai/fastai/blob/3f2079f7bc07ef84a750f6417f68b7b9fdc9525a/fastai/swa.py#L26) so that the average is only updated on the epochs that end a cycle.
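If it helps, here is a hedged sketch of what such a cycle-aware check might look like. The `is_end_of_cycle` helper and the `update_average_model` call are hypothetical names for illustration; only the idea (updating the average just on epochs that close a cycle of a `cycle_len`/`cycle_mult` schedule) comes from the paper:

```python
def is_end_of_cycle(epoch: int, cycle_len: int, cycle_mult: int = 1) -> bool:
    """True if the epoch that just finished (0-indexed) closes a cycle.

    With cycle_mult > 1 the cycles have lengths cycle_len,
    cycle_len * cycle_mult, cycle_len * cycle_mult**2, ..., so the
    boundaries fall at the cumulative sums of those lengths.
    """
    completed = epoch + 1  # number of epochs finished so far
    boundary, length = cycle_len, cycle_len
    while boundary <= completed:
        if boundary == completed:
            return True
        length *= cycle_mult
        boundary += length
    return False


# Inside the callback's on_epoch_end, the fixed-epoch check would become
# something like this (update_average_model is a placeholder for whatever
# the callback uses to fold the current weights into the running average):
# if is_end_of_cycle(epoch, cycle_len, cycle_mult):
#     self.update_average_model()
```

With `cycle_mult=1` this reduces to averaging every `cycle_len` epochs, which matches the constant-cycle case discussed in the paper.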
