Averaging Weights Leads To Wider Optima - a gem of a paper

I have gone through a relatively bad case of the flu, and after nearly two weeks of not being able to do much I am now getting back into the swing of things :slight_smile:

This comes from Jeremy’s Twitter feed from a couple of days ago:
https://arxiv.org/abs/1803.05407

Such a good read! Not only does the fastai library give us the tools to (nearly) trivially implement this, but more importantly I found the discussion really illuminating.
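For anyone who hasn't read the paper yet, the core of SWA is just a running equal average of the weights, collected at intervals during the later part of training. Here is a minimal PyTorch-style sketch of that update; the `update_swa` helper and the loop around it are my own illustration of the paper's formula, not the fastai API:

```python
import copy

import torch
import torch.nn as nn


def update_swa(swa_model: nn.Module, model: nn.Module, n_averaged: int) -> int:
    """One SWA step: w_swa <- (w_swa * n + w) / (n + 1)."""
    for p_swa, p in zip(swa_model.parameters(), model.parameters()):
        p_swa.data.mul_(n_averaged / (n_averaged + 1.0))
        p_swa.data.add_(p.data / (n_averaged + 1.0))
    return n_averaged + 1


# Hypothetical training loop: fold the weights in once per epoch
# after some warm-up point `swa_start`.
# swa_model = copy.deepcopy(model)
# n = 0
# for epoch in range(num_epochs):
#     train_one_epoch(model)          # your usual SGD training step
#     if epoch >= swa_start:
#         n = update_swa(swa_model, model, n)
```

One detail the paper stresses: the averaged model's batch norm running statistics are stale, so you need one extra forward pass over the training data to recompute them before evaluating.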

I am not sure what Jeremy has in store for us for part 2 (maybe we will touch on this at some point), but reading this paper has definitely broadened my understanding! It also references some other papers, like the ones on Shake-Shake regularization and wide ResNets, which turned out to be very approachable and very interesting. Now the only challenge will be to actually find time to play with all this :slight_smile:

BTW, I feel there are a lot of blog posts hiding in this material if anyone is willing to give it a go.


Thank you for sharing


I’m working on adding it to the fastai library, and will probably write a blog post once I finish writing the code.


Hello,
@wdhorton, in your blog post you said that it’s best to run SWA at the end of a cycle.
But how do you do that when using a variable cycle length?
Is there a way to automatically run SWA at the end of a cycle?

BTW, thank you for your implementation!

@Benoit_c I’m glad that you’re finding SWA useful. My implementation in fastai doesn’t currently work with cyclical learning rates (at least not the way suggested in the paper, where the average is updated at the end of each cycle). But it should be possible to adapt the SWA callback to be cycle-aware. It would involve changing this line (https://github.com/fastai/fastai/blob/3f2079f7bc07ef84a750f6417f68b7b9fdc9525a/fastai/swa.py#L26) so that the average is only updated on the epochs that end a cycle.
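If it helps, here is a hedged sketch of what such a cycle-aware check might look like. The `is_end_of_cycle` helper and the `update_average_model` call are hypothetical names for illustration; only the idea (updating the average just on epochs that close a cycle of a `cycle_len`/`cycle_mult` schedule) comes from the paper:

```python
def is_end_of_cycle(epoch: int, cycle_len: int, cycle_mult: int = 1) -> bool:
    """True if the epoch that just finished (0-indexed) closes a cycle.

    With cycle_mult > 1 the cycles have lengths cycle_len,
    cycle_len * cycle_mult, cycle_len * cycle_mult**2, ..., so the
    boundaries fall at the cumulative sums of those lengths.
    """
    completed = epoch + 1  # number of epochs finished so far
    boundary, length = cycle_len, cycle_len
    while boundary <= completed:
        if boundary == completed:
            return True
        length *= cycle_mult
        boundary += length
    return False


# Inside the callback's on_epoch_end, the fixed-epoch check would become
# something like this (update_average_model is a placeholder for whatever
# the callback uses to fold the current weights into the running average):
# if is_end_of_cycle(epoch, cycle_len, cycle_mult):
#     self.update_average_model()
```

With `cycle_mult=1` this reduces to averaging every `cycle_len` epochs, which matches the constant-cycle case discussed in the paper.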
