Such a good read! Not only does the fastai library give us the tools to (nearly) trivially implement this, but more importantly I found the discussion really illuminating.
I am not sure what Jeremy has in store for us for part 2 (it may be that we will touch on this at some point), but reading this paper has definitely broadened my understanding! It also references some other papers, like the ones on Shake-Shake regularization and wide resnets, which turned out to be very approachable and very interesting. Now the only challenge will be to actually find time to play with all this.
BTW I feel there are a lot of blog posts hiding in this material if anyone would be willing to give it a go.
Hello @wdhorton, in your blog post you said that it's best to run SWA at the end of a cycle.
But how can this be done when using variable cycle lengths?
Is there a way to automatically run SWA at the end of each cycle?
@Benoit_c I’m glad that you’re finding SWA useful. My implementation in fastai doesn’t currently work with cyclical learning rates (at least not the way suggested in the paper, where the average is updated at the end of each cycle). But it should be possible to adapt the SWA callback to be cycle-aware. It would involve changing this line (https://github.com/fastai/fastai/blob/3f2079f7bc07ef84a750f6417f68b7b9fdc9525a/fastai/swa.py#L26) so that it only updates on epochs that fall at the end of a cycle.
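To make that concrete, here is a rough sketch (not the actual fastai code) of what "cycle-aware" could look like. It assumes an SGDR-style schedule where cycle i lasts `cycle_len * cycle_mult**i` epochs; the hypothetical helper `is_cycle_end` checks whether a given epoch is the last one in a cycle, and the SWA running average is only updated at those epochs:

```python
def is_cycle_end(epoch, cycle_len=1, cycle_mult=2):
    """Return True if 0-indexed `epoch` is the last epoch of a cycle,
    assuming cycle i runs for cycle_len * cycle_mult**i epochs
    (SGDR-style schedule). Hypothetical helper, not fastai API."""
    end, length = 0, cycle_len
    while end <= epoch:
        end += length          # cumulative epoch count at the end of this cycle
        if epoch == end - 1:   # `epoch` is the last epoch of the cycle
            return True
        length *= cycle_mult   # next cycle is cycle_mult times longer
    return False


def update_swa(swa_weights, new_weights, n_averaged):
    """Running average of weight vectors (plain lists here for illustration);
    the real callback would average the model's parameter tensors instead."""
    if swa_weights is None:
        return list(new_weights), 1
    avg = [(s * n_averaged + w) / (n_averaged + 1)
           for s, w in zip(swa_weights, new_weights)]
    return avg, n_averaged + 1


# Example: with cycle_len=1, cycle_mult=2, cycles end after epochs 0, 2, 6, 14, ...
cycle_ends = [e for e in range(15) if is_cycle_end(e, cycle_len=1, cycle_mult=2)]
print(cycle_ends)  # [0, 2, 6, 14]
```

In the callback's `on_epoch_end`, you would then guard the averaging step with `if is_cycle_end(epoch, ...)` instead of the current "every epoch after `swa_start`" condition.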