Today's project: refactor sgdr

jeremy · April 16, 2018, 3:02am

I’ve got a rather timely project request if anyone happens to have time in the next 24 hours or so - might be suitable for someone like @sgugger who is already familiar with fastai.sgdr.

The project is to refactor and extend the sgdr/clr stuff into a new class called “OptimScheduler” so that you can, in the most general version, pass a list for each param in the constructor, where the length of each list equals the number of “epoch groups”. The params are basically everything that you could want to change in an optimizer. This is easiest to explain with examples I think!

For instance say you want:

10 epochs of rmsprop, lr linearly increasing from lr=0.1 to lr=1.0, and constant weight decay of 1e-3, and linearly decreasing momentum from 0.95 to 0.85, then:
20 epochs of adam, lr decreasing from 0.5 to 0.1 with cosine annealing, and constant weight decay of 1e-4, and beta equal to (0.9,0.99), you’d pass:

OptimScheduler(epochs=(10,20), opt_fn=(optim.RMSprop,optim.Adam),
    lr=((0.1,1.0), (0.5,0.1)), lr_decay=(decay.linear,decay.cosine),
    momentum=((0.95,0.85), 0.9), momentum_decay=(decay.linear, None),
    beta2=(None,0.99))

Note that I’ve mapped beta1 for adam to momentum since that’s really what it is, and it simplifies the API. Then beta2 of course maps to the 2nd param of adam’s beta. Stuff that’s not relevant is just None (eg decay, if there’s no decay wanted in that epoch group).
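To make the two decay types concrete, here is a minimal sketch of how a per-iteration lr could be computed from (start, end) pairs. The names linear_decay, cosine_decay and schedule are made up for illustration and are not part of fastai; the numbers are the lr values from the example above, with an assumed 5 iterations per epoch.

```python
import math

def linear_decay(start, end, pct):
    """Move linearly from start to end as pct goes from 0 to 1."""
    return start + (end - start) * pct

def cosine_decay(start, end, pct):
    """Cosine annealing from start to end as pct goes from 0 to 1."""
    return end + (start - end) * (1 + math.cos(math.pi * pct)) / 2

def schedule(groups, iters_per_epoch):
    """Build one value per iteration.

    groups is a list of (epochs, (start, end), decay_fn) tuples
    (a hypothetical layout, chosen only for this sketch).
    """
    values = []
    for epochs, (start, end), decay_fn in groups:
        n = epochs * iters_per_epoch
        for i in range(n):
            pct = i / max(n - 1, 1)  # 0 on the first iteration, 1 on the last
            values.append(decay_fn(start, end, pct))
    return values

# 10 epochs linear 0.1 -> 1.0, then 20 epochs cosine 0.5 -> 0.1
lr_values = schedule(
    [(10, (0.1, 1.0), linear_decay), (20, (0.5, 0.1), cosine_decay)],
    iters_per_epoch=5,
)
```

The same helper would work for momentum or any other scalar param, since it only needs the start and end values and a decay function.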

So then with this you should be able to implement SGDR, CLR, and 1cycle all by simply specifying appropriate parameter lists.
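As a rough illustration of that claim, the sketch below builds the parameter lists for SGDR and a simplified 1cycle as plain dicts. The function names, dict keys and defaults (cycle_mult=2, div=10, momentum 0.95/0.85) are assumptions for this example, not an existing API, and the final low-lr phase of 1cycle is left out.

```python
def sgdr_groups(lr, n_cycles, cycle_len=1, cycle_mult=2):
    """SGDR: each group is one cosine-annealed cycle that restarts at lr."""
    groups = []
    epochs = cycle_len
    for _ in range(n_cycles):
        groups.append({"epochs": epochs, "lr": (lr, 0.0), "lr_decay": "cosine"})
        epochs *= cycle_mult  # each cycle is cycle_mult times longer than the last
    return groups

def one_cycle_groups(lr_max, epochs, div=10, moms=(0.95, 0.85)):
    """Simplified 1cycle: lr ramps up while momentum ramps down, then the reverse."""
    half = epochs // 2
    return [
        {"epochs": half, "lr": (lr_max / div, lr_max), "lr_decay": "linear",
         "momentum": moms, "momentum_decay": "linear"},
        {"epochs": epochs - half, "lr": (lr_max, lr_max / div), "lr_decay": "linear",
         "momentum": (moms[1], moms[0]), "momentum_decay": "linear"},
    ]

sgdr = sgdr_groups(lr=0.1, n_cycles=3)
one_cycle = one_cycle_groups(lr_max=1.0, epochs=10)
```

CLR would look like the 1cycle case repeated, with each up/down pair forming one cycle.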

The reason I’m keen to have this ASAP is for imagenet training, where the best approaches tend to suggest using rmsprop with linear annealing for a bit, then nesterov momentum sgd with a stepped decay schedule for the rest. It’s awkward to do this with our current approach, and it would be nice to be able to experiment easily with extensions of this.

An extra step which would be really helpful: for each epoch group, also be able to specify a different ModelData object. That way we can call set_data at the start of each group to gradually increase the size of images being used. Now of course set_data is part of Learner, and we don’t want model.fit to have to know about fastai.learner, so probably the best way to do this is to add an on_epochgroup_start callback to OptimScheduler where we can then do whatever stuff we like.
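One way the callback idea could work, as a sketch only: the scheduler records the epoch at which each group starts and calls a user-supplied function with the group index. EpochGroupNotifier and on_epoch_begin are hypothetical names for this example; in practice the function passed in could be something that calls set_data on the learner.

```python
from itertools import accumulate

class EpochGroupNotifier:
    """Hypothetical helper: fires a callback at the first epoch of each group."""

    def __init__(self, epochs_per_group, on_epochgroup_start=None):
        # Epoch index at which each group starts, e.g. (10, 20) -> [0, 10]
        self.starts = [0] + list(accumulate(epochs_per_group))[:-1]
        self.on_epochgroup_start = on_epochgroup_start

    def on_epoch_begin(self, epoch):
        # The training loop is assumed to call this at the start of every epoch.
        if self.on_epochgroup_start is not None and epoch in self.starts:
            self.on_epochgroup_start(self.starts.index(epoch))

# Record which group indices get announced over 30 epochs.
calls = []
notifier = EpochGroupNotifier((10, 20), on_epochgroup_start=calls.append)
for epoch in range(30):
    notifier.on_epoch_begin(epoch)
```

Because the callback only receives a group index, the fitting loop never needs to import anything from the learner module.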

A suggestion: create a plot_all method that plots everything that’s changing (and for stuff like opt_fn it could just print a list or something, or even put them as overlays on the chart). Would be helpful for debugging.

Anyway, if anyone thinks that sounds interesting and wants to give it a go, let us know in this thread.

sgugger · April 16, 2018, 3:16am

24 hours for all of this? Sounds like a fun challenge

jeremy · April 16, 2018, 3:18am

That’s what I hoped you’d say…

sgugger · April 16, 2018, 6:09pm

I’ve pretty much done the first part I think (need to test that it didn’t break anything now), and here is what it would look like:

epoch_groups = [EpochGroup(epochs=1, opt_fn=optim.Adam, lr=(1e-3,1e-2), lr_decay=DecayType.LINEAR, momentum=0.9),
                EpochGroup(epochs=2, opt_fn=optim.Adam, lr=1e-2, lr_decay=DecayType.COSINE, momentum=0)]
learn.fit_opt_sched(epoch_groups)

It seemed more natural to me to regroup all the parameters of each group together compared to your instructions.
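The two layouts carry the same information, so one can be converted into the other. Below is a sketch of that conversion; EpochGroupSketch and to_groups are stand-in names for this example and not the actual EpochGroup class, and strings stand in for the optimizer and decay objects.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class EpochGroupSketch:
    """Stand-in for one epoch group; fields mirror the params discussed above."""
    epochs: int
    opt_fn: Any
    lr: Any
    lr_decay: Optional[str] = None
    momentum: Any = None
    momentum_decay: Optional[str] = None

def to_groups(**columns):
    """Turn one list per param into one object per epoch group."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError("every param needs one entry per epoch group")
    keys = list(columns)
    # zip(*...) walks the lists in parallel, giving one row per group
    return [EpochGroupSketch(**dict(zip(keys, row)))
            for row in zip(*columns.values())]

groups = to_groups(
    epochs=(10, 20),
    opt_fn=("rmsprop", "adam"),
    lr=((0.1, 1.0), (0.5, 0.1)),
    lr_decay=("linear", "cosine"),
)
```

Grouping by epoch group keeps all the settings for one phase in one place, which is easier to read when groups use different optimizers.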

Then plot_lr() would give:

jeremy · April 16, 2018, 8:08pm

Oh cool! Yeah I wasn’t sure what way round was best - happy for you to make a call on what you think is nicest. I won’t look at this until tomorrow morning - I’m running a couple of imagenet experiments now, which if they work, I’ll try to refactor using this code. If you’re interested in helping with our imagenet project just let me know BTW - we’ve got until the end of this week to nail it!

sgugger · April 16, 2018, 8:46pm

Oh, I’d love to help if I can!

jeremy · April 16, 2018, 9:09pm

Let us know if you want any help at this end taking this API and refactoring the existing classes using it - some of the SF study group are ready to take on a new project now

sgugger · April 16, 2018, 9:30pm

It would be more to check that this new API works as intended.
I’ve actually planned to use the refactoring of the existing classes as a tutorial to show how to use this new API.

jeremy · April 16, 2018, 9:31pm

That sounds great. Well, let us know if there’s anything we can help with in the meantime.