Oh, I thought you were talking about an alternative loading method. Since you pointed out the param duplication, I looked more closely at the PyTorch optimizer implementation and made a few changes to my implementation.
It should solve the duplication issue and improve support for the base optimizer's methods (so that it's a proper wrapper).
Let me know if the commit helps!
Most Lookahead / Ranger implementations have issues with state dict save/load, and adding parameters after Optimizer creation via add_param_group causes a crash.
Yes, I figured it would make the implementation more coherent since we inherit from Optimizer, and it lets us reuse inherited methods to reduce the codebase. The only method I'm overriding is a private one, to ensure smooth param_group addition for the slow weights.
Also, you might want to check the discussion on this thread. I added a param synchronization method for external calls so that users can choose which weights (fast or slow) their model is evaluated with. My current version is available here!
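For reference, here is a minimal sketch of the wrapper pattern being discussed: a Lookahead-style class inheriting from `torch.optim.Optimizer` that keeps the slow weights inside the optimizer state. Names like `synchronize` and `slow_buffer` are illustrative, not necessarily the actual API in the linked commit:

```python
from torch.optim.optimizer import Optimizer

class Lookahead(Optimizer):
    """Sketch of a Lookahead wrapper around an arbitrary base optimizer."""
    def __init__(self, base_optimizer, alpha=0.5, k=6):
        self.optimizer = base_optimizer
        self.alpha, self.k = alpha, k
        # Share the base optimizer's groups/state/defaults so the inherited
        # methods (state_dict, load_state_dict, add_param_group, ...) operate
        # on the same data and nothing is duplicated.
        self.param_groups = base_optimizer.param_groups
        self.state = base_optimizer.state
        self.defaults = base_optimizer.defaults
        for group in self.param_groups:
            group.setdefault('step_counter', 0)

    def step(self, closure=None):
        loss = self.optimizer.step(closure)
        for group in self.param_groups:
            group['step_counter'] += 1
            if group['step_counter'] % self.k != 0:
                continue
            for p in group['params']:
                state = self.state[p]
                # Slow weights live in `state`, so the inherited
                # state_dict() serializes them and they survive save/load.
                if 'slow_buffer' not in state:
                    state['slow_buffer'] = p.data.clone()
                slow = state['slow_buffer']
                slow.add_(self.alpha * (p.data - slow))  # slow += alpha * (fast - slow)
                p.data.copy_(slow)                        # fast weights restart from slow
        return loss

    def synchronize(self, keep_fast=False):
        """Illustrative partial-sync hook: copy the slow (averaged) weights
        into the model so it can be evaluated with them."""
        for group in self.param_groups:
            for p in group['params']:
                state = self.state[p]
                if 'slow_buffer' not in state:
                    continue
                if keep_fast:
                    state['fast_buffer'] = p.data.clone()  # stash to resume training later
                p.data.copy_(state['slow_buffer'])
```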
Your procedure to use Ranger is not working for me. I am getting `TypeError: 'module' object is not callable` when I run this line -> `optar = partial(Ranger)`.
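One common cause of that error (an assumption here: Ranger lives in a file named `ranger.py` that defines a `Ranger` class) is passing the module rather than the class to `partial`. A minimal check:

```python
from functools import partial
from ranger import Ranger   # import the class, not `import ranger`

# Passing the module (e.g. after `import ranger`) to partial() raises
# "TypeError: 'module' object is not callable" when the learner later
# instantiates the optimizer.
optar = partial(Ranger)
```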
Using Ranger on my model, I did a save/load and started training again. The training loss behaves completely differently after this save/load step (training slows down).
Does anyone have the same problem? I think the optimizer state is not saved correctly. I get better performance if I specify with_opt=False when I load the model.
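For context, this is the pattern I mean, assuming fastai v1's Learner API:

```python
learn.save('stage-1')                   # saves model and optimizer state
learn.load('stage-1')                   # reloads both by default
learn.load('stage-1', with_opt=False)   # weights only: Ranger's state is
                                        # rebuilt fresh, which currently
                                        # trains better for me
```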
I didn't get a chance to test it, but I believe the fix is simply to leverage what @rwightman did and move the slow weights into the state dict (which, as usual, is brilliant coding by him).
That way they are reloaded properly, which should correct this issue.
I will try to do that tomorrow, but at least I believe I know the cause, and copying @rwightman's excellent idea should fix it.
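The underlying reason, as I understand it (a sketch, assuming the slow weights were previously held as a plain attribute):

```python
# torch.optim.Optimizer.state_dict() serializes only `state` and
# `param_groups`; tensors kept as plain attributes are silently dropped.
sd = optimizer.state_dict()
print(sd.keys())   # dict_keys(['state', 'param_groups'])
# So something like self.slow_weights = [...] would not survive a
# save/load round trip, while state[p]['slow_buffer'] does.
```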
I think the optimizer does not work with pretrained models when the model has different layer groups. For some reason, it stops after one epoch. Could you please look into this?
I've posted a new version of Ranger - it has improved support for layer groups and a much tighter codebase all around (one-pass handling at the param level, no repeated loops, slow weights moved into the state dict, etc.).
Can you please see if that resolves your issue?
New version 9.3.19
*Also, thanks to @rwightman, as I leveraged some of his code ideas for putting the slow weights into the state dictionary, versus how lonepatient originally did it.
I'm working to integrate @fgfm's idea regarding partial sync next as well.
We have a new fit function, fit_fc. Grab the most recent version via a dev install of the library to use it; otherwise, I believe there was a hack to make one-cycle run in a way similar to fit_fc.
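Usage looks roughly like this (parameter names from the dev version at the time; they may differ):

```python
# flat LR for most of training, then a cosine anneal at the end
learn.fit_fc(10, lr=4e-3, start_pct=0.72)
```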
I am trying the new version right now, but I think there may be a few bugs. I am testing with a pretrained ResNet50.
I first got a `KeyError: 'k'`, so I changed `group['k']` to `self.k`; I am unsure if that is the right fix. It then ran for k steps and raised `KeyError: 'slow_buffer'`. I am not sure what's going on here.
Please let me know if you need more information and if you have a fix.
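A guess at the root cause, sketched below: fastai creates one param group per layer group via add_param_group, and torch's Optimizer only copies keys present in `defaults` into new groups, so a hyperparameter like `k` left out of the defaults dict raises a KeyError there (the later `'slow_buffer'` error may be a knock-on effect of the `self.k` workaround). A hypothetical fix is to register it in defaults rather than as an attribute:

```python
from torch.optim.optimizer import Optimizer

class Ranger(Optimizer):
    def __init__(self, params, lr=1e-3, alpha=0.5, k=6,
                 betas=(0.95, 0.999), eps=1e-5):
        # Everything step() reads via group[...] must be in defaults,
        # otherwise groups added later (e.g. fastai layer groups) lack it.
        defaults = dict(lr=lr, alpha=alpha, k=k, betas=betas, eps=eps)
        super().__init__(params, defaults)
```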