Congrats on those results @LessW2020 @muellerzr !
I may have something for you that could squeeze out slightly better results. If you are using Lookahead (even the combined version), there is a decision to be made right before evaluation:
- At the end of an epoch, most likely `nb_batches % k != 0`, which means you are evaluating your model on the fast weights (i.e. before the next synchronization).
- The difference might be slim but positive, as there are two choices right before evaluation: copy the slow weights to the fast weights (walking a few steps back), or perform the synchronization even though you haven't yet taken k fast steps since the last sync.
I’m still investigating which option gives the best results, but at least it’s better to have the choice. You can find the method I implemented in the commit, which can be used as follows:
```python
from torch.optim import Adam

# Lookahead here refers to the implementation from the commit above
optimizer = Adam(model_params)
optimizer = Lookahead(optimizer, sync_rate=0.5, sync_period=6)

for _ in range(nb_epochs):
    # Train here
    optimizer.sync_params()
    # Not specifying sync_rate means model params <- slow params
    # Otherwise optimizer.sync_params(0.5) will force an early synchronization
    # Evaluate here
```
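For intuition, here is a minimal sketch of what such a `sync_params` step could do, assuming a Lookahead wrapper that keeps a slow copy of every fast parameter (the attribute names `fast_params` / `slow_params` are hypothetical, not the actual commit):

```python
import torch

@torch.no_grad()
def sync_params(self, sync_rate=None):
    # Hypothetical attributes: `self.fast_params` are the trained (fast)
    # weights, `self.slow_params` their slow copies kept by the wrapper.
    for fast, slow in zip(self.fast_params, self.slow_params):
        if sync_rate is None:
            # Option 1: walk back to the slow weights, discarding the
            # fast steps taken since the last synchronization
            fast.copy_(slow)
        else:
            # Option 2: force an early synchronization
            # slow <- slow + sync_rate * (fast - slow), then fast <- slow
            slow.add_(fast - slow, alpha=sync_rate)
            fast.copy_(slow)
```

In other words, passing no `sync_rate` falls back to the last synchronized state, while passing one keeps part of the progress made by the unsynchronized fast steps.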
Hope this helps!
Cheers