I came across `OptimWrapper` while trying to slowly follow @muellerzr's PyTorch to fastai tutorial. Does it do anything besides delegating calls to the PyTorch optimizer it wraps? I'm trying to replace the code from Jeremy's PyTorch tutorial, and I get weird behavior when comparing `learn.fit(2)` to running `fit()` (the manual training loop from the tutorial, sketched after the list below):
- If I first call `learn.fit(2)`, I get a loss of about 2.2 (the same as in the first tutorial), and the loss remains in this region even if I run `fit()` later.
- If I first call `fit()`, I get a loss of about 0.2 (an order of magnitude lower), which remains in this region even after running `learn.fit(2)`.
- If I use `learn.fit(2)` but with `fastai.optimizer.SGD`, I get a loss of about 0.2 as well.
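For context, by `fit()` I mean the plain training loop from the tutorial, roughly this shape (a condensed sketch with synthetic stand-in data; the notebook uses the tutorial's actual MNIST setup and hyperparameters):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the tutorial's MNIST tensors (shapes and lr
# are illustrative, not the notebook's actual values).
x, y = torch.randn(256, 784), torch.randint(0, 10, (256,))
train_dl = DataLoader(TensorDataset(x, y), batch_size=64)

model = torch.nn.Linear(784, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def fit(epochs=2):
    for _ in range(epochs):
        for xb, yb in train_dl:
            loss = F.cross_entropy(model(xb), yb)
            loss.backward()
            opt.step()
            opt.zero_grad()

fit()
```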
It makes me think that perhaps I'm misunderstanding `OptimWrapper`. Any ideas?
The code I'm running can be found here (the interesting comparisons are in the last three cells, but the setup code is there too): https://colab.research.google.com/drive/1gbTysz2FISa5mv6dnyHHixl1sw1by3bN?usp=sharing
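And this is the shape of the `Learner` side of the comparison (again a condensed sketch: I'm assuming a fastai version where `OptimWrapper` accepts the optimizer class via `opt=`, as in the tutorial; older versions wrapped an already-constructed optimizer instead):

```python
from functools import partial

import torch
import torch.nn.functional as F
from fastai.data.core import DataLoaders
from fastai.learner import Learner
from fastai.optimizer import SGD, OptimWrapper

# Same synthetic stand-in for the notebook's MNIST data as above.
x, y = torch.randn(256, 784), torch.randint(0, 10, (256,))
ds = list(zip(x, y))
dls = DataLoaders.from_dsets(ds, ds, bs=64)

# Variant 1: a plain PyTorch optimizer wrapped by OptimWrapper.
pt_sgd = partial(OptimWrapper, opt=torch.optim.SGD)
learn = Learner(dls, torch.nn.Linear(784, 10),
                loss_func=F.cross_entropy, opt_func=pt_sgd)
learn.fit(2)  # with the real MNIST setup this run stays around loss ~2.2

# Variant 2: fastai's own SGD as the opt_func.
learn_fa = Learner(dls, torch.nn.Linear(784, 10),
                   loss_func=F.cross_entropy, opt_func=SGD)
learn_fa.fit(2)  # ...while this one reaches ~0.2
```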