What is the v2 equivalent of "AccumulateScheduler"?

Just do what I did: in Jeremy's notebook 10_nlp.ipynb, pass cbs=GradientAccumulation() to learn.fit_one_cycle(). You should see a huge running training loss (see my first screenshots)…
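
For reference, here is roughly what that call looks like (a minimal sketch of the setup: `dls_lm` and the hyperparameters below are placeholders, not the exact values from the notebook):

```python
from fastai.text.all import *

# dls_lm stands for the language-model DataLoaders built earlier in 10_nlp.ipynb
learn = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.3,
                               metrics=[accuracy, Perplexity()]).to_fp16()

# GradientAccumulation is the v2 callback replacing v1's AccumulateScheduler;
# n_acc is the number of samples to accumulate before each optimizer step
learn.fit_one_cycle(1, 2e-2, cbs=GradientAccumulation(n_acc=64))
```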

In fact, when you train the learner all the way to the end (same example, the notebook 10_nlp.ipynb; see my second screenshot), you will observe that the valid loss and accuracy are right (compared to what Jeremy got), but the training loss is much too high.

What I think:

  • GradientAccumulation() works well.
  • but… the running training loss (up to the final one) shows the accumulated value, not the average over the mini-batches (see the sketch after this list).
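
To illustrate what I mean with plain PyTorch (this is just my own toy loop, not fastai's internals): if the progress bar tracked the sum of the losses over the accumulated mini-batches instead of their mean, the displayed number would be roughly n_acc times too large even though the weight updates are correct.

```python
import torch

def accumulation_loop(model, opt, loss_func, batches, n_acc=4):
    "Toy gradient-accumulation loop; `batches` yields (xb, yb) mini-batches."
    loss_sum = 0.0
    opt.zero_grad()
    for i, (xb, yb) in enumerate(batches):
        loss = loss_func(model(xb), yb)
        # dividing by n_acc makes the accumulated gradient equivalent to
        # one big batch, so the weight update is fine either way
        (loss / n_acc).backward()
        loss_sum += loss.item()
        if (i + 1) % n_acc == 0:
            opt.step(); opt.zero_grad()
            # the summed loss is ~n_acc times larger than the per-batch average
            print(f"summed: {loss_sum:.4f}  average: {loss_sum / n_acc:.4f}")
            loss_sum = 0.0
```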

How can this last point be corrected?