Lesson 4: Using NT-ASGD for language modelling on IMDB

From the fastai code, it seems that momentum-based SGD is used to train the AWD_LSTM language model. But AFAIK the AWD-LSTM paper recommends not using momentum and instead proposes an NT-ASGD (non-monotonically triggered averaged SGD) optimiser. Is my understanding correct? If so, why is NT-ASGD not used in the fastai code?
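
To make the question concrete, here is a minimal sketch of the non-monotonic trigger as I understand it from the paper: train with plain SGD, and start averaging the weights once the validation metric is worse than the best value seen more than `n` checks ago. The function names and the default `n=5` are my own choices for illustration, not anything from fastai.

```python
def nt_asgd_trigger(val_history, new_val, n=5):
    """Return True if averaging should start at this validation check.

    val_history: validation losses from earlier checks (oldest first).
    new_val:     validation loss from the current check.
    n:           non-monotone interval (hypothetical default).
    """
    t = len(val_history)
    # Compare against the best value among all checks except the last n.
    return t > n and new_val > min(val_history[:t - n])


def find_trigger_step(val_losses, n=5):
    """Replay a sequence of validation losses; return the index of the
    check at which averaging would be triggered, or None if it never is."""
    logs = []
    for step, v in enumerate(val_losses):
        if nt_asgd_trigger(logs, v, n):
            return step
        logs.append(v)
    return None
```

In PyTorch, I assume the training loop would swap the plain SGD optimiser for `torch.optim.ASGD` at the step where the trigger fires, and keep training from there.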
