Lesson 4: Using NT-ASGD for language modelling for IMDB

bharath.bhushan · June 27, 2019, 5:34am

From the fastai code, it seems that momentum-based SGD is used to train the AWD_LSTM language model. But AFAIK the AWD-LSTM paper recommends not using momentum and instead proposes the NT-ASGD optimiser (non-monotonically triggered averaged SGD). Is my understanding correct? If so, why is NT-ASGD not used in the fastai code?
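
For context, here is a rough sketch of how I read the NT-ASGD triggering rule: train with plain SGD, check the validation loss at regular intervals, and switch to weight averaging once the latest check is worse than the best value seen more than `n` checks ago. The function name `nt_asgd_trigger` and its arguments are my own hypothetical choices, and the default `n=5` is an assumption on my part. The sketch covers only the trigger, not the averaging step or anything in fastai.

```python
from typing import Optional, Sequence


def nt_asgd_trigger(val_losses: Sequence[float], n: int = 5) -> Optional[int]:
    """Return the index of the validation check at which averaging would start.

    val_losses: validation losses, one per check (e.g. one per epoch).
    n: non-monotone interval (hypothetical default, an assumption).
    Returns None if the trigger condition is never met.
    """
    logs = []  # validation losses from earlier checks
    for t, v in enumerate(val_losses):
        # Trigger once the current loss is worse than the best loss
        # recorded more than n checks ago.
        if t > n and v > min(logs[:t - n]):
            return t
        logs.append(v)
    return None


if __name__ == "__main__":
    losses = [5.0, 4.5, 4.2, 4.0, 4.1, 4.05, 4.02, 4.03]
    print(nt_asgd_trigger(losses, n=2))
```

In an actual training loop, the returned point would be where one switches from plain SGD to an averaging optimiser; how that maps onto fastai's training loop is exactly what I am unsure about.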