Why AWD-LSTM for the language model?

Lesson 4 briefly mentioned that although the job of the language model is to predict the next word, it isn’t great at generating text from a seed of words. Jeremy said that there are better ways of doing text generation, which I’m assuming means variants of GANs. My question: if a GAN is better at generating text as judged by human perception, shouldn’t that intuitively mean it is a “better” language model at its core?

Is the reason we don’t use a GAN (or whatever the other method is) as the backbone language model that the underlying architecture doesn’t let us extract the encoder for transfer learning? Or is my intuition off, and it isn’t actually a better “language model”?


Bump. Would love to hear the answer to this.