In Chapter 10, Jeremy says:
> By training another model on all the texts read backward and averaging the predictions of those two models, we can even get to 95.1% accuracy, which was the state of the art introduced by the ULMFiT paper.
I’m assuming that the pretrained AWD_LSTM is a language model for texts read “forward”. Is there a pretrained language model for texts read backward, or do we have to train one from scratch in order to use the averaging approach above (sketched below)?
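
For concreteness, here is a minimal sketch of the averaging step I have in mind, assuming fastai v2 and two already fine-tuned classifiers `learn_fwd` and `learn_bwd` (hypothetical names), one trained on the normal token order and one on the reversed order:

```python
from fastai.text.all import *

# Hypothetical: `learn_fwd` and `learn_bwd` are two text classifiers
# assumed to be fine-tuned already, on the normal and on the reversed
# token order respectively, with both validation sets iterating the
# same items in the same order.
preds_fwd, targs = learn_fwd.get_preds()  # class probabilities, forward model
preds_bwd, _     = learn_bwd.get_preds()  # class probabilities, backward model

# Ensemble: average the two probability distributions, then score.
avg_preds = (preds_fwd + preds_bwd) / 2
print('ensemble accuracy:', accuracy(avg_preds, targs).item())
```

The question is really about where `learn_bwd`’s starting weights would come from: pretrained, or trained from scratch?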