Reproducing a SOTA Commonsense Reasoning Result with OpenAI’s Pretrained Transformer Language Model

Hi everybody! I’ve been doing some work with the Transformer architecture for the last few months. I posted in this forum about a result I was excited about on WT2. Unfortunately, that turned out to be erroneous, but I’m getting closer to good WT103 language models with the Transformer, which I hope to compare to ULMFiT in a couple of different ways.

Anyway, as a quick but related project, I figured out how to load the pretrained weights from OpenAI’s Transformer (which was trained on BooksCorpus and implemented in TensorFlow) and reproduce their SOTA result on the ROCStories commonsense reasoning dataset. Additionally, I tried to quickly apply the model to IMDB to compare against ULMFiT on a classification task, but I couldn’t get a good result quickly and don’t have the time to pursue that particular direction. So, I wrote a blog post and published my Jupyter notebooks on GitHub in case anyone is interested in the code or in trying to compare it to ULMFiT. I have a couple of ideas for why the result on IMDB was so bad, which I detail toward the end of my post.
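For anyone curious about the weight-loading step: here’s a minimal sketch of the kind of reassembly involved, assuming the checkpoint format from OpenAI’s public release, where shapes are listed in a `params_shapes.json` file and the parameters themselves ship as several flat `.npy` shards that have to be concatenated, split at the right offsets, and reshaped. The exact file names and shard count here are assumptions for illustration, not a drop-in for my notebook code.

```python
# Sketch: reassembling pretrained Transformer weights from flat numpy shards.
# Assumed layout (based on OpenAI's release format): params_shapes.json with a
# list of tensor shapes, plus params_0.npy, params_1.npy, ... flat shards.
import json
import os
import tempfile

import numpy as np


def load_pretrained_params(model_dir, n_shards):
    """Concatenate flat shards, split at per-tensor offsets, and reshape."""
    with open(os.path.join(model_dir, "params_shapes.json")) as f:
        shapes = json.load(f)
    # Offsets into the flat parameter vector where each tensor ends.
    offsets = np.cumsum([int(np.prod(s)) for s in shapes])
    shards = [np.load(os.path.join(model_dir, f"params_{i}.npy"))
              for i in range(n_shards)]
    flat = np.concatenate(shards, 0)
    params = np.split(flat, offsets)[:-1]  # last split piece is empty
    return [p.reshape(s) for p, s in zip(params, shapes)]


# Demonstrate round-tripping with synthetic data standing in for a real checkpoint.
with tempfile.TemporaryDirectory() as d:
    shapes = [[3, 4], [4], [4, 2]]  # 12 + 4 + 8 = 24 parameters total
    flat = np.arange(24, dtype=np.float32)
    with open(os.path.join(d, "params_shapes.json"), "w") as f:
        json.dump(shapes, f)
    # Split the flat vector into two arbitrary shards, as the release does.
    for i, shard in enumerate(np.split(flat, [10])):
        np.save(os.path.join(d, f"params_{i}.npy"), shard)
    params = load_pretrained_params(d, n_shards=2)
    assert [list(p.shape) for p in params] == shapes
```

The non-obvious part is that the shard boundaries have nothing to do with tensor boundaries, so you have to flatten everything first and only then split by the declared shapes.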

Let me know if you have any questions or comments!


A couple people had discussed this with me or expressed interest in my work with the Transformer on WT-2 when I originally posted about it, so I figured I’d mention them here in case they’re interested.

@sgugger @jeremy @sebastianruder @snaik @arank @Even @keratin

Also cc’ing @mcleavey who is looking at Transformer at OpenAI now.


The following might be of interest to the group as well:

The paper outlines how the authors combine images with language and shows improved performance from multiple pretraining steps (aka transfer learning) on heterogeneous data. It’s a really interesting adaptation of the architecture to include images.