Reproducing a SOTA Commonsense Reasoning Result with OpenAI’s Pretrained Transformer Language Model

Hi everybody! I’ve been doing some work with the Transformer architecture for the last few months. I posted in this forum about a result I was excited about on WT2. Unfortunately, that turned out to be erroneous, but I’m getting closer to good WT103 language models with the Transformer, which I hope to compare to ULMFiT in a couple of different ways.

Anyway, as a quick but related project, I figured out how to load the pretrained weights from OpenAI’s Transformer (which was trained on BooksCorpus and implemented in TensorFlow) and reproduce their SOTA result on the ROCStories commonsense reasoning dataset. Additionally, I tried to quickly apply the model to IMDB to compare against ULMFiT on a classification task, but I couldn’t get a good result quickly and don’t have the time to pursue that particular direction. So, I wrote a blog post and published my Jupyter notebooks on GitHub in case anyone is interested in the code or in trying to compare it to ULMFiT. I have a couple of ideas for why the result on IMDB was so bad, which I detail toward the end of my post.
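For anyone curious about the weight-loading step: here’s a minimal sketch of the kind of reassembly involved, assuming the checkpoint format from OpenAI’s public release, where shapes are listed in a `params_shapes.json` file and the parameters themselves ship as several flat `.npy` shards that have to be concatenated, split at the right offsets, and reshaped. The exact file names and shard count here are assumptions for illustration, not a drop-in for my notebook code.

```python
# Sketch: reassembling pretrained Transformer weights from flat numpy shards.
# Assumed layout (based on OpenAI's release format): params_shapes.json with a
# list of tensor shapes, plus params_0.npy, params_1.npy, ... flat shards.
import json
import os
import tempfile

import numpy as np


def load_pretrained_params(model_dir, n_shards):
    """Concatenate flat shards, split at per-tensor offsets, and reshape."""
    with open(os.path.join(model_dir, "params_shapes.json")) as f:
        shapes = json.load(f)
    # Offsets into the flat parameter vector where each tensor ends.
    offsets = np.cumsum([int(np.prod(s)) for s in shapes])
    shards = [np.load(os.path.join(model_dir, f"params_{i}.npy"))
              for i in range(n_shards)]
    flat = np.concatenate(shards, 0)
    params = np.split(flat, offsets)[:-1]  # last split piece is empty
    return [p.reshape(s) for p, s in zip(params, shapes)]


# Demonstrate round-tripping with synthetic data standing in for a real checkpoint.
with tempfile.TemporaryDirectory() as d:
    shapes = [[3, 4], [4], [4, 2]]  # 12 + 4 + 8 = 24 parameters total
    flat = np.arange(24, dtype=np.float32)
    with open(os.path.join(d, "params_shapes.json"), "w") as f:
        json.dump(shapes, f)
    # Split the flat vector into two arbitrary shards, as the release does.
    for i, shard in enumerate(np.split(flat, [10])):
        np.save(os.path.join(d, f"params_{i}.npy"), shard)
    params = load_pretrained_params(d, n_shards=2)
    assert [list(p.shape) for p in params] == shapes
```

The non-obvious part is that the shard boundaries have nothing to do with tensor boundaries, so you have to flatten everything first and only then split by the declared shapes.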

Let me know if you have any questions or comments!


A couple people had discussed this with me or expressed interest in my work with the Transformer on WT-2 when I originally posted about it, so I figured I’d mention them here in case they’re interested.

@sgugger @jeremy @sebastianruder @snaik @arank @Even @keratin

Also cc’ing @mcleavey who is looking at Transformer at OpenAI now.


The following might be of interest to the group as well:

The paper outlines how the authors combine images with language and shows improved performance from multiple pretraining steps (aka transfer learning) on heterogeneous data. It’s a really interesting adaptation of the architecture to include images.