NLP after GPT-3

The course v4 is amazing, but it leaves me wondering what the current state of affairs is post-GPT-3 when it comes to NLP, and language modeling in particular, with fastai. Will we ever be able to get close to creating SoTA models with fastai while GPT-3 and similar large models are dominating? I am increasingly worried about a world where you cannot compete unless you work at one of these very large institutions. Are there any plans to address this within this community, and how do you all feel about it?

Sorry for this rather vague post, but I think this is an important topic.


We already have an integration guide for GPT-2 and other HF models, if you choose to go that route (Sylvain wrote a guide).

Also: there is a large difference between SOTA and practical SOTA. GPT-3 takes a few hundred thousand dollars to train (IIRC? Could be millions; I may be off by a few zeros).
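To put rough numbers on that hedge, here is a back-of-envelope sketch. The FLOP count is the total training compute reported for GPT-3; the sustained throughput and hourly price below are illustrative assumptions, not quotes from any provider:

```python
# Back-of-envelope GPT-3 training cost estimate.
# TRAINING_FLOPS is the compute reported for GPT-3; the throughput
# and price-per-GPU-hour figures are assumptions for illustration.
TRAINING_FLOPS = 3.14e23          # total training compute reported for GPT-3
SUSTAINED_FLOPS_PER_GPU = 1.5e13  # ~15 TFLOPS sustained per GPU (assumed)
PRICE_PER_GPU_HOUR = 1.5          # USD per GPU-hour (assumed cloud rate)

gpu_seconds = TRAINING_FLOPS / SUSTAINED_FLOPS_PER_GPU
gpu_hours = gpu_seconds / 3600
cost_usd = gpu_hours * PRICE_PER_GPU_HOUR

print(f"~{gpu_hours:,.0f} GPU-hours, roughly ${cost_usd / 1e6:.1f}M")
```

Under these assumptions the estimate lands in the single-digit millions of dollars, so "a few hundred thousand" is probably a few zeros short.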


Thanks for your answer, Zachary. I agree we should be striving for practical SOTA. That is, in my opinion, what makes this community so great.

Looking at the interesting applications we have seen so far using GPT-3: can something similar be built with fastai today, and if not, what are we missing?

GPT-3’s model isn’t released (the weights). What’s happening here is that everyone is using an API that OpenAI set up for folks to utilize (IIRC). So I would say this wouldn’t apply here. To use fastai would be to train from scratch.

Sorry, I should have been more clear. I don’t mean to actually recreate applications like these using GPT-3 specifically. I mean: how do we move in this general direction if models like GPT-3 aren’t being released?

EDIT: Also, actually running inference with GPT-3 might not be feasible because of its size.
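On the size point, even setting training aside, merely holding the weights for inference is beyond a single GPU. A minimal sketch of the arithmetic, assuming GPT-3's reported 175B parameters and half-precision storage (2 bytes per weight):

```python
# Memory needed just to hold GPT-3's weights for inference.
PARAMS = 175e9        # GPT-3's reported parameter count
BYTES_PER_PARAM = 2   # fp16, a compact common inference format (assumed)
GPU_MEMORY_GB = 32    # e.g. a large V100 configuration (assumed)

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9

print(f"~{weights_gb:.0f} GB of weights vs {GPU_MEMORY_GB} GB per GPU: "
      f"at least {weights_gb / GPU_MEMORY_GB:.0f} GPUs just to hold them")
```

So even before activations, optimizer state, or batching, the weights alone would need to be sharded across on the order of a dozen large GPUs.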

Just focus on what exists and what’s practical. fastai and HF, with all their flavors, are good enough. Look toward papers that seek to reduce complexity while increasing accuracy (looking at you, X-BERT), as well as those focused on transfer learning rather than training from the ground up. The answer should not be a larger model, as that becomes completely infeasible. This trend of just throwing in as many params as we can should be going away soon, before we get back to asking: okay, but what can I actually use?

Also, in the GPT world, Sylvain did make a guide for training GPT-2 :slight_smile:


Thanks again. I will dive into that then.