ULMFiT vs GPT

I can't work out the difference between ULMFiT's and GPT's training recipes. Both seem to be trained autoregressively, i.e. to predict the next word in a sequence, so I was wondering if there are any major differences? Of course, GPT uses a Transformer, but I'm asking specifically about the training step, not the network architecture.
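To make concrete what I mean by "predict the next word", here's a minimal sketch of the objective (toy model sizes and random token IDs, purely for illustration; ULMFiT uses an AWD-LSTM and GPT a Transformer decoder, but the loss looks the same for both):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, emb_dim = 100, 32
# Toy stand-in LM: embedding + LSTM + projection.
emb = nn.Embedding(vocab_size, emb_dim)
lstm = nn.LSTM(emb_dim, emb_dim, batch_first=True)
head = nn.Linear(emb_dim, vocab_size)

tokens = torch.randint(0, vocab_size, (4, 16))   # toy batch of token IDs
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # targets = inputs shifted by one

hidden, _ = lstm(emb(inputs))
logits = head(hidden)                            # (batch, seq_len - 1, vocab)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
```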


Yeah, GPT’s main contribution was using a Transformer instead of the LSTM, but you wouldn’t be wrong to say that ULMFiT paved the way for the pretraining + fine-tuning approach for language models. :smile:


[Screenshot from the GPT Paper]

They also used a different corpus for pretraining: GPT was trained on BooksCorpus, while ULMFiT used WikiText-103.
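For a rough sketch of the shared two-stage recipe, here's a toy model standing in for either backbone (the class, sizes, and task head are made up for illustration; ULMFiT's actual fine-tuning additionally uses gradual unfreezing and discriminative learning rates):

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy stand-in for either backbone (AWD-LSTM or Transformer)."""
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.lm_head = nn.Linear(dim, vocab)

    def features(self, x):                  # shared encoder
        out, _ = self.lstm(self.emb(x))
        return out

    def forward(self, x):                   # LM head, used in stage 1
        return self.lm_head(self.features(x))

model = TinyLM()
# Stage 1: pretrain with the next-word loss on a generic corpus
# (WikiText-103 for ULMFiT, BooksCorpus for GPT). Loop elided here.

# Stage 2: swap in a task head and fine-tune on labelled target data.
clf_head = nn.Linear(32, 2)                  # e.g. binary sentiment (made up)
x = torch.randint(0, 100, (4, 16))           # toy batch
logits = clf_head(model.features(x)[:, -1])  # classify from last hidden state
```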


Ah that makes sense, thank you!


A couple more differences: ULMFiT's LSTM processes a sentence token by token, whereas GPT relies on self-attention. Note that GPT's attention is causal (unidirectional), not bidirectional: each token is masked so it only attends to earlier tokens. GPT also has a fixed maximum input length (512 tokens in the original paper), while an LSTM can in principle unroll over sequences of any length. In practice, GPT turned out to be more versatile across downstream tasks.
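To illustrate why GPT's attention stays unidirectional, here's a small sketch (random scores, purely illustrative): a lower-triangular mask stops each position from attending to later tokens, giving the same left-to-right view an LSTM gets for free by reading the sequence in order.

```python
import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)                # raw attention scores (random)
causal = torch.tril(torch.ones(seq_len, seq_len)).bool()
scores = scores.masked_fill(~causal, float('-inf'))   # hide future positions
weights = scores.softmax(dim=-1)                      # row k attends only to tokens <= k
print(weights)                                        # upper triangle is all zeros
```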