I can't understand the difference between ULMFiT and GPT's training recipe. Both seem to be trained autoregressively, i.e., by predicting the next word in a sequence, so I was wondering whether there are any major differences? Of course GPT uses a Transformer, but I'm asking specifically about the training procedure, not the network architecture.
Yeah, the pretraining objective is essentially the same. GPT's main contribution was using a Transformer decoder instead of ULMFiT's AWD-LSTM, but you wouldn't be wrong to say that ULMFiT paved the way for the pretraining + fine-tuning approach for language models. The fine-tuning steps do differ: ULMFiT first fine-tunes the LM on target-domain text with tricks like discriminative learning rates and gradual unfreezing, then trains a classifier head, while GPT fine-tunes directly on the downstream task using task-specific input transformations.
[Screenshot from the GPT Paper]
They also used a different corpus for the pretraining bit: GPT used BooksCorpus, while ULMFiT used WikiText-103.
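To make the "same objective" point concrete, here's a toy sketch (not code from either paper) of the autoregressive loss both recipes minimize, L = -Σₜ log p(wₜ | w₁..wₜ₋₁). A bigram count model stands in for the network, since the objective doesn't care whether p comes from an AWD-LSTM or a Transformer:

```python
import math
from collections import Counter, defaultdict

def train_bigram_lm(corpus):
    """Estimate p(next | prev) from bigram counts (a stand-in for the network)."""
    counts = defaultdict(Counter)
    for sent in corpus:
        for prev, nxt in zip(sent, sent[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
        for prev, nxt in counts.items()
    }

def nll(model, sent):
    """Autoregressive loss: sum over positions of -log p(w_t | w_{t-1})."""
    return sum(-math.log(model[p].get(n, 1e-9)) for p, n in zip(sent, sent[1:]))

corpus = [["the", "cat", "sat"], ["the", "cat", "ran"]]
lm = train_bigram_lm(corpus)
print(lm["the"]["cat"])                  # → 1.0 ("cat" always follows "the")
print(round(nll(lm, ["the", "cat", "sat"]), 3))  # → 0.693, i.e. -log(0.5)
```

The part that differs between the two recipes is everything around this loss: the model producing the probabilities, the corpus, and how fine-tuning is done afterwards.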
Ah that makes sense, thank you!