ULMFiT, but with larger transformer-based LLMs?


I successfully ran the ULMFiT examples in the fastai repository (fastai/ulmfit.ipynb at master · fastai/fastai · GitHub and fastai/38_tutorial.text.ipynb at master · fastai/fastai · GitHub).

I’ve put together a non-trivial training set to run ULMFiT on. Training with the AWD_LSTM model works reasonably well and is fast, although the output understandably lacks the coherence that something like text-davinci-003 or gpt-3.5-turbo seems to have.

Has anyone here used fastai to fine-tune an LM - not a classifier - using a bigger model like GPT-J, T5, Llama or similar? I am aware of the GPT-2 example, but I’m wondering about larger models.

Related: what models would you recommend for fine-tuning while keeping the model local and runnable on a reasonably modest GPU for inference? (<=12 GB VRAM :slight_smile:)

I sort of anticipated the “GPU poor” meme here :slight_smile: