Training Transformer models

Has anyone had any experience training the Transformer/TransformerXL models? I tried to use the Transformer for ULMFiT on the IMDB dataset, and found that the validation accuracy when training the frozen language model was about 1/3 of what the AWD-LSTM achieved. It may be that the model simply needs much longer to train, but has anyone been able to get good results using Transformers?

The pretrained model doesn’t use the same tokenization as fastai; it’s GPT-1 from OpenAI, so you should use their tokenization.


Ah ok, how should I use their tokenization? And if I use a non-pretrained model, will the default fastai tokenization work?

If you use a non-pretrained model, any tokenization will work, yes. The tokenization mismatch is just the likely reason you found bad results with the pretrained one.

Thanks for your help! A few last questions: does TransformerXL use different tokenization as well? If we want to use a pretrained Transformer/TransformerXL, how would we go about the process (or will fastai implement those in the future)? I’m assuming I would have to implement their tokenizers and wrap them with a BaseTokenizer?

There is no pretrained model with TransformerXL yet. We’ll be releasing one soon, and it’ll use the default fastai tokenization.
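Regarding wrapping an external tokenizer, here is a minimal sketch of the interface shape fastai v1 expects from a `BaseTokenizer` subclass: a class taking a `lang` argument, with a `tokenizer(t)` method returning a list of tokens and an `add_special_cases(toks)` hook. To keep the sketch self-contained it does not import fastai; `bpe_encode` is a hypothetical stand-in for a real BPE tokenizer (e.g. GPT-1’s), and in real use you would subclass `fastai.text.BaseTokenizer` and pass the class to `Tokenizer(tok_func=...)`.

```python
def bpe_encode(text):
    # placeholder: a real implementation would apply learned BPE merges
    return text.lower().split()

class GPTLikeTokenizer:
    "Duck-typed stand-in for a fastai BaseTokenizer subclass."
    def __init__(self, lang='en'):
        self.lang = lang

    def tokenizer(self, t):
        # delegate to the external tokenizer
        return bpe_encode(t)

    def add_special_cases(self, toks):
        # fastai uses this hook to register special tokens (xxbos, xxunk, ...)
        pass

tok = GPTLikeTokenizer()
print(tok.tokenizer("This movie was great"))  # ['this', 'movie', 'was', 'great']
```

With fastai v1 installed, you would then build the data with something like `Tokenizer(tok_func=GPTLikeTokenizer)`; the exact plumbing into a databunch is up to your pipeline.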


Hey @sgugger,

Is the usage of GPT-1 documented in a notebook in docs_src so I can see a working example (If not I’d love to volunteer)?

Does fastai provide a GPT-1 tokenizer?
I could not locate one in text.transform.

How would one go about using the GPT-1 tokenizer along with the default pretrained model for Transformer to train a language model?

No, it’s not documented anywhere, and no one has put together a working example AFAICT. There is no GPT-1 tokenizer, although we have SentencePiece with BPE.
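For anyone unfamiliar with the BPE mode mentioned here, the core idea is easy to sketch in plain Python: count adjacent symbol pairs across the corpus and merge the most frequent pair into a single token, repeating until the vocabulary is the desired size. This toy version (one merge step, stdlib only) is purely illustrative; SentencePiece’s actual implementation differs in many details.

```python
from collections import Counter

def most_frequent_pair(words):
    "Count adjacent symbol pairs across all words and return the most common."
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    "Replace every occurrence of `pair` by the concatenated symbol."
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i < len(w) - 1 and (w[i], w[i + 1]) == pair:
                out.append(w[i] + w[i + 1]); i += 2
            else:
                out.append(w[i]); i += 1
        merged.append(out)
    return merged

words = [list("hug"), list("pug"), list("hugs")]
pair = most_frequent_pair(words)   # ('u', 'g'), it appears in all three words
words = merge_pair(words, pair)    # [['h','ug'], ['p','ug'], ['h','ug','s']]
```

Repeating this loop builds up a vocabulary of subword units, which is why a BPE model can handle words it never saw during training.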

Hey @sgugger,

Thanks for the response.
If I understand correctly you’re referring to this

What I’m trying to understand is as follows:

Jeremy, over the course of his lectures, shows a default way of doing a lot of things: approaches that work out of the box because they’ve been fine-tuned.

I’m trying to build a language model and build a host of classifiers on top of it.
The default setup for a language model is the AWD_LSTM.

However, if I want to try out the Transformer architecture, it seems (from the discussion above) that the default SpacyTokenizer will not give the best results.
So, what tokenizer should I be using?
I’m not trying neural translation; I want to build a domain-specific language model (like Jeremy has done in the course for the IMDB classifier) using the Transformer arch.

Any news on this (a pretrained TransformerXL with the fastai tokenizer)? Has this discussion become somewhat irrelevant now that v2 is starting to come out? Will v2 fix things?

Again, the idea would be that I’d love to have a pretrained Transformer architecture to fine-tune as a language model on a specific corpus and then fine-tune it again on a specific task… this is not easily possible with fastai at the moment, correct?
I know other libraries more specific to NLP do exist (like Hugging Face), but ideally this should be doable in fastai as well! If still relevant I might try to contribute some code!

Also @sgugger, I don’t think the fact that the Transformer architecture is pretrained with GPT-like tokenization is documented yet, am I right? Again, is it still relevant to file a PR for the updated docs now that v2 is out?

(I have to say I’m not completely sure about what v2 implies :confused: can somebody recommend an overview of how/why/when to switch to fastai v2?)

We have been focused solely on software development these past few months and didn’t get any time to do research around NLP models. I’d recommend using the transformers from Hugging Face, as they’ve spent more time on this. fastai v2 will make it easy to import their models.


Got it, thanks!

When I get something to work I’ll try to post it back here in case it interests somebody, but I won’t necessarily make it fastai-compatible, as v2 will likely resolve the issue by then! Thanks for the hard work!

For anyone who comes to this thread in the future, Jeremy tweeted a tutorial on integrating Hugging Face with fastai.

This is the tutorial.
It builds everything from the ground up. Pretty good.


I need help. Please tell us how to save this kind of model? learner.export() is not working; it gives the following error: PicklingError: Can’t pickle <function at 0x7fdfbffea1e0>: attribute lookup on __main__ failed

I followed this tutorial.

I believe the issue is due to using a custom databunch; learn.export() only works with the databunch methods in the library.
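A minimal sketch of why export() hits this PicklingError: pickle stores functions by reference (module plus qualified name), so a lambda or locally defined function captured somewhere in the Learner cannot be looked up again and pickling fails, while module-level functions serialize fine. The names below are illustrative, not fastai internals.

```python
import pickle

square_lambda = lambda x: x * x   # qualname '<lambda>' cannot be re-imported

def square(x):                    # module-level name: picklable by reference
    return x * x

try:
    pickle.dumps(square_lambda)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False

print(lambda_picklable)                        # False
print(pickle.loads(pickle.dumps(square))(3))   # 9
```

So one way out, if the custom databunch is the culprit, is to make sure any custom processing function you pass in is a plain module-level function rather than a lambda or closure before calling learner.export().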

So, what should I do now?
I trained my model on Colab and created the .pth model. It worked properly there, but when I started working on a local machine it gave me this error:
“BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.”
This error occurs when I try to create a databunch on the local machine.

How large is your dataset? The BrokenProcessPool is a known bug (Problem with futures and ProcessPoolExecutor) and happens when you have a large dataset. You can set n_cpus=1 when creating the databunch to work around it, but keep in mind that it will then be slower.