Training Transformer models

maxmatical · May 9, 2019, 5:26pm

Has anyone had any experience training the Transformer/Transformer XL models? I tried to use a Transformer for ULMFiT on the IMDB dataset and found that the frozen language model reached only about 1/3 of the validation accuracy of the AWD-LSTM. This may be because the model needs much longer to train, but has anyone been able to get good results using Transformers?
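For context, the accuracy reported while training a fastai language model is, as far as I understand it, next-token accuracy: the fraction of positions where the model's top-scoring token equals the token that actually comes next. A minimal plain-Python sketch of that metric, with made-up logits standing in for model output (`next_token_accuracy` is a hypothetical helper, not a fastai function):

```python
from typing import Sequence

def next_token_accuracy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Fraction of positions where argmax(logits[i]) == targets[i].

    logits[i] holds one score per vocabulary entry for position i;
    targets[i] is the id of the token that actually comes next.
    """
    if len(logits) != len(targets):
        raise ValueError("need one row of logits per target")
    if not targets:
        return 0.0
    correct = 0
    for row, target in zip(logits, targets):
        # argmax over the vocabulary dimension
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += predicted == target
    return correct / len(targets)

# Toy example: vocabulary of 3 tokens, 4 positions (made-up numbers).
logits = [
    [2.0, 0.1, 0.3],  # predicts 0
    [0.2, 1.5, 0.1],  # predicts 1
    [0.9, 0.8, 0.7],  # predicts 0
    [0.0, 0.1, 3.0],  # predicts 2
]
targets = [0, 1, 2, 2]
print(next_token_accuracy(logits, targets))  # 0.75
```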

sgugger · May 9, 2019, 6:05pm

The pretrained model doesn’t use the same tokenization as fastai: it’s GPT-1 from OpenAI, so you should use their tokenization.

maxmatical · May 9, 2019, 6:39pm

Ah, OK. How should I use their tokenization? And if I use a non-pretrained model, will the default fastai tokenization work?

sgugger · May 9, 2019, 6:44pm

Yes, if you use a non-pretrained model, any tokenization will work. The tokenization mismatch is just the likely reason you got bad results with the pretrained one.
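To see why mixing tokenizers hurts a pretrained model but not one trained from scratch: the pretrained embedding layer is indexed by the pretraining vocabulary, so tokens produced by a different tokenizer map to the unknown-token row or never line up with what the model learned. A toy sketch with a made-up vocabulary (it is not the real GPT-1 vocabulary; `encode` and `unk_rate` are hypothetical helpers):

```python
from typing import Dict, List

def encode(tokens: List[str], vocab: Dict[str, int], unk: str = "<unk>") -> List[int]:
    """Map tokens to ids, falling back to the unknown-token id."""
    return [vocab.get(tok, vocab[unk]) for tok in tokens]

def unk_rate(tokens: List[str], vocab: Dict[str, int]) -> float:
    """Fraction of tokens the vocabulary doesn't know."""
    if not tokens:
        return 0.0
    return sum(tok not in vocab for tok in tokens) / len(tokens)

# Made-up "pretrained" vocabulary built with one tokenization scheme
# (lowercased words with an end-of-word marker).
pretrained_vocab = {"<unk>": 0, "the</w>": 1, "movie</w>": 2, "was</w>": 3, "great</w>": 4}

# The same sentence, tokenized two ways.
matching_tokens = ["the</w>", "movie</w>", "was</w>", "great</w>"]
fastai_style_tokens = ["xxbos", "xxmaj", "the", "movie", "was", "great"]

print(encode(matching_tokens, pretrained_vocab))        # [1, 2, 3, 4]
print(encode(fastai_style_tokens, pretrained_vocab))    # [0, 0, 0, 0, 0, 0]
print(unk_rate(fastai_style_tokens, pretrained_vocab))  # 1.0
```

A model trained from scratch builds its vocabulary from whatever tokenizer you feed it, so this kind of mismatch can't arise there.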

maxmatical · May 9, 2019, 7:03pm

Thanks for your help! A few last questions: does TransformerXL use a different tokenization as well? If we want to use a pretrained Transformer/TransformerXL, how would we go about it (or will fastai implement those in the future)? I’m assuming I would have to implement their tokenizers and wrap them with a BaseTokenizer?
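The wrapping approach asked about here looks roughly like this, as far as I know fastai v1: a custom tokenizer subclasses `BaseTokenizer` (from `fastai.text.transform`), implements `tokenizer(t)` and `add_special_cases(toks)`, and the class itself is passed as `tok_func` to `Tokenizer`. So the sketch runs without fastai installed, it defines a stand-in base class with that shape (an assumption, not the library code); `GPTStyleTokenizer` and its regex split are hypothetical and are not OpenAI's actual BPE tokenizer.

```python
import re
from typing import Collection, List

class BaseTokenizer:
    """Stand-in with the same shape as fastai v1's BaseTokenizer (assumed, so this runs without fastai)."""
    def __init__(self, lang: str):
        self.lang = lang

    def tokenizer(self, t: str) -> List[str]:
        return t.split(" ")

    def add_special_cases(self, toks: Collection[str]) -> None:
        pass

class GPTStyleTokenizer(BaseTokenizer):
    """Hypothetical wrapper; a real one would call the pretrained model's own tokenizer here."""
    def __init__(self, lang: str = "en"):
        super().__init__(lang)
        self.special_cases: List[str] = []

    def tokenizer(self, t: str) -> List[str]:
        # lowercase, then split into runs of word characters or single punctuation marks
        return re.findall(r"\w+|[^\w\s]", t.lower())

    def add_special_cases(self, toks: Collection[str]) -> None:
        # just record them: fastai-style special tokens such as xxbos are plain
        # word characters, so the regex above already keeps them in one piece
        self.special_cases.extend(toks)

tok = GPTStyleTokenizer("en")
tok.add_special_cases(["xxbos", "xxmaj"])
print(tok.tokenizer("xxbos The movie wasn't great."))
# ['xxbos', 'the', 'movie', 'wasn', "'", 't', 'great', '.']
```

With fastai v1 installed, you would import the real `BaseTokenizer` instead of the stand-in and pass the class (not an instance), e.g. `Tokenizer(tok_func=GPTStyleTokenizer, lang='en')`.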

sgugger · May 9, 2019, 7:26pm

There is no pretrained model with transformer XL yet. We’ll be releasing one soon, and it’ll be with the default fastai tokenization.

chatuur · September 11, 2019, 3:21pm

Hey @sgugger,

Is the usage of GPT-1 documented in a notebook in docs_src so I can see a working example? (If not, I’d love to volunteer.)

Does fastai provide a GPT-1 tokenizer?
I could not locate one in text.transform.

How would one go about using the GPT-1 tokenizer along with the default pretrained model for Transformer to train a language model?

sgugger · September 11, 2019, 4:01pm

No, it’s not documented anywhere, and AFAICT no one has contributed a working example. There is no GPT-1 tokenizer, although we do have SentencePiece with BPE.
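For anyone unfamiliar with BPE: it starts from characters and repeatedly merges the most frequent adjacent pair of symbols, so frequent words end up as single tokens while rare words split into subwords. A toy pure-Python sketch of learning merges from word counts; this is the classic BPE procedure with an end-of-word marker, not SentencePiece's implementation, and `learn_bpe` is a hypothetical helper:

```python
from collections import Counter
from typing import Dict, List, Tuple

def learn_bpe(word_counts: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    """Learn up to num_merges BPE merges from a {word: count} dictionary."""
    # each word becomes a tuple of characters plus an end-of-word marker
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        # count adjacent symbol pairs, weighted by word frequency
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break  # every word is already a single symbol
        # most frequent pair; ties broken by the lexicographically smallest pair
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        # rewrite every word, fusing each occurrence of the best pair
        new_vocab: Dict[Tuple[str, ...], int] = {}
        for symbols, count in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + count
        vocab = new_vocab
    return merges

counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
print(learn_bpe(counts, 5))
# [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o'), ('lo', 'w')]
```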

chatuur · September 12, 2019, 5:39am

Hey @sgugger,

Thanks for the response.
If I understand correctly, you’re referring to https://github.com/google/sentencepiece.

What I’m trying to understand is as follows:

Throughout his lectures, Jeremy shows a default way of doing a lot of things: approaches that work out of the box because they’ve been fine-tuned.

I’m trying to build a language model and then a host of classifiers on top of it.
The default setup for a language model is the AWD_LSTM.

However, if I want to try out the Transformer architecture, it seems (from the discussion above) that the default SpacyTokenizer will not give the best results.
So, what tokenizer should I be using?
I’m not doing neural machine translation; I want to build a domain-specific language model (like Jeremy did in the course for the IMDB classifier) using the Transformer architecture.

marco_b · October 11, 2019, 5:36pm

Any news on this (a TransformerXL pretrained with the fastai tokenizer)? Has this discussion become somewhat irrelevant now that v2 is starting to come out? Will v2 fix things?

Again, the idea would be to have a pretrained Transformer architecture that I could fine-tune as a language model on a specific corpus and then fine-tune again on a specific task … this is not easily possible with fastai at the moment, correct?
I know other libraries more specific to NLP do exist (like huggingface), but ideally this should be doable in fastai as well! If it’s still relevant, I might try to contribute some code!

Also @sgugger, I don’t think the fact that the Transformer architecture is pretrained with GPT-like tokenization is documented yet, am I right? Again, is it still relevant to file a PR for the updated docs now that v2 is out?

(I have to say I’m not completely sure what v2 implies. Can somebody recommend an overview of how/why/when to switch to fastai v2?)

sgugger · October 12, 2019, 4:20pm

We have been focused solely on software development these past few months and haven’t had any time to do research on NLP models. I’d recommend using the transformers from huggingface, as they have spent more time on this. fastai v2 will make it easy to import their models.

marco_b · October 12, 2019, 4:33pm

Got it, thanks!

When I get something to work, I’ll try to post it back here in case it interests somebody, but I won’t necessarily make it fastai-compatible since v2 will likely resolve the issue! Thanks for the hard work!

chatuur · December 6, 2019, 1:55pm

For anyone coming to this thread in the future, Jeremy tweeted a tutorial on integrating huggingface with fastai.

This is the tutorial.
It builds things from the ground up. Pretty good.

dipesh_pal · December 27, 2019, 3:50pm

Need help: how do I save this kind of model? learner.export() is not working; it gives the following error: PicklingError: Can’t pickle <function <lambda> at 0x7fdfbffea1e0>: attribute lookup <lambda> on __main__ failed

I followed this tutorial: https://towardsdatascience.com/fastai-with-transformers-bert-roberta-xlnet-xlm-distilbert-4f41ee18ecb2

maxmatical · December 29, 2019, 5:59pm

I believe the issue is due to using a custom DataBunch; learn.export() only works with the DataBunch methods in the library.
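Whatever object ends up holding it, this kind of PicklingError comes from Python's pickle module: it stores functions by module and qualified name, and an anonymous lambda (qualified name `<lambda>`) can't be looked up that way. A minimal stdlib sketch showing the difference; the function names are hypothetical:

```python
import functools
import pickle

def add(x, y):
    # module-level function: pickle stores a reference by its module and name
    return x + y

add_one_lambda = lambda x: x + 1             # qualified name is '<lambda>', so lookup fails
add_one_partial = functools.partial(add, 1)  # partial of a named function pickles fine

def can_pickle(obj) -> bool:
    """Return True if pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

print(can_pickle(add))              # True
print(can_pickle(add_one_lambda))   # False
print(can_pickle(add_one_partial))  # True
```

If a lambda is stored somewhere in the learner or its data (for example in a custom tokenizer or processing step), replacing it with a module-level `def`, or a `functools.partial` of one, should let that part be pickled.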

dipesh_pal · December 30, 2019, 6:26am

So, what should I do now?
I trained my model on Colab and saved it as a .pth file. It worked properly there, but when I started working on a local machine, it gave me this error:
“BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.”
The error occurs when I try to create a DataBunch on the local machine.

Waterpas · April 14, 2020, 2:41pm

How large is your dataset? BrokenProcessPool is a known bug (see the thread “Problem with futures and ProcessPoolExecutor”) that tends to happen with large datasets. You can fix it by setting n_cpus=1 when creating the DataBunch, but keep in mind that processing will then be slower.
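As far as I can tell from fastai v1's source, n_cpus=1 helps because its parallel helper runs a plain loop instead of a ProcessPoolExecutor when fewer than two workers are requested, so there is no worker process that can die. A stdlib-only sketch of that pattern; `tokenize_all` and `simple_tokenize` are hypothetical stand-ins, not fastai functions:

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List

def simple_tokenize(text: str) -> List[str]:
    # stand-in for the real tokenizer: lowercase and split on whitespace
    return text.lower().split()

def tokenize_all(texts: List[str], n_cpus: int = 1) -> List[List[str]]:
    """Tokenize texts, using a process pool only when n_cpus > 1."""
    if n_cpus < 2:
        # serial path: no worker processes, so no pool can break, but slower on big corpora
        return [simple_tokenize(t) for t in texts]
    with ProcessPoolExecutor(max_workers=n_cpus) as executor:
        # map preserves input order; chunksize batches texts to cut inter-process overhead
        return list(executor.map(simple_tokenize, texts, chunksize=64))

if __name__ == "__main__":
    docs = ["A great movie", "Not so great"]
    print(tokenize_all(docs, n_cpus=1))  # [['a', 'great', 'movie'], ['not', 'so', 'great']]
    # tokenize_all(docs, n_cpus=2) gives the same result, computed in worker processes
```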