Training Transformer models

(Max Tian) #1

Has anyone had any experience training the Transformer/Transformer XL models? I tried using Transformer for ULMFiT on the IMDB dataset and found that, when training the frozen language model, the validation accuracy was only about a third of what I get with the AWD-LSTM. It may just be that the model needs much longer to train, but has anyone been able to get good results using Transformers?
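
For reference, my setup is roughly the following (a minimal sketch; the sample data and hyperparameters are just placeholders for what I actually used):

```python
from fastai.text import *

# Language-model data built with the default fastai (spaCy) tokenizer.
path = untar_data(URLs.IMDB_SAMPLE)                  # small sample for illustration
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')

# Same call as for AWD_LSTM, just swapping the architecture;
# pretrained=True (the default) loads the GPT-1 weights and freezes the body.
learn = language_model_learner(data_lm, Transformer, drop_mult=0.3)
learn.fit_one_cycle(1, 1e-2)                         # the "frozen" stage I mentioned
```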

0 Likes

#2

The pretrained model doesn’t use the same tokenization as fastai; it’s GPT-1 from OpenAI, so you should use their tokenization.

1 Like

(Max Tian) #3

Ah ok, how should I use their tokenization? And if I use a non-pretrained model, will the default fastai tokenization work?

0 Likes

#4

If you use a non-pretrained model, yes, any tokenization will work. The tokenization mismatch is just the likely reason you got bad results with the pretrained one.
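
In other words, something like this should be fine with the default tokenizer (untested sketch, reusing a data_lm built the usual way with the default fastai pipeline):

```python
# Randomly initialized Transformer: there is no pretrained vocabulary to match,
# so the default spaCy tokenization is as good as any other.
learn = language_model_learner(data_lm, Transformer, pretrained=False)
learn.fit_one_cycle(1, 1e-2)
```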

0 Likes

(Max Tian) #5

Thanks for your help! A few last questions: does Transformer XL use a different tokenization as well? If we want to use a pretrained Transformer/Transformer XL, how would we go about it (or will fastai implement those in the future)? I’m assuming I would have to implement their tokenizers and wrap them in a BaseTokenizer, something like the sketch below?
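
Roughly what I have in mind, in case that clarifies the question (untested sketch; GPT1Tokenizer and the BPE encode step are hypothetical placeholders):

```python
from fastai.text import BaseTokenizer, Tokenizer

class GPT1Tokenizer(BaseTokenizer):
    "Hypothetical wrapper; tokenizer() should call the real GPT-1 BPE encoder."
    def __init__(self, lang:str='en'): self.lang = lang
    def tokenizer(self, t:str): return t.split(' ')   # placeholder for the BPE encode step
    def add_special_cases(self, toks): pass

tok = Tokenizer(tok_func=GPT1Tokenizer)
# then pass it when building the DataBunch, e.g.
# data_lm = TextLMDataBunch.from_csv(path, 'texts.csv', tokenizer=tok)
```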

0 Likes

#6

There is no pretrained Transformer XL model yet. We’ll be releasing one soon, and it will use the default fastai tokenization.

2 Likes

(Vimarsh Chaturvedi) #7

Hey @sgugger,

Is the usage of GPT-1 documented in a notebook in docs_src so I can see a working example? (If not, I’d love to volunteer.)

Does fastai provide a GPT-1 tokenizer?
I could not locate one in text.transform.

How would one go about using the GPT-1 tokenizer along with the pretrained Transformer model to train a language model?

0 Likes

#8

No, it’s not documented anywhere, and no one has contributed a working example AFAICT. There is no GPT-1 tokenizer, although we do have SentencePiece with BPE.
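
If you want to go the SentencePiece route, raw usage looks roughly like this (sketch; the file names and vocab size are placeholders):

```python
import sentencepiece as spm

# Train a BPE model on raw text, one piece of text per line.
spm.SentencePieceTrainer.Train(
    '--input=texts.txt --model_prefix=bpe --vocab_size=30000 --model_type=bpe')

# Load the trained model and tokenize a sample sentence into subword pieces.
sp = spm.SentencePieceProcessor()
sp.Load('bpe.model')
print(sp.EncodeAsPieces('This movie was surprisingly good.'))
```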

0 Likes

(Vimarsh Chaturvedi) #9

Hey @sgugger,

Thanks for the response.
If I understand correctly, you’re referring to this: https://github.com/google/sentencepiece.

What I’m trying to understand is this:

Throughout his lectures, Jeremy shows a default way of doing a lot of things, approaches that work out of the box because they’ve been fine-tuned.

I’m trying to build a language model and then a host of classifiers on top of it.
The default architecture for a language model is the AWD_LSTM.

However, if I want to try out the Transformer architecture, it seems (from the discussion above) that the default SpacyTokenizer will not give the best results.
So which tokenizer should I be using?
I’m not doing neural translation; I want to build a domain-specific language model (like Jeremy does in the course for the IMDB classifier) using the Transformer arch.
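
Concretely, my current setup is roughly the following (sketch; path, the dataframes and the column name are placeholders), and I’m not sure what, if anything, should change here:

```python
from fastai.text import *

# Domain-specific corpus in dataframes with a 'text' column (placeholder names).
data_lm = TextLMDataBunch.from_df(path, train_df, valid_df, text_cols='text')

# This works out of the box with the default SpacyTokenizer, but if I load the
# pretrained (GPT-1) weights, should the tokenization above be something else?
learn = language_model_learner(data_lm, Transformer)
learn.fit_one_cycle(5, 1e-3)
```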

0 Likes