Language Model Zoo 🦍

ULMFiT for Punjabi

SOTA for Language Modeling and Classification

New Dataset for Punjabi Text Classification Challenges:

Please open a GitHub issue!

I’ve also trained a language model and classifier for Hindi, achieving a perplexity of ~35 on a 20% validation set of 55k Hindi Wikipedia articles. I’m using fastai v1 and SentencePiece for tokenization. I would like to compare our models on the BBC News classification dataset. Would you mind sharing your score? A minimal sketch of the SentencePiece side is below.
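
In case it helps anyone reproduce this, here is a rough sketch of training the subword vocabulary with the sentencepiece library (the file names and the vocab size of 30000 are illustrative choices, not the exact ones I used):

```python
import sentencepiece as spm

# Train a subword model on the raw corpus (one sentence per line).
# 'hi_wiki.txt' and vocab_size=30000 are made-up values for illustration.
spm.SentencePieceTrainer.Train(
    '--input=hi_wiki.txt --model_prefix=hi_spm --vocab_size=30000'
)

# Load the trained model and split text into subword pieces,
# which then become the vocabulary for the language model.
sp = spm.SentencePieceProcessor()
sp.Load('hi_spm.model')
pieces = sp.EncodeAsPieces('some Hindi text here')
```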

@disisbig can you make a thread for your language and put it into the top entry? Regarding the comparison, we are in the process of assembling the language models in one repository to ensure reproducibility: https://github.com/n-waves/ulmfit-multilingual. Do you want to contribute your LM and hyperparams?

Thanks @piotr.czapla. I’ve created the threads for Hindi and Punjabi. I’ll soon raise a PR to contribute my models and hyperparams to ulmfit-multilingual.

Folks, would anyone know if one can use a language model (instead of word vectors) for sequence-to-sequence translation? I think Jeremy mentioned that in the previous Deep Learning Part II, in lesson 11, where he demoed translation with word vectors.

Not sure I got this right or whether it’s possible; pointers welcome.

@martijnd

Hi! I have trained another model for the Russian language using the Taiga corpus: ULMFiT - Russian

The Transformer and Transformer-XL can be used for that. See the paper “Attention Is All You Need”.

It is possible, but you need to define your own decoder on top of the hidden states returned by the language model.
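
A rough sketch of what that could look like (the encoder interface and shapes are my assumptions, not fastai’s actual API):

```python
import torch
import torch.nn as nn

class LMSeq2Seq(nn.Module):
    """Hypothetical sketch: reuse a pretrained LM body as the encoder and
    bolt a small GRU decoder onto its hidden states. Assumes the encoder
    maps token ids to hidden states of shape (batch, src_len, hidden)."""
    def __init__(self, lm_encoder: nn.Module, hidden: int, trg_vocab: int):
        super().__init__()
        self.encoder = lm_encoder                  # e.g. a pretrained LM body
        self.trg_emb = nn.Embedding(trg_vocab, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, trg_vocab)

    def forward(self, src: torch.Tensor, trg: torch.Tensor):
        enc = self.encoder(src)                    # (batch, src_len, hidden)
        h0 = enc[:, -1].unsqueeze(0).contiguous()  # init decoder from last state
        dec_out, _ = self.decoder(self.trg_emb(trg), h0)
        return self.out(dec_out)                   # (batch, trg_len, trg_vocab)
```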

Which paper? Could you share a link?

Thanks. So does this use a language model, or is it a good way to do translation without RNNs?

The breakthrough in this paper is that it is not an RNN. RNNs take a long time to train and have issues with translating long sentences. I have been training RNNs where it took 15 hours to process 10 epochs on 2.5e8 tokens. The AWD-LSTM RNN in fastai is very interesting as a model; it just requires a lot of patience to train.

The same perplexity/accuracy can be reached in about an hour using the Transformer-XL that @sgugger implemented recently. It handles long sentences much more elegantly (via its attention mechanism) and can be parallelized.

In short: if you want to train language models for translation, classification, etc., you will do it faster and better using the Transformer-XL model.
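
Switching to it in fastai v1 is just a matter of passing a different architecture to `language_model_learner`. A minimal sketch, assuming you already have a `data_lm` databunch (the learning rate and dropout multiplier are illustrative):

```python
from fastai.text import *

# assumes `data_lm` is a TextLMDataBunch you already built;
# pretrained=False because no pretrained Transformer-XL weights ship with fastai
learn = language_model_learner(data_lm, TransformerXL,
                               drop_mult=0.3, pretrained=False)
learn.fit_one_cycle(1, 1e-3)
```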

If you manage to do that, please tell me how. Those models are heavy and require a much longer time to train! Though it’s true it takes fewer epochs to reach a perplexity as low as the AWD-LSTM’s on WT103, it still takes more compute time.

Thanks @Kaspar, got it! I will take a look at it for a pet project I want to do.

I am still looking for simple example code (AFAICT there are no examples in the fast.ai notebooks) showing how to use a language model for translation; I have only seen the fast.ai examples from last term’s Part II (lesson 11) using word vectors. @sgugger

I have seen this phenomenon several times when training Transformer-XL. Do you have any idea what is going on, @sgugger?

[image: training curve showing the loss suddenly breaking between epochs]

That would suggest too high a learning rate, since it just breaks.
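
A quick way to sanity-check that is fastai’s LR finder, then retrying with a lower peak rate (a sketch, assuming `learn` is the Learner from the run above; the rates shown are illustrative):

```python
# probe for a stable learning rate, then train with a lower peak
learn.lr_find()
learn.recorder.plot()          # pick a point well before the loss blows up
learn.fit_one_cycle(10, 1e-4)  # illustrative, lower max_lr than before
```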

OK, but to be more precise: why does it occur “between” two epochs?

Why would it occur at a specific time? I’m confused about what you’re asking.

1 Like

Do you mean that you get the crash when an epoch starts again?
Does it do that each time?

Interesting idea to use language modeling for transfer learning to a machine translation task.
Some people tried that with BERT and failed: https://github.com/huggingface/pytorch-pretrained-BERT/issues/31

There are other techniques, like back-translation, that let you use large monolingual corpora in MT, and we know they work well. Check http://nlpprogress.com/ where Sebastian lists the recent approaches to MT.
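
To make the back-translation idea concrete, here is a minimal sketch (the translate callable and the data names are hypothetical, not from any specific library):

```python
from typing import Callable, List, Tuple

def back_translate(mono_trg: List[str],
                   trg_to_src: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Turn monolingual target-side text into synthetic parallel pairs:
    translate each target sentence back into the source language with a
    reverse model, then pair the synthetic source with the real target."""
    return [(trg_to_src(t), t) for t in mono_trg]

# usage (hypothetical names): extend the real parallel data with synthetic
# pairs and retrain the forward source->target model on the combined set.
# train_pairs = parallel_pairs + back_translate(mono_corpus, reverse_model.translate)
```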

Thanks for the link, @piotr.czapla!