In case it's useful to someone in the future: for Russian, fine-tuning the language model with the same methodology as in lesson3-imdb.ipynb has given the best results in all my experiments so far.
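For anyone who hasn't gone through that notebook, the flow is roughly the sketch below. This is a minimal outline assuming fastai v1; the paths, batch sizes and learning rates are placeholders, not the actual hyperparameters I used.

```python
# Minimal sketch of the lesson3-imdb.ipynb flow in fastai v1
# (paths and hyperparameters below are placeholders).
from fastai.text import *

# 1) Fine-tune the pretrained LM on the target-language / domain corpus
data_lm = TextLMDataBunch.from_folder(path, bs=48)
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.fit_one_cycle(1, 1e-2)          # train the new head first
learn.unfreeze()
learn.fit_one_cycle(10, 1e-3)         # then fine-tune the whole LM
learn.save_encoder('fine_tuned_enc')

# 2) Reuse the fine-tuned encoder for the downstream classifier
data_clas = TextClasDataBunch.from_folder(path, vocab=data_lm.vocab, bs=48)
clas = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
clas.load_encoder('fine_tuned_enc')
clas.fit_one_cycle(1, 2e-2)           # gradual unfreezing, as in the notebook
clas.freeze_to(-2)
clas.fit_one_cycle(1, slice(1e-2/(2.6**4), 1e-2))
clas.unfreeze()
clas.fit_one_cycle(2, slice(1e-3/(2.6**4), 1e-3))
```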
Another couple of questions:
My intuition is that we can get better results if we fine-tune the language model on domain-specific data with more training examples. In your experiments, how big were the domain-specific corpora?
Has anyone tried a max vocab of 100,000 or more for the LM fine-tuning step?
On WikiText-103 the model trains in ~18h on a 1080 Ti.
100k is huge; it makes it hard for the model to learn useful relations between words. For Russian you may want to use SentencePiece with 25k tokens. It works really well for Polish (better than SentencePiece with 50k tokens, and way better than with 100k tokens).
You can check our paper & presentation; there is an example that shows how different numbers of tokens change the way a random sentence gets split.
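For reference, here is a minimal sketch of trying this out with the sentencepiece Python package. The corpus file name, model prefix and sample sentence are just placeholders; the point is that the vocab size you pass to the trainer directly controls how aggressively words get split into subword pieces.

```python
# Minimal sketch: train a 25k-token SentencePiece model and inspect how a
# sentence is split (file name, prefix and sample sentence are placeholders).
import sentencepiece as spm

spm.SentencePieceTrainer.Train(
    '--input=wiki_ru.txt --model_prefix=sp25k '
    '--vocab_size=25000 --model_type=unigram'
)

sp = spm.SentencePieceProcessor()
sp.Load('sp25k.model')

# A larger vocab keeps more whole words; a smaller one splits them into
# subword pieces, which helps for morphologically rich languages.
print(sp.EncodeAsPieces('Например, это предложение будет разбито на подслова'))
```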
I’ve also trained a language model and classifier for Hindi, achieving a perplexity of ~35 on a 20% validation set of 55k Hindi Wikipedia articles. I’m using fastai v1 and SentencePiece for tokenization. I would like to compare our models on the BBC News classification dataset. Would you mind sharing your score?
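In case it helps when comparing numbers: perplexity is just the exponential of the LM's validation cross-entropy, so with fastai v1 you can get it with something like the sketch below (it assumes a `language_model_learner` called `learn`; that name is just an example).

```python
# Minimal sketch, assuming a fastai v1 language_model_learner named `learn`.
# fastai reports cross-entropy loss, so perplexity = exp(valid_loss).
import math

valid_loss = float(learn.validate()[0])   # average cross-entropy on the validation set
print(f'validation perplexity: {math.exp(valid_loss):.1f}')
```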
@disisbig can you make a thread for your language and put it into the top entry? Re comparison: we are in the process of assembling the language models in one repository to ensure reproducibility: https://github.com/n-waves/ulmfit-multilingual. Do you want to contribute your LM and hyperparams?
Folks, would anyone know if one can use a language model (instead of word vectors) for sequence-to-sequence translation? I think Jeremy mentioned that in the previous Deep Learning Part II, in lesson 11, where he demoed translation with word vectors.
Not sure I got this right or whether it's possible; pointers welcome.