Multilingual ULMFiT

I am trying to build an Arabic ULMFiT model (also not a pro). I see @piotr.czapla is making several updates and refactors, and I hope we get a working set of scripts with a bit more guidance. Right now I have built the model but can't run the classification step on XNLI (train_clas.py). The issue is related to weights, and I posted a related topic.

Thanks for the input! At this point, we won’t be able to make the NAACL 2019 deadline. We’ll probably target ACL 2019 and hope to contribute the pretrained models back to fastai before then.

Hi, is it necessary to clean the wiki dumps with this function that I have seen in session 10?

import html
import re

re1 = re.compile(r'  +')

def fixup(x):
    x = x.replace('#39;', "'").replace('amp;', '&').replace('#146;', "'").replace(
        'nbsp;', ' ').replace('#36;', '$').replace('\\n', "\n").replace('quot;', "'").replace(
        '<br />', "\n").replace('\\"', '"').replace('<unk>', 'u_n').replace(' @.@ ', '.').replace(
        ' @-@ ', '-').replace('\\', ' \\ ')
    return re1.sub(' ', html.unescape(x))
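
For reference, a quick check of what it does on a made-up line (not from a real dump):

raw = "I won #36;100 amp; a prize quot;today"
print(fixup(raw))  # -> "I won $100 & a prize 'today"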

Update: managed to train the LM using the most recent fixes from @piotr.czapla. Issue 20 closed.


Thanks. Will try to recreate all the steps for the Russian language.

I’ve pretrained a language model for Japanese at our company and would like to contribute it to the model zoo.
However, the official zoo is not open yet and I haven’t been able to figure out what to do.
@piotr.czapla Can you kindly point me to what I should do next?

Here’s what I’ve done so far:

  1. Cloned @piotr.czapla’s ulmfit-multilingual repo (https://github.com/n-waves/ulmfit-multilingual/projects/1)
  2. Created a local branch
  3. Refactored the code to use sentencepiece tokenizer instead of Moses tokenizer (on Japanese Wikipedia) before lm pretraining
  4. Pretrained the language model on Japanese Wikipedia
  5. Fine-tuned + classified MedWeb (medical tweets) and Aozora-bunko (license-free books) datasets

For Arabic, I got 24.7299 perplexity on a small wiki corpus using the most recent scripts (with some minor adjustments). I used a 30k vocab. Will see if I can do better. Trained with both qrnn and bidir off.
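
For anyone comparing numbers: perplexity is just the exponential of the average cross-entropy loss per token, so 24.73 corresponds to a validation loss of roughly 3.21:

import math

valid_loss = 3.208           # loss implied by a perplexity of ~24.73
print(math.exp(valid_loss))  # ~24.73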


@AbuFadl @ademyanchuk, @s.tsuruno, @kasper, I’m very happy you are interested in contributing.

This is being done by the fastai preprocessing scripts, so you don't have to do it manually. I think it helps with accuracy, but I'm checking that assumption.

Please start a Russian thread in the Language Model Zoo. I think you can start using ulmfit-multilingual for your tests now; once you have something, let me know and we will think about how to get it incorporated into the repo.

@s.tsuruno That is awesome! Can you share your results in the Language Model Zoo thread? Have you trained it using the old or the new library? To contribute the language model, we need to create a set of scripts that let us reproduce your results, so that Jeremy can later make updates to the tokenization mechanism or the models and still be able to retrain them. I haven't thought yet about how to organize the ulmfit-multilingual repo so that this is possible, so I'm open to suggestions. But roughly the process should look as follows (see the sketch after the list):

  • create scripts to download the validation datasets (Aozora-bunko & MedWeb)
  • create shell scripts to train the LM with all the hyperparameters you used (we would have to adapt the current SentencePiece tokenization)
  • create validation scripts to check that the pretrained models are working fine
  • create shell scripts to train the classifier on MedWeb and Aozora-bunko
  • share the pretrained models on Google Drive until we manage to move them to nlp.fast.ai
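
Something like the following, just as a rough sketch (script names, paths and hyperparameters below are placeholders; only the python -m ulmfit pattern is the one used later in this thread):

$ bash prepare_medweb.sh data/medweb        # hypothetical download/prepare script
$ bash prepare_aozora.sh data/aozora        # hypothetical download/prepare script
$ python -m ulmfit lm --dataset-path data/wiki/ja-all --qrnn=False --bidir=False --name 'ja-sp' --bs=64 --cuda-id=0 - train 10
$ python -m ulmfit cls --dataset-path data/medweb --base-lm-path data/wiki/ja-all/models/sp30k/lstm_ja-sp.m - train 20
# then compare the accuracy against your published numbers and upload the *.pth files (e.g. to Google Drive)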

An update on the progress of work on ulmfit-multilingual:

Getting everything in shape is taking us a bit longer than anticipated, but we are getting there. To test our work I wanted to make a set of scripts that let you train your own version of ULMFiT from WikiText-103 to IMDB classification and get the same accuracy as with Jeremy's scripts.
Even though we still have some issues, if you are eager and happy to do some testing and bug fixing, you can start experimenting now. The most recent version is in the refactoring branch.

The classification script is working fine for two tokenization methods: fastai ('f') and Moses + fastai preprocessing ('vf'). I'm testing pure Moses ('v') right now. There are some issues with pretraining a language model, but I'm hoping it is just a matter of training time (I've trained only for 10 epochs; now I'm testing 20), and we still have a remaining issue with the SentencePiece implementation (it needs a bit of testing and love).

The new API is almost done and it lets you run the experiments from the command line or Jupyter notebooks. The experiment folder has two example notebooks.
To run training from the command line:

$ python -m ulmfit lm --dataset-path data/wiki/wikitext-103 --bidir=False --qrnn=False --tokenizer=f --name 'bs40' --bs=40 --cuda-id=0  -  train 20 --drop-mult=0.9
Model dir: data/wiki/wikitext-103/models/f60k/lstm_bs40.m
...
$ python -m ulmfit cls --dataset-path data/imdb --base-lm-path data/wiki/wikitext-103/models/f60k/lstm_bs40.m - train 20   
...

As I said, this is still work in progress, hence it is in the refactoring branch, but if you are happy to do some debugging and testing, feel free to start using it now.


It makes sense: qrnn is faster, but it seems harder to train and we need to do some hyperparameter tuning. Bidir needs lots of RAM at the moment. I'm testing it on English; once I get good results I'll let you know. Re perplexity: it doesn't tell you much about the accuracy on downstream tasks. Can you try to find some good datasets with previous results to run classification against?

Thanks @piotr.czapla. Actually, qrnn trained on the small corpus, but fastai couldn't read the weights (tested 2 days ago; the 1.decoder.bias and 0.encoder.weight keys). Bidir failed with an error about mismatched batch sizes: ValueError: Expected input batch_size (182406080) to match target batch_size (12160).
I am working on cleaning up my notebook and will post a link later. Also, next is work on benchmarking, at least XNLI.

Pleased to post my notebook for building an Arabic language model from the Wikipedia dump (small corpus, limited by Colab GPU memory). Hope it helps others, and sorry for the poor coding practices :slight_smile:

I tried to run the code on the XNLI data (Arabic). The accuracy was rather low, ~60%. Are there any results for XNLI other than English?
Note: The relative path causes an error in fastai's learn.py when looking for the pretrained filenames (I copied the model files to those deep locations as a workaround), and the labels need to be converted to int for this to work. I am also getting buffer truncation and the kernel dies when testing on 100% of the data.

I'm testing IMDB and have issues with the way we train language models. Once I have this fixed I'll let you know.

Note: The relative path causes an error in fastai's learn.py when looking for the pretrained filenames (I copied the model files to those deep locations as a workaround)

This is the behaviour of the "/" operator on paths. Fastai assumes the model is in a specific location next to the data; to work around this, you can make the paths absolute, so that model_dir/filepath becomes simply filepath (if filepath is absolute).
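
A quick illustration with pathlib (the file names here are made up):

from pathlib import Path

model_dir = Path('data/imdb/models')
print(model_dir / 'lm_best.pth')                    # data/imdb/models/lm_best.pth
print(model_dir / '/work/wiki/models/lm_best.pth')  # /work/wiki/models/lm_best.pth
# joining with an absolute path on the right discards model_dir entirely,
# which is why passing an absolute filepath sidesteps fastai's default location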

and labels need to be converted to int for this to work.

I haven't worked with XNLI yet, as the training on IMDB is flawed. Propose a PR if you think there is a bug.

I am also getting buffer truncation and the kernel dies when testing on 100% of the data.

I didn't get what you mean here.


@piotr.czapla
Thank you for the quick and helpful reply.
I installed the fastai library from the master branch of the fastai GitHub repo. The version must be around v1.0.28.
I guess I can create your suggested scripts within a week or so.
Will let you know when I’m done.

@piotr.czapla I have finally trained a French LM using SentencePiece and fastai 1.0.37.dev0, using code snippets from you, @sgugger and @tomsthom.
The results are very preliminary: at epoch 10, train loss 3.117836, valid loss 3.239415, accuracy 0.366795.
There are still issues with the control tokens in fastai vs SentencePiece.

Creating the databunch, I had to scale down the number of sentences due to the huge memory consumption in TextLMDataBunch.from_csv / TextLMDataBunch.from_df etc. This could be reduced, but I wonder what the status is of the merge of fastai with the version at "n-waves/ulmfit-multilingual".

For XNLI, I am trying to take the language-filtered rows from xnli.dev.tsv (apparently meant for training) and the corresponding test set, and check the model against gold_label. The multinli.train.[lang].tsv files are for the translation baseline, so if I am not mistaken, the train_clas code (XNLI) needs adjustments.
Buffer truncation is just a memory limitation error on Colab.
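
A rough sketch of that filtering step (the column names follow the released XNLI tsv files; the paths and choice of columns are assumptions):

import pandas as pd

dev  = pd.read_csv('XNLI-1.0/xnli.dev.tsv',  sep='\t', quoting=3)
test = pd.read_csv('XNLI-1.0/xnli.test.tsv', sep='\t', quoting=3)

# keep only the Arabic rows and the columns needed for classification
cols    = ['sentence1', 'sentence2', 'gold_label']
dev_ar  = dev[dev.language == 'ar'][cols]    # used as the training split here
test_ar = test[test.language == 'ar'][cols]  # evaluated against gold_label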

@piotr.czapla, @eisenjulian
Hi, how are you handling the tokenization/encoding with SentencePiece (SP)?
Background:
I made a run with the following arguments for SP:
special_cases = [
    text.transform.TK_MAJ,
    text.transform.TK_UP,
    text.transform.TK_REP,
    text.transform.TK_WREP,
    text.transform.FLD,
]
sp_params = (
    f"--input={pathSrc_list} "
    f"--eos_id=-1 "
    f"--control_symbols={str_specialcases} "
    f"--character_coverage=1.0 "
    f"--model_prefix={model_prefix} "
    f"--vocab_size={self.vocab_size} "
    f"--model_type={self.model_type}"
)

The idea was to reserve ids for the special symbols. However, this does not work, because fastai inserts BOS and FLD in _join_texts in the TokenizeProcessor before the tokenization, and SentencePiece ignores control symbols in the input text (to prevent the user from manipulating the tokenizer), i.e. BOS and FLD get encoded to something like x x b s and x x fld. Encoding BOS and FLD in this way will confuse rather than help a classifier.

In order to preserve the symbols, I am currently making a new run with:
special_cases = [
    text.transform.BOS,
    text.transform.PAD,
    text.transform.TK_MAJ,
    text.transform.TK_UP,
    text.transform.TK_REP,
    text.transform.TK_WREP,
    text.transform.FLD,
]
sp_params = (
    f"--input={pathSrc_list} "
    f"--bos_id=-1 "
    f"--eos_id=-1 "
    f"--pad_id=-1 "
    f"--user_defined_symbols={str_specialcases} "
    f"--character_coverage=1.0 "
    f"--model_prefix={model_prefix} "
    f"--vocab_size={self.vocab_size} "
    f"--model_type={self.model_type}"
)

This seems to work, because now the tokenized cell starts with: ▁ xxbos ▁ xxfld ▁1 ▁entre ▁1945 ▁et ▁1948,
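
A standalone way to check that behaviour outside fastai (the corpus path, vocab size and exact special-token list are placeholders):

import sentencepiece as spm

# user_defined_symbols are matched in raw text, control_symbols are not,
# which is why xxbos/xxfld only survive with the second set of flags above
spm.SentencePieceTrainer.Train(
    "--input=wiki_corpus.txt "
    "--model_prefix=sp_test "
    "--vocab_size=30000 "
    "--bos_id=-1 --eos_id=-1 --pad_id=-1 "
    "--user_defined_symbols=xxbos,xxfld,xxmaj,xxup,xxrep,xxwrep "
    "--character_coverage=1.0"
)

sp = spm.SentencePieceProcessor()
sp.Load("sp_test.model")
print(sp.EncodeAsPieces("xxbos xxfld 1 entre 1945 et 1948"))
# expected: ['▁', 'xxbos', '▁', 'xxfld', '▁1', '▁entre', ...]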

Also, what is the purpose of inserting BOS and FLD in _join_texts? Did your competition go well even with this confusing tokenization of BOS and FLD as above?

Is it possible to resume language model training (from wiki tokens) if it stops after epoch n, where n < num_epochs? I am training a language model and get a buffer truncation error after which the kernel stops, but there are cls-history.csv, lm_1.pth and lm_2.pth in the model folder (it died in the 3rd epoch).

It is awesome to hear that you managed to get a French language model. Have you used Sylvain's code to split Wikipedia by articles for training? It is super important: without it the language model trains to some low perplexity but fails on the downstream task. This small bug was causing all the trouble with training classification on IMDB to high values (without the fix I was getting from ~80% up to 92% accuracy; with the fix I'm getting 94.5%).
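
As I understand it, the important part is that whole articles stay together when the dump is split into training and validation text; a minimal sketch of that idea (not Sylvain's actual code):

import random

def split_by_article(articles, valid_pct=0.1, seed=42):
    # shuffle whole articles, then cut, so no article contributes text
    # to both the training and the validation split
    articles = list(articles)
    random.Random(seed).shuffle(articles)
    cut = int(len(articles) * valid_pct)
    return articles[cut:], articles[:cut]  # train, valid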

The refactoring branch is compatible with the latest fastai

I haven't played with SentencePiece yet, and the code is a bit outdated, but I see your point. Maybe you can create an issue in the ulmfit-multilingual repo and we can discuss it there?
Your approach seems to make sense, so let's get it incorporated.

Kaspar, for the competition we used the old fastai, which didn't have so many layers of abstraction and gave us greater control over the tokenization; we didn't add "xxfld 1" as it would break the perplexity calculation. These fields are inserted by fastai. I haven't removed them yet, as I was first focusing on getting good accuracy, and I figured that text added to every training example won't cause issues. But I'm intending to get more control over the tokens that are inserted into the text, to clean this up and make it more standardized.

I do that from a Jupyter notebook, where I can experiment with the learning rate, but I don't have an example at hand. Simply create an LMHyperParams object with all the parameters you put on the command line, then create the dataset and learn objects, run learn.load("lm_2") and then learn.lr_find() …
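
Something along these lines; a minimal fastai-level sketch, where the databunch construction is a placeholder (in ulmfit-multilingual the LMHyperParams object builds the dataset and learner for you, as in the example notebooks):

from fastai.text import TextLMDataBunch, language_model_learner

# rebuild the same databunch and learner you trained with, then resume
data_lm = TextLMDataBunch.load('data/wiki/ar-all', 'lm_cache')  # placeholder path/cache name
learn = language_model_learner(data_lm, drop_mult=0.9)
learn.load('lm_2')             # weights saved after the 2nd epoch
learn.lr_find()                # sanity-check the learning rate first
learn.fit_one_cycle(1, 2e-3)   # then train the remaining epoch(s)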