Multilingual ULMFiT

I’ve pretrained a language model for Japanese at our company and would like to contribute it to the model zoo.
However, the official zoo is not open yet, and I haven’t been able to figure out what to do next.
@piotr.czapla Can you kindly point me to what I should do next?

Here’s what I’ve done so far:

  1. Cloned @piotr.czapla’s ulmfit-multilingual repo (https://github.com/n-waves/ulmfit-multilingual/projects/1)
  2. Created a local branch
  3. Refactored the code to use the SentencePiece tokenizer instead of the Moses tokenizer (on Japanese Wikipedia) before LM pretraining
  4. Pretrained the language model on Japanese Wikipedia
  5. Fine-tuned the LM and trained classifiers on the MedWeb (medical tweets) and Aozora Bunko (license-free books) datasets