Hi.
I pretrained a language model for Japanese (including SentencePiece tokenization). Thank you, @piotr.czapla, for your code and kind direction.
Details are here.
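In case it helps anyone reproduce the tokenization step, here is a minimal sketch of training and applying a SentencePiece model on raw Japanese text. The input file `wiki_ja.txt`, the vocab size, and the other settings are placeholders, not my actual configuration:

```python
import sentencepiece as spm

# Train a subword model on raw Japanese text. The input file, prefix,
# and hyperparameters below are placeholders, not my actual settings.
spm.SentencePieceTrainer.train(
    input="wiki_ja.txt",        # hypothetical corpus file
    model_prefix="ja_wiki",     # writes ja_wiki.model / ja_wiki.vocab
    vocab_size=32000,
    character_coverage=0.9995,  # high coverage is typical for Japanese/CJK
    model_type="unigram",
)

# Tokenize a sentence into subword pieces with the trained model.
sp = spm.SentencePieceProcessor(model_file="ja_wiki.model")
print(sp.encode("言語モデルを事前学習しました。", out_type=str))
```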
I used the pretrained model for classification of disease-related tweets (the MedWeb dataset) and achieved micro-F1 = 0.89, which is 0.03 points below SOTA.
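For reference, micro-F1 pools true positives, false positives, and false negatives across all labels before computing F1, which suits a multi-label setup like MedWeb's. A toy sketch with scikit-learn (the arrays are made up purely for illustration):

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy multi-label example: rows are tweets, columns are disease labels.
# The values are made up purely to illustrate the metric.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

# Micro-averaging pools TP/FP/FN over all labels before computing F1.
print(f1_score(y_true, y_pred, average="micro"))  # 0.75 on this toy data
```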
I’ll post updates when our repo is ready.