Hi, I am trying to train a Korean language model on Korean Wikipedia text. I am using the MultiFiT architecture and the MeCab tokenizer. I have gone through every combination I could think of (switching to the SentencePiece tokenizer, tuning hyperparameters), but I always get very bad accuracy, close to 0. Does anyone have an idea why?
Here is my notebook: https://github.com/amaleelhamri/korean_language_model/blob/master/ko_lm.ipynb
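For context, the tokenizer setup is roughly like this (a simplified sketch, not the exact notebook code; it assumes the konlpy Mecab wrapper and the fastai v1 tokenizer API):

```python
from fastai.text import BaseTokenizer, Tokenizer
from konlpy.tag import Mecab

class MecabTokenizer(BaseTokenizer):
    "Wrap MeCab so fastai can tokenize Korean at the morpheme level."
    def __init__(self, lang='ko'):
        self.lang = lang
        self.mecab = Mecab()

    def tokenizer(self, t):
        # Split raw text into morphemes, e.g. '위키백과는' -> ['위키백과', '는']
        return self.mecab.morphs(t)

    def add_special_cases(self, toks):
        pass  # MeCab does its own segmentation; no special cases added

# Pass the class (not an instance); fastai instantiates it with lang='ko'
tokenizer = Tokenizer(tok_func=MecabTokenizer, lang='ko')
```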
Thank you in advance!