you would have to fill in all the hyperparameters then. It is easier to train the LM with `LMHyperParams` and then load it using `CLSHyperParams.from_lm(dataset_path, base_lm_path)`. So in your case:
```python
exp = LMHyperParams(dataset_path='data/wiki/ru-100/', qrnn=False, tokenizer='v', lang='ru', name='russian')
# get the exact name of your model from the execution of LMHyperParams
exp = CLSHyperParams.from_lm('data/my_class_task_data', 'data/wiki/ru-100/models/v70k/lstm_russian.m', name='russian')
```
This part isn't ready yet; you would have to modify this file: https://github.com/n-waves/ulmfit-multilingual/blob/master/ulmfit/train_clas.py#L102-L107
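To give an idea of what a change there might involve, here is a hypothetical, self-contained sketch (this is NOT the actual code from `train_clas.py`; the function name, language codes, and default sizes are illustrative assumptions) of the kind of per-language tokenizer dispatch those lines perform:

```python
# Hypothetical illustration only -- not the real train_clas.py code.
# Sketches per-language dispatch: pick tokenizer settings from a language code.

def tokenizer_settings(lang):
    """Return an illustrative (tokenizer, vocab_size) pair for a language code."""
    # Assumption: morphologically rich languages are routed to the
    # sentencepiece tokenizer ('sp'); others fall back to word-level ('v').
    subword_langs = {'pl', 'ru'}
    if lang in subword_langs:
        return 'sp', 25000
    return 'v', 70000

print(tokenizer_settings('ru'))  # ('sp', 25000)
print(tokenizer_settings('en'))  # ('v', 70000)
```

A real PR would of course hook this into the existing argument handling rather than hard-code language sets.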
Could you suggest something and make a PR? I would love to incorporate Russian into ulmfit-multilingual.
I would base it on the
Russian is a bit like Polish in that both languages have rich morphology, so I would use the sentencepiece tokenizer with 25k tokens.
Here is how I would train the Russian LM:
```
python -m ulmfit lm --dataset-path data/wiki/ru-100 --bidir=False --qrnn=False --tokenizer=sp --name 'bs40' - train 20 --bs=40
```