Fine-tuned the LM and experimented on the ruSentEval task (sentiment analysis) with different sizes of data samples; achieved ~0.98 F1 score (which probably beats the benchmark), but only positive vs. negative (data from http://study.mokoron.com/)
Working on the “RuSentiment” classification task, which is multiclass and noisier than the previous one. I’ve performed some experiments and managed to replicate SOTA (~0.73 F1 score)
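Roughly, the pipeline looks like this (a minimal sketch with the fastai API; the dataframe and column names are placeholders, not my actual code):

```python
from fastai.text.all import *

# Assumption: df is a pandas DataFrame with 'text' and 'label' columns.
# 1) Fine-tune a language model on the target-domain text.
# NB: fastai's default pretrained weights are English (WT103); for Russian,
# train your own LM (e.g., on Wikipedia) and load those weights instead.
dls_lm = TextDataLoaders.from_df(df, text_col='text', is_lm=True, valid_pct=0.1)
learn_lm = language_model_learner(dls_lm, AWD_LSTM, metrics=[accuracy, Perplexity()])
learn_lm.fine_tune(4, 2e-3)
learn_lm.save_encoder('ft_encoder')

# 2) Reuse the fine-tuned encoder for the sentiment classifier.
dls_clf = TextDataLoaders.from_df(df, text_col='text', label_col='label',
                                  text_vocab=dls_lm.vocab)
learn_clf = text_classifier_learner(dls_clf, AWD_LSTM, metrics=F1Score())
learn_clf.load_encoder('ft_encoder')
learn_clf.fine_tune(4, 2e-2)
```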
Great!
It will be interesting to see your results on SentiRuEval-2016. I also trained a Russian language model on Wikipedia and tried to beat the state of the art on it, but did not succeed.
Sorry, my previous message might be a bit confusing. I understand that you have already run experiments. I meant that there might be some bugs in my code, and it would be great if someone could take a look at it )))
I’m a bit of a newbie at all this. But according to Jeremy, given the default loss function for language model training, we can roughly compute perplexity as exp(valid_loss). If that’s correct, I achieved perplexity ~28 for the wiki language model and ~62 when fine-tuning the LM.
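For anyone following along, that relation is just perplexity = exp(cross-entropy); the loss value below is an example number, not from my actual run:

```python
import math

# The default LM loss is cross-entropy (mean negative log-likelihood per token),
# so perplexity is roughly exp(validation loss).
valid_loss = 3.33  # example value, not from the actual run
perplexity = math.exp(valid_loss)
print(round(perplexity, 1))  # ~27.9, i.e. perplexity ~28 as above
```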
Now I’m working on fine-tuning the LM with a much bigger dataset (nearly 2 million tweets). Hopefully it will be better.
That is fine. I am working on a language model based on news media, and I achieve a perplexity of about 22.4. But newspaper language is more restricted and predictable.
@noisefield, do you have some results for the News Classification and a previous benchmark? @ademyanchuk, have you finished with the classification?
Fyi, we’ve tested ULMFiT + sentencepiece (30k vocab) on Russian MLDoc and we have quite encouraging results (better than LASER and the previous baseline for MLDoc)
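In case it helps anyone reproduce this, here is a minimal sketch of training a 30k-vocab SentencePiece model (the file and model names below are placeholders, not the ones we actually used):

```python
import sentencepiece as spm

# Train a 30k-vocab subword model on a plain-text corpus,
# one sentence per line (file/model names are placeholders).
spm.SentencePieceTrainer.train(
    input='ru_corpus.txt',
    model_prefix='ru_sp30k',  # writes ru_sp30k.model and ru_sp30k.vocab
    vocab_size=30000,
)

# Tokenize a sample sentence with the trained model.
sp = spm.SentencePieceProcessor(model_file='ru_sp30k.model')
print(sp.encode('Пример предложения для токенизации.', out_type=str))
```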
Hi! Could you please provide links to the tasks? I will be happy to try them out. For now, I use it for some personal tasks (and I’m quite happy with the results).
EDIT: If you mean MLDoc, I can do that by the end of the week
Alexey, I’m a bit in a rush today; would you be so kind as to make a table showing your results vs. the previous SOTA? It would be awesome if you could annotate which tokenization you used.
Here is an excellent example: