ULMFiT - Russian


(Alexey) #1

Hello. I’m working on ULMFiT for the Russian language. I forked https://github.com/n-waves/ulmfit-multilingual and was mostly inspired by @piotr.czapla’s work on Multilingual ULMFiT.

So far:

Benchmark

| Type | Model | Dataset | Metric | Value |
|---|---|---|---|---|
| Language Model | ULMFiT | Russian Wikipedia | Perplexity | 27.11 |
| Classification | NN + FastText | Rusentiment | F1-score | 0.728 |
| Classification | ULMFiT | Rusentiment | F1-score | 0.732 |

Training was performed with standard fastai tokenization.
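To make that concrete, here is a minimal sketch (not taken from my fork; paths and column names are placeholders) of what building and training a language model with fastai v1’s default tokenization roughly looks like:

```python
# Minimal sketch, assuming fastai v1; paths and column names are placeholders.
from fastai.text import TextLMDataBunch, language_model_learner, AWD_LSTM

# Build a language-model DataBunch; fastai v1 applies its default
# tokenization rules and numericalization under the hood.
data_lm = TextLMDataBunch.from_csv('data/ru_wiki', 'train.csv',
                                   text_cols='text', bs=64)

# AWD_LSTM is the architecture ULMFiT uses by default in fastai v1.
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.fit_one_cycle(1, 1e-2)
learn.save_encoder('lm_encoder')   # encoder reused later for classification
```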

My fork is https://github.com/ademyanchuk/ulmfit-multilingual. It has all the READMEs from the parent repo, and my experiments are in the experiments folder. This work is on fastai v1. All notebooks are self-explanatory and have some comments. Feel free to ask questions, comment, and offer suggestions.

Also, I would like to mention the previous work:


Multilingual ULMFiT
#2

Great!
It will be interesting to see your results on SentiRuEval-2016. I also trained a Russian language model on Wikipedia and tried to beat the state of the art on it, but did not succeed.


(Alexey) #3

Go ahead)) It might be that I made some silly mistake there and that is why I got such a good result. But at least I couldn’t find any flaw in the code myself.


#4

I mean, I have already conducted the experiment, and it failed :slight_smile: This is the task: https://drive.google.com/drive/folders/0BxlA8wH3PTUfV1F1UTBwVTJPd3c


(Alexey) #5

Sorry, my previous message might have been a bit confusing. I understand that you already did the experiments. I meant that there might be some bugs in my code, and it would be great if someone took a look at it)))


(Alexey) #6

Actually, so far I have only done positive/negative classification using all the data, which is located

, so I will continue my work and try multiclass as in the original task (that was my mistake: I didn’t understand the original task; now I see).


#7

By the way, what perplexity did you manage to achieve?


(Alexey) #8

I’m a bit of a newbie in all of this, but according to Jeremy, given the default loss function for training a language model, we can roughly compute perplexity as exp(valid_loss). If that’s correct, I achieved a perplexity of ~28 for the wiki language model and ~62 when fine-tuning the LM.
Now I’m working on fine-tuning the LM with a much bigger dataset (nearly 2 million tweets). I hope it will be better.
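For reference, the relationship is just a one-liner; the loss value below is a placeholder (exp(3.30) ≈ 27.1, the same ballpark as the numbers above):

```python
# Minimal sketch: perplexity from the validation cross-entropy loss.
import math

valid_loss = 3.30                  # placeholder, e.g. from learn.validate()
perplexity = math.exp(valid_loss)  # default LM loss is per-token cross entropy
print(f'perplexity ~ {perplexity:.2f}')  # exp(3.30) ~ 27.11
```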


#9

That is fine. I am working on a language model based on news media, and I get about 22.4. But newspaper language is more restricted and predictable.

You are correct about the way to calculate perplexity (you can see some reference on LM evaluation here: http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture06-rnnlm.pdf, page 41).


(Alexey) #10

Thank you) By the way, I recently joined a study group for this course (DL in NLP, from the NLP lab at MIPT). In case you are interested, here is the link to join: https://docs.google.com/forms/d/e/1FAIpQLSe_iP5pfx2eKvWOjja_lMNcGZacuAg0d7Q229vxJ_8lFIxZ7A/viewform


(Piotr Czapla) #11

@noisefield, do you have any results for the News Classification and a previous benchmark?
@ademyanchuk, have you finished the classification?

FYI, we’ve tested ULMFiT + sentencepiece (30k vocab) on Russian MLDoc and we have quite encouraging results (better than LASER and the previous baseline for MLDoc).
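For anyone who wants to reproduce the subword setup, here is a minimal sketch of training a 30k-vocab sentencepiece model and tokenizing a sentence with it; it assumes the sentencepiece Python package, and the corpus/model names are placeholders:

```python
# Minimal sketch, assuming the sentencepiece Python package;
# corpus and model names are placeholders.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input='ru_wiki.txt',         # plain-text corpus, one sentence per line
    model_prefix='ru_sp30k',     # writes ru_sp30k.model and ru_sp30k.vocab
    vocab_size=30000,            # the 30k vocab mentioned above
    character_coverage=0.9995,   # keep rare Cyrillic characters
)

sp = spm.SentencePieceProcessor(model_file='ru_sp30k.model')
print(sp.encode('Пример токенизации предложения.', out_type=str))
```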


#12

Hi! Could you please provide links to the tasks? I will be happy to try them out. For now, I use it for some personal tasks (and I am quite happy with the results).
EDIT: If you mean MLDoc, I can do that by the end of the week :slight_smile:


(Alexey) #13

@piotr.czapla, I finished with rusentiment and got a result similar to SOTA (even a bit better).
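For context, the classification step in fastai v1 roughly looks like the sketch below; it is not the exact code from my fork, and the file and column names are placeholders:

```python
# Rough sketch of the standard fastai v1 ULMFiT classification step;
# file and column names are placeholders, not the exact code from the fork.
from fastai.text import TextClasDataBunch, text_classifier_learner, AWD_LSTM

data_clas = TextClasDataBunch.from_csv(
    'data/rusentiment', 'train.csv',
    text_cols='text', label_cols='label',  # placeholder column names
    bs=32)  # in practice, pass vocab=data_lm.vocab to reuse the LM vocabulary

learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('lm_encoder')  # encoder saved after LM fine-tuning
learn.fit_one_cycle(1, 2e-2)      # then gradually unfreeze, as in the ULMFiT paper
```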


(Piotr Czapla) #14

Alexey, I’m a bit in a rush today; would you be so kind as to make a table showing your results vs. the previous SOTA? It would be awesome if you would also note which tokenization you used.
Here is an excellent example:

Thx


(Alexey) #15

Piotr, I edited the thread’s main post and added a benchmark table there.


#16

Hi!

I have published a language model trained on the newspaper subset of the Taiga corpus. You can get it here:

As mentioned previously, it achieves 21.98 perplexity on a 20-million-token validation set.


Language Model Zoo :gorilla: