Pre-trained ULMFiT for Twitter data

Aktsvigun · May 30, 2020, 8:45am

Hi everyone,

I am looking for a ULMFiT model pre-trained on Twitter data. According to my investigation, for Russian, for instance, a model pre-trained on Twitter shows better performance than one pre-trained on Wikipedia when the final goal is to classify sentiment on a Twitter dataset. I would be very grateful if someone who has such a model could share it.

stefan-ai · May 31, 2020, 8:22am

Hi @Aktsvigun,

I’m not aware of any such pre-trained model. But I would suggest starting with the standard pre-trained Wikitext model and then fine-tuning that language model on a reasonable amount of unlabeled Twitter data. You can increase your dataset size in this step by adding data from other Twitter datasets (e.g. there should be some available on Kaggle). Then, in the third step, you fine-tune a sentiment classifier only on your specific labeled dataset. That should work quite well.
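
The three steps above could look roughly like this with fastai v2. This is only a sketch under assumptions, not a tested recipe: `build_lm_corpus`, `train_ulmfit`, the column names, the encoder name, and the epoch counts and learning rates are all made up for illustration. Also note that the AWD_LSTM weights fastai downloads by default are pre-trained on English WikiText-103, so for Russian you would need to supply a language model pre-trained on Russian text instead.

```python
from typing import Iterable, List


def build_lm_corpus(*tweet_sources: Iterable[str]) -> List[str]:
    """Pool unlabeled tweets from several sources for language-model fine-tuning.

    Whitespace is collapsed, and blank tweets and exact duplicates are dropped.
    Labels are not needed at this stage, so only the raw text is kept.
    """
    seen = set()
    corpus = []
    for source in tweet_sources:
        for text in source:
            cleaned = " ".join(text.split())
            if cleaned and cleaned not in seen:
                seen.add(cleaned)
                corpus.append(cleaned)
    return corpus


def train_ulmfit(lm_texts, labeled_texts, labels, encoder_name="twitter_enc"):
    """Hypothetical wrapper around the three ULMFiT steps (requires fastai v2 and pandas)."""
    # Imported here so build_lm_corpus stays usable without fastai installed.
    import pandas as pd
    from fastai.text.all import (
        AWD_LSTM,
        TextDataLoaders,
        accuracy,
        language_model_learner,
        text_classifier_learner,
    )

    # Step 1: start from the pre-trained language model.
    lm_df = pd.DataFrame({"text": lm_texts})
    dls_lm = TextDataLoaders.from_df(lm_df, text_col="text", is_lm=True, valid_pct=0.1)
    lm_learner = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.3, metrics=accuracy)

    # Step 2: fine-tune the language model on the pooled, unlabeled tweets.
    lm_learner.fit_one_cycle(1, 2e-2)   # train only the new head first
    lm_learner.unfreeze()
    lm_learner.fit_one_cycle(3, 2e-3)   # then the whole model (values are guesses)
    lm_learner.save_encoder(encoder_name)

    # Step 3: train the sentiment classifier only on the labeled target dataset,
    # reusing the language model's vocabulary and fine-tuned encoder.
    clas_df = pd.DataFrame({"text": labeled_texts, "label": labels})
    dls_clas = TextDataLoaders.from_df(
        clas_df, text_col="text", label_col="label",
        text_vocab=dls_lm.vocab, valid_pct=0.2,
    )
    clas_learner = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
    clas_learner.load_encoder(encoder_name)
    clas_learner.fit_one_cycle(1, 2e-2)
    return clas_learner
```

You would call `build_lm_corpus(your_tweets, kaggle_tweets)` to pool the text from your own dataset and any extra Twitter datasets, then pass the result as `lm_texts` to `train_ulmfit` together with your labeled tweets. In practice you would probably also unfreeze the classifier gradually instead of training it for a single cycle.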