Rotten Tomatoes sentiment analysis

Hello,
I’m going to apply lesson 4 and lesson 10 to this dataset. Has anyone looked into it? It has a very skewed class distribution, 5 categories, and phrases that are hard to differentiate.

I’m going to start with a simple solution and build a language model from scratch, as in the lesson 4 IMDB example. Are there any suggestions about how large the corpus needs to be for the model to be good?

I wrote a little post about it on Kaggle, but it looks like the playground competition is not very popular.

I have one question in the meantime. What is the best loss function for ordered categories, where 2 is neutral, 1 is negative and 0 is very negative? How do I express that 0 points in the same direction as 1, only stronger?
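
To make the question concrete, here is a rough sketch of one idea I have seen for ordered labels: instead of 5 independent classes, encode label k as 4 yes/no answers to "is the label greater than j?" and use binary cross-entropy on those. This is only an assumption on my side, not something from the lessons. The names `ordinal_targets`, `ordinal_loss` and `predict` are made up for illustration, and I assume the labels run from 0 (very negative) to 4 (very positive). It is plain Python so the arithmetic is easy to follow; in a real model the logits would come from the network.

```python
import math

NUM_CLASSES = 5  # assumed labels 0..4, from very negative to very positive


def ordinal_targets(label, num_classes=NUM_CLASSES):
    """Encode label k as num_classes - 1 binary answers to 'is label > j?'."""
    return [1.0 if label > j else 0.0 for j in range(num_classes - 1)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_loss(logits, label):
    """Mean binary cross-entropy over the threshold questions.

    `logits` holds one raw score per threshold (4 scores for 5 classes).
    """
    targets = ordinal_targets(label, len(logits) + 1)
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)
    return total / len(logits)


def predict(logits):
    """Predicted label = number of thresholds passed (probability > 0.5)."""
    return sum(1 for z in logits if z > 0)


# True label is 0 (very negative). A prediction of 1 should cost less
# than a prediction of 2, because it is closer on the scale.
near_miss = [3.0, -3.0, -3.0, -3.0]  # decodes to label 1
far_miss = [3.0, 3.0, -3.0, -3.0]    # decodes to label 2
print(ordinal_loss(near_miss, 0), ordinal_loss(far_miss, 0))
```

With this encoding, 0 and 1 differ in only one target position while 0 and 2 differ in two, so the loss grows with the distance between predicted and true label. Plain cross-entropy over 5 classes would treat every wrong class the same way.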