Google BERT on Kaggle Movie Reviews dataset

Hello, I am using BERT on the Sentiment Analysis on Movie Reviews dataset from a past Kaggle competition held about 4 years ago (https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews). The dataset has around 150K training examples and a public test set (used for the public leaderboard) of 67K examples.

After running 2 epochs (which took me 3h), I got a score of 0.688 on the public leaderboard, which is in the top 5 (the private leaderboard is not available anymore). It seems to work, but I will try tuning the learning rate to see if I can get a better result.

I would like to share this in case someone wants to do the same experiment so that we can compare results.
Thanks


Hi Canh,
Is it possible to share your code?
I am also planning to do the same task.

Hi @Skeptic,

Sorry for the late response. I will put my code somewhere so that I can share it, but basically what I did was follow the classification examples in run_classifier.py (https://github.com/google-research/bert#sentence-and-sentence-pair-classification-tasks):

  • Prepare train.tsv, dev.tsv, and test.tsv files with one example per line in the format sentence<TAB>label (for test.tsv just put 0 as the label)
  • Create a subclass of DataProcessor for your task (see the other examples in the run_classifier.py file)
  • Finally, run the script run_classifier.py with the option --task_name set to the task you defined.
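
For anyone who wants a starting point, the second step could look roughly like the sketch below. It is not the code from my run. The class name MovieReviewProcessor is made up for illustration, and InputExample here is a small stand-in for the class of the same name in run_classifier.py so that the snippet runs without the BERT repo. It also assumes the sentence<TAB>label file layout described above and the five sentiment labels 0 to 4 of this Kaggle dataset.

```python
import os
from collections import namedtuple

# Stand-in for run_classifier.InputExample (assumption: same four fields),
# defined here only so the sketch runs without the BERT repository.
InputExample = namedtuple("InputExample", ["guid", "text_a", "text_b", "label"])


class MovieReviewProcessor(object):
    """Hypothetical processor; in the BERT repo it would subclass DataProcessor."""

    def get_train_examples(self, data_dir):
        return self._create_examples(os.path.join(data_dir, "train.tsv"), "train")

    def get_dev_examples(self, data_dir):
        return self._create_examples(os.path.join(data_dir, "dev.tsv"), "dev")

    def get_test_examples(self, data_dir):
        return self._create_examples(os.path.join(data_dir, "test.tsv"), "test")

    def get_labels(self):
        # Assumption: five sentiment classes, from 0 (negative) to 4 (positive).
        return ["0", "1", "2", "3", "4"]

    def _create_examples(self, path, set_type):
        """Reads lines of the form sentence<TAB>label into InputExample objects."""
        examples = []
        with open(path, "r", encoding="utf-8") as f:
            for i, line in enumerate(f):
                parts = line.rstrip("\r\n").split("\t")
                if len(parts) != 2:
                    continue  # skip blank or malformed lines
                text, label = parts
                examples.append(
                    InputExample(guid="%s-%d" % (set_type, i),
                                 text_a=text, text_b=None, label=label))
        return examples
```

In the real script you would derive the class from DataProcessor, use the repo's own InputExample, and add the class to the processors dictionary in main() under the name you pass to --task_name.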

Hope that helps.


Hi Canh,
Thanks a lot for the reply.

Hi @Skeptic.
Very interesting what you are doing.
Can you share your code?

That would be great!

Hi @fabsta ,
Sorry for the late reply.
BERT was one of the options I was considering, but I haven't done anything with it yet.
If I do complete it, I may share it (this depends on my company, since the work is for them).

Hey @canh, can you please share your code? I'm working on the same project.
I would be grateful.
Thanks in advance.