ULMFiT performs worse than bidirectional LSTM for text classification?

zhh210 · February 27, 2019, 5:05am

I was expecting ULMFiT to perform at least as well as a plain bidirectional LSTM, but that didn’t happen on my dataset. To be specific, I followed the ULMFiT text classification tutorial (https://www.analyticsvidhya.com/blog/2018/11/tutorial-text-classification-ulmfit-fastai-library/) but applied it to my own data: 1300 sentences and 3 classes (negative, neutral, positive). However, the accuracy I got with ULMFiT is only 50%.

Some obviously weird behaviors:

The classification model doesn’t even perform well on the training data (accuracy 40%), even when I run fit_one_cycle for multiple rounds;
The plain bidirectional LSTM got an accuracy of 70%.

Tchotchke · February 27, 2019, 10:40pm

Usually, if performance is that far off your expectations, it’s an indication that you did something wrong (as opposed to an issue with the algorithm itself). I’d recommend starting from the fastai IMDB notebook and seeing if you get better results. I’ve been using fastai’s implementation of ULMFiT for about 8 months now and have consistently found really good performance, even on challenging problems (here is a link to a blog post detailing one of these experiments).

I took a quick look at the link you referenced, and one thing I find weird is their removal of stop words. That is not something that is done with modern techniques that use language modeling. Probably even more importantly, I see that they don’t use the learning rate finder (lr_find()), and that will have a BIG impact when using fastai. In my experience, a sub-optimal learning rate could produce the results you are seeing.
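For anyone unsure what the learning rate finder does: in fastai v1 you call learn.lr_find() and then learn.recorder.plot() to look at the loss curve. The idea behind it can be sketched in plain Python on a toy problem. This is only an illustration under my own assumptions, not fastai's implementation: the function names, the 1-D quadratic loss, the "stop when the loss exceeds 4x the best loss" rule, and the "minimum divided by 10" heuristic are all chosen here for the example.

```python
import math

def lr_range_test(loss_fn, grad_fn, w0, lr_start=1e-5, lr_end=10.0, steps=100):
    """Toy learning-rate range test on a single parameter (illustration only)."""
    # Grow the learning rate exponentially from lr_start to lr_end.
    factor = (lr_end / lr_start) ** (1 / (steps - 1))
    w, lr = w0, lr_start
    lrs, losses = [], []
    best = float("inf")
    for _ in range(steps):
        loss = loss_fn(w)
        # Stop once the loss has clearly diverged (assumed threshold: 4x the best loss).
        if not math.isfinite(loss) or loss > 4 * best:
            break
        best = min(best, loss)
        lrs.append(lr)
        losses.append(loss)
        w -= lr * grad_fn(w)  # one plain gradient-descent step at the current rate
        lr *= factor
    return lrs, losses

def suggest_lr(lrs, losses):
    """Assumed heuristic: take the rate at the lowest recorded loss and divide by 10."""
    i = min(range(len(losses)), key=losses.__getitem__)
    return lrs[i] / 10

# Toy problem: minimise (w - 3)^2 starting from w = 0.
lrs, losses = lr_range_test(lambda w: (w - 3) ** 2, lambda w: 2 * (w - 3), w0=0.0)
print(f"tested {len(lrs)} learning rates, suggested lr = {suggest_lr(lrs, losses):.4f}")
```

The point is that a rate that is far too small barely moves the loss and a rate that is too large makes it blow up, so picking a value from the region where the loss is still falling steeply matters a lot.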

If you are getting 70% accuracy with the bidirectional LSTM, I would expect ULMFiT to be in the upper 60s at the very least (though on just a 3-class problem, I’m quite surprised that you aren’t seeing better performance).

zhh210 · February 28, 2019, 1:28am

Thanks for pointing that out! The accuracy does improve to 56% after applying the steps from the IMDB notebook (using the classification training data to fine-tune the language model). However, it is still 13% worse than the bidirectional LSTM, which seems bad. There are many numbers and dates in my data that probably need some special tokenization.
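One possible way to handle the numbers and dates is to map them to placeholder tokens before tokenization, so the model sees one token per kind instead of thousands of rare ones. Below is a minimal sketch using only Python's re module. The token names xxnum and xxdate are hypothetical (they just imitate fastai's xx-prefixed naming and are not part of fastai's vocabulary), and the patterns only cover a few common formats.

```python
import re

# Hypothetical placeholder tokens; these are NOT built into fastai.
DATE_TOKEN = "xxdate"
NUM_TOKEN = "xxnum"

# Dates such as 2019-02-27, 27/02/2019 or 2/27/19 (illustrative, not exhaustive).
DATE_RE = re.compile(r"\b(?:\d{4}-\d{1,2}-\d{1,2}|\d{1,2}/\d{1,2}/\d{2,4})\b")
# Integers and decimals, with optional thousands separators (e.g. 1,300 or 12.5).
NUM_RE = re.compile(r"\b\d+(?:,\d{3})*(?:\.\d+)?\b")

def normalize_numbers_and_dates(text):
    """Replace dates and numbers with placeholder tokens."""
    # Dates go first so their digits are not consumed by the number pattern.
    text = DATE_RE.sub(DATE_TOKEN, text)
    return NUM_RE.sub(NUM_TOKEN, text)

print(normalize_numbers_and_dates("Revenue rose 12.5% to 1,300 units on 2019-02-27."))
# prints: Revenue rose xxnum% to xxnum units on xxdate.
```

Whether this actually helps depends on whether the specific values carry sentiment in the data, so it is worth comparing accuracy with and without it.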

navneetkrch · March 15, 2019, 9:08am

Hey ZHH210,
The tutorial you are following applies text preprocessing and cleaning, which ends up distorting the sentences.
You do not need those steps; it is better to just follow steps similar to Jeremy’s/fastai’s IMDB notebook.
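To make the distortion concrete, here is a tiny sketch of what stop-word removal can do to sentiment data. The stop-word list is a small hand-picked one for illustration (an assumption, not the tutorial's actual list), though standard lists such as NLTK's English one also contain negations like "not" and "no".

```python
# Small hand-picked stop-word list for illustration only.
STOP_WORDS = {"the", "was", "is", "a", "not", "no", "but", "at", "all", "i", "it"}

def remove_stop_words(sentence):
    """Drop every whitespace-separated word that appears in STOP_WORDS."""
    return " ".join(w for w in sentence.split() if w.lower() not in STOP_WORDS)

positive = "the service was good"
negative = "the service was not good at all"

print(remove_stop_words(positive))  # prints: service good
print(remove_stop_words(negative))  # prints: service good
```

Both sentences collapse to the same text, so the classifier can no longer tell them apart, and the pretrained language model is fine-tuned on text that no longer looks like natural sentences.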

You can go through this Colab Notebook, and you will find that I have a better LM (13% vs 30% accuracy), which in turn led to better classification accuracy (90% vs 95%) as well.
Just remove the preprocessing part and let us know how it affected your performance.