Inconsistency in Text Classification with ULMFiT (fastai v1)

prateek_joshi · November 11, 2018, 6:54am

I am trying to reproduce the tutorial at https://docs.fast.ai/text.html on Google Colab. However, my results do not match the ones shown there.

My colab notebook: https://colab.research.google.com/drive/1140hSsTvyTY22nbHZG340v7ia5JIhWqD

I would appreciate any ideas on how to fix this.

howkhang · November 11, 2018, 9:18am

I don’t have permission to view the notebook when I click the link.

prateek_joshi · November 11, 2018, 9:31am

@howkhang Sorry about that. I have updated the link, so you should be able to view the notebook now.

howkhang · November 11, 2018, 9:51am

You’re using the small sample, URLs.IMDB_SAMPLE, to fine-tune the language model, which would explain the classifier’s low accuracy.

Instead, you should use URLs.IMDB, which is the full dataset.

prateek_joshi · November 11, 2018, 9:56am

@howkhang Thanks for your response. The example at https://docs.fast.ai/text.html also uses the URLs.IMDB_SAMPLE data, yet the accuracy reported there is much higher than what I am getting.

howkhang · November 11, 2018, 10:23am

I made a copy of your notebook, ran it in Google Colab, and likewise could not replicate the numbers on the docs page.

Rather than trying to replicate the numbers on the docs page, try training the classifier on the full IMDB set. I managed to get over 94% accuracy by following the lesson3-imdb notebook.
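
As a side note on why numbers from a small sample are hard to compare between runs: accuracy measured on a few hundred validation examples is itself noisy. Here is a rough sketch of that effect using the binomial standard error. It is not fastai code, and it does not account for the effect of fine-tuning the language model on less data. The sample validation size (200) and the 80% accuracy are assumptions for illustration only, so check the actual split in your notebook. The 25,000 figure is the size of the full IMDB test set.

```python
import math


def accuracy_standard_error(accuracy: float, n_examples: int) -> float:
    """Binomial standard error of an accuracy measured on n_examples."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    if n_examples <= 0:
        raise ValueError("n_examples must be positive")
    return math.sqrt(accuracy * (1.0 - accuracy) / n_examples)


# Hypothetical values, for illustration only.
ASSUMED_ACCURACY = 0.80      # made-up accuracy, not a measured result
SAMPLE_VALID_SIZE = 200      # assumed validation size for a small sample
FULL_VALID_SIZE = 25_000     # size of the full IMDB test set

for name, n in [("sample", SAMPLE_VALID_SIZE), ("full", FULL_VALID_SIZE)]:
    se = accuracy_standard_error(ASSUMED_ACCURACY, n)
    print(f"{name}: n={n}, standard error of accuracy ~ {se:.4f}")
```

Under these assumptions the uncertainty on the small validation set is roughly ten times larger than on the full one, so a difference of a few percentage points between two runs on the sample may not mean much by itself.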