Text Classification predictions default to naive "all negative"

Hi, I’m building a text classification model in fastai. The problem is that when training finishes, the model predicts the negative class for every example, which gives about 80% accuracy. (The labels have an 80/20 frequency split.)

By adjusting the classification threshold in 1% increments, I can move sensitivity anywhere from 0% to 71%.
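For anyone trying the same thing, here is a minimal sketch of sweeping a classification threshold against predicted probabilities. The `probs` and `labels` arrays are made up for illustration; they are not from the original model.

```python
import numpy as np

# Hypothetical positive-class probabilities and true labels (illustrative only).
probs = np.array([0.05, 0.40, 0.45, 0.60, 0.30, 0.55, 0.20, 0.48])
labels = np.array([0, 1, 1, 1, 0, 1, 0, 0])

# Sweep candidate thresholds and report sensitivity (recall on the positive class).
for thresh in np.arange(0.1, 0.9, 0.1):
    preds = (probs >= thresh).astype(int)
    tp = np.sum((preds == 1) & (labels == 1))
    fn = np.sum((preds == 0) & (labels == 1))
    sensitivity = tp / (tp + fn)
    print(f"threshold={thresh:.1f}  sensitivity={sensitivity:.2f}")
```

Lowering the threshold trades specificity for sensitivity, which is why the achievable range depends so heavily on how the probabilities are distributed.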

If I were using standard ML methods, I would apply some sort of hybrid sampling technique, but I’m not sure what to do here.

Hi, if anyone comes across this problem in the future, I recommend referring to the following posts on upsampling the minority class.

Here. Here.

You can also see what I did in my NLP project here.

When you split the data into train and test sets, make sure to do a stratified split. If you run value_counts on the test data set and look at the ratio of positive to negative labels, it should be representative of your original data set.
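As a quick sketch of that check (the data frame and column names here are hypothetical, not from the original project), a stratified split with scikit-learn looks like this:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data frame with an ~80/20 label split (hypothetical columns).
df = pd.DataFrame({
    "text": [f"sample {i}" for i in range(100)],
    "label": [0] * 80 + [1] * 20,
})

# stratify=df["label"] preserves the 80/20 ratio in both splits.
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)

# The test set should mirror the original class ratio.
print(test_df["label"].value_counts(normalize=True))
```

Without `stratify`, a random split on a small imbalanced set can leave the test set with a very different class ratio, which distorts both the threshold and the metrics.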

Give the above approach a try before you resort to up-sampling the minority class.

Thanks for the advice. After stratifying my data, my AUC increased from 0.83 to 0.85, and classifier.predict() gave probabilities that made much more sense.
