So, as the title says, my model gives great results on the development set:
Accuracy: 0.9230769230769231
F-score:  0.9232024887594127

Confusion matrix:
[[ 527   17    1   35]
 [  23  499    0   13]
 [   1    1  372   32]
 [  41   28   40 1386]]

             precision    recall  f1-score   support

          0       0.89      0.91      0.90       580
          1       0.92      0.93      0.92       535
          2       0.90      0.92      0.91       406
          3       0.95      0.93      0.94      1495

avg / total       0.92      0.92      0.92      3016
But when I use learn.predict(True) to get predictions for the test set, the classifications are absolutely useless. I already checked that the order of the predictions matches the order of the actual texts. I'm using a standard 10:1 test-to-validation split.
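For reference, this is roughly how I turn the output into class labels and score it (a minimal sketch: `learn` is my trained learner, and `test_labels` is a placeholder for however the ground-truth test labels are stored):

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

# learn.predict(True) returns per-class log-probabilities for the test set,
# shape (n_samples, n_classes)
log_preds = learn.predict(True)
preds = np.argmax(log_preds, axis=1)  # pick the most likely class per sample

# test_labels: hypothetical array of ground-truth classes, in the same
# order as the samples in the test set
print(accuracy_score(test_labels, preds))
print(classification_report(test_labels, preds))
```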
Any ideas why this is? Is there any way to get a better result?
The test set has a class ratio of 4% / 4% / 4% / 88%, while the train & dev sets had a ratio of 18% / 18% / 18% / 46%. Could it have something to do with those ratios?
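For what it's worth, this is how I'd double-check those distributions (a small sketch; `dev_labels` and `test_labels` are placeholders for the actual label arrays):

```python
import numpy as np

def class_ratios(labels):
    # Fraction of samples per class, e.g. [0.18, 0.18, 0.18, 0.46]
    counts = np.bincount(np.asarray(labels))
    return counts / counts.sum()

print(class_ratios(dev_labels))   # distribution the model was validated on
print(class_ratios(test_labels))  # distribution it is actually tested on
```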
Thanks in advance