I have used the approach in lesson3-imdb.ipynb to fine-tune a language model and build a multi-class text classifier.
The process seems to have worked, but I am struggling to evaluate the output against other models I have tried in the past (such as ones using TF-IDF).
My classes are imbalanced, so I would typically have looked at precision, recall and F1-score for each class (such as is returned by sklearn’s metrics.classification_report).
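For reference, this is the kind of per-class report I mean. It is only a minimal sketch on toy data: y_true, y_pred and class_names are hypothetical stand-ins, not output from my model.

```python
import numpy as np
from sklearn.metrics import classification_report

# Toy stand-ins for the real targets and predictions (hypothetical data).
y_true = np.array([0, 0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2])

# Hypothetical label names, ordered so that index i names class i.
class_names = ["negative", "neutral", "positive"]

# Text report with precision, recall and F1-score per class.
print(classification_report(y_true, y_pred, target_names=class_names))

# The same numbers as a nested dict, which is easier to compare across models.
report = classification_report(
    y_true, y_pred, target_names=class_names, output_dict=True
)
```

With imbalanced classes, the per-class rows and the macro average are what I would compare against the TF-IDF models, rather than overall accuracy.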
I would also like to generate the confusion matrix of actual vs. predicted classes. For this I can use the following but am unclear on how to recover the label names from their numeric values:
import numpy as np
import pandas as pd

preds, targets = learn_clas.get_preds()
predictions = np.argmax(preds, axis=1)
pd.crosstab(predictions, targets)
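One idea I am considering is to index an ordered list of class names with the numeric values before building the table. The sketch below uses toy arrays in place of the real preds and targets, and class_names is a hypothetical list; I am assuming the real one can be obtained from the data object in the same order the model uses (I have not confirmed where), and that PyTorch tensors would first need converting with .numpy().

```python
import numpy as np
import pandas as pd

# Hypothetical class names; position i must correspond to numeric label i.
class_names = ["negative", "neutral", "positive"]

# Toy stand-ins for the outputs of get_preds() (hypothetical data).
preds = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
])
targets = np.array([0, 1, 2, 1])

# Most probable class index for each row.
predictions = np.argmax(preds, axis=1)

# Map numeric indices to names by indexing an array of names.
names = np.array(class_names)
confusion = pd.crosstab(
    pd.Series(names[targets], name="actual"),
    pd.Series(names[predictions], name="predicted"),
)
print(confusion)
```

Rows are the actual classes and columns the predicted ones. Note that crosstab only shows classes that actually occur, so a class that is never predicted would be missing from the columns.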
Any advice would be appreciated.