Evaluation for a multi-class text classifier

I have used the approach in lesson3-imdb.ipynb to fine-tune a language model and build a multi-class text classifier.

The process seems to have worked, but I am struggling to evaluate the output against other models I have tried in the past (such as ones using TF-IDF).

My classes are imbalanced, so I would typically look at precision, recall, and F1-score for each class (as returned by sklearn’s metrics.classification_report).
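To make clear which per-class numbers I mean, here is a minimal sketch on made-up labels, using only the standard library. The helper `per_class_metrics` and the example labels are hypothetical; the code just follows the usual definitions of precision, recall, and F1 and is not how sklearn implements its report.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == predicted:
            tp[actual] += 1
        else:
            fp[predicted] += 1   # predicted this class, but it was wrong
            fn[actual] += 1      # missed an example of the actual class
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted_total = tp[label] + fp[label]
        support = tp[label] + fn[label]
        # Fall back to 0.0 when a ratio is undefined (nothing predicted or no examples).
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / support if support else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1, support)
    return metrics

# Made-up, imbalanced example: six "spam", two "ham", two "promo".
y_true = ["spam"] * 6 + ["ham"] * 2 + ["promo"] * 2
y_pred = ["spam"] * 5 + ["ham"] + ["ham", "spam"] + ["promo", "promo"]

for label, (p, r, f1, n) in per_class_metrics(y_true, y_pred).items():
    print(f"{label:>6}  precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}  support={n}")
```

With imbalanced classes, overall accuracy can look fine while a small class such as "ham" here has much lower precision and recall, which is why I want the per-class breakdown.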

I would also like to generate the confusion matrix of actual vs. predicted classes. For this I can use the following, but I am unclear on how to recover the label names from their numeric values:

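import numpy as np
import pandas as pd

preds, targets = learn_clas.get_preds()  # per-class scores and true label indices
predictions = np.argmax(preds, axis=1)  # index of the highest-scoring class per row
pd.crosstab(predictions, targets)  # rows: predicted index, columns: actual index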

Any advice would be appreciated.

This might help: https://www.kaggle.com/dromosys/fast-ai-news-multi-classify-v2


Thanks. In that example the data has been preprocessed to use integers for labels, which I guess I could also do.
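If I go the integer-label route, the mapping back to names could be kept alongside the encoding. Below is a minimal sketch on made-up data, using only the standard library. The names `raw_labels`, `classes`, `label_to_idx`, and `predicted_idx` are hypothetical and are not part of fastai or the linked kernel; `predicted_idx` stands in for the argmax of the model's predictions.

```python
from collections import Counter

# Hypothetical string labels as they might appear in the raw data.
raw_labels = ["sport", "tech", "sport", "business", "tech", "sport"]

# Fix an ordering once so the same index always means the same class.
classes = sorted(set(raw_labels))                             # index -> name
label_to_idx = {name: i for i, name in enumerate(classes)}    # name -> index

# Integer-encoded targets, as the classifier would be trained on.
targets_idx = [label_to_idx[name] for name in raw_labels]

# Stand-in for the argmax of the model's predictions (made up here).
predicted_idx = [1, 2, 1, 0, 1, 1]

# Translate both back to names before tabulating.
actual_names = [classes[i] for i in targets_idx]
predicted_names = [classes[i] for i in predicted_idx]

# Count (actual, predicted) pairs: these are the cells of a confusion matrix.
confusion = Counter(zip(actual_names, predicted_names))
for (actual, predicted), count in sorted(confusion.items()):
    print(f"actual={actual:<9} predicted={predicted:<9} count={count}")
```

The two name lists could presumably also be passed to pd.crosstab in place of the numeric arrays to get a table with labelled rows and columns, as long as the same `classes` ordering was used when encoding the training data.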