How to use a text classifier in production (fast.ai v2)

Hi all. I am trying to run a model I’ve trained over a column of a dataframe. The idea is to append the model’s predicted classification to the dataframe the text came from, so that each row holds the raw text alongside its prediction.

So far I’ve done this:
learner = load_learner('/MyPathBarrettsModel.pkl', "BarrettsModel.pkl")
df_loadNew = pd.read_csv('/MyOriginaalText.csv')
df_loadNew = df_loadNew.iloc[:, :1]
df_loadNew.columns = ['text']
real_dl = learner.dls.test_dl(df_loadNew['text'])

The part I’m getting stuck at is here:
name,pred_class =learner.get_preds(dl=real_dl)

How do I convert the pred_class to a classification label and append to the original dataframe?

Not 100% sure if it’s an index yet or not. If not:

preds, _ = learner.get_preds(dl=real_dl)
preds_max = preds.argmax(dim=1)
names = [learner.dls.vocab[p] for p in preds_max]

preds_max should be a tensor of integers representing class indices, which we get by taking the argmax over the class dimension.

Forgive my naivety, but how do I index pred_class? This is what it contains:

(tensor([[3.0557e-07, 1.0300e-06, 7.3461e-10, 5.9118e-05, 9.9994e-01],
…,
[4.0250e-04, 2.0067e-01, 1.6447e-06, 5.4147e-02, 7.4477e-01]]), None)

Alright so from here we do:

preds, _ = my_tensors_you_showed
pred_class = preds.argmax(1)

and then follow it from there. What you have right there are the raw probabilities, one row per input text and one column per class.
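To make the unpacking step concrete, here is a minimal sketch that uses plain Python lists in place of tensors, so it runs without fastai or PyTorch. The two rows are the ones visible in the output above; argmax_rows is a hypothetical helper that stands in for calling preds.argmax(1) on the real tensor.

```python
# Stand-in for the (probabilities, targets) tuple shown above.
# Plain lists replace the tensor; only the two visible rows are included.
result = (
    [
        [3.0557e-07, 1.0300e-06, 7.3461e-10, 5.9118e-05, 9.9994e-01],
        [4.0250e-04, 2.0067e-01, 1.6447e-06, 5.4147e-02, 7.4477e-01],
    ],
    None,  # the second element is None, as in the output above
)


def argmax_rows(rows):
    """Return, for each row, the index of its largest value (hypothetical helper)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


# Unpack first: the tuple has no argmax, only its first element holds probabilities.
probs, targets = result
pred_class = argmax_rows(probs)
print(pred_class)  # [4, 4]
```

Calling argmax on result itself, rather than on probs, is the kind of mistake that produces a "'tuple' object has no attribute 'argmax'" error.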

Thanks for replying, and my apologies for still being a little confused. I’m not sure what my_tensors_you_showed refers to. So should the whole code look like this?

        learner = load_learner('/MyPathBarrettsModel.pkl', "BarrettsModel.pkl")
        df_loadNew = pd.read_csv('/MyOriginaalText.csv')
        df_loadNew = df_loadNew.iloc[:, :1]
        df_loadNew.columns = ['text']
        real_dl = learner.dls.test_dl(df_loadNew['text'])
        name, pred_class = learner.get_preds(dl=real_dl)
        preds_max = name.argmax(dim=1)
        names = [learner.dls.vocab[p] for p in preds_max]

I get the error AttributeError: 'tuple' object has no attribute 'argmax'

I’m undoubtedly making a basic error but I’m just trying to get up and running so I can mess around with the code later. Where am I going wrong?

OK, so I think the problem was that
names = [learner.dls.vocab[p] for p in pred_class]
should be
names = [learner.dls.vocab[1][p] for p in pred_class]

For a text classifier, learner.dls.vocab holds two entries: the token vocabulary first, then learner.dls.vocab[1], a category map with the class labels in it.
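For anyone landing here later, a rough sketch of that last step, again with plain Python so it runs on its own. Everything in it is made up for illustration: the vocab list only mimics the two-part layout described above, the tokens and labels are placeholders, and the list of dicts stands in for the dataframe.

```python
# Hypothetical stand-in for learner.dls.vocab in a text classifier:
# element 0 is the token vocabulary, element 1 holds the class labels.
vocab = [
    ["xxunk", "xxpad", "xxbos", "the", "biopsy"],              # made-up tokens
    ["label_a", "label_b", "label_c", "label_d", "label_e"],   # made-up labels
]

# Hypothetical rows standing in for the 'text' column of the dataframe.
rows = [{"text": "first report"}, {"text": "second report"}]

# Class indices, e.g. the result of an argmax over the predicted probabilities.
pred_class = [4, 1]

# Index the label list (vocab[1]), not the outer two-element list.
names = [vocab[1][p] for p in pred_class]

# Attach each label to the row its text came from.
for row, name in zip(rows, names):
    row["prediction"] = name

print(rows)
```

With pandas, the equivalent of the final loop would be df_loadNew['prediction'] = names, provided names has one entry per row.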

Thanks for the help.
