How to use a text classifier in production (fast.ai v2)

sebastianzeki · April 13, 2021, 2:55pm

Hi all. I am trying to see if I can use a model I’ve trained on a column of a dataframe. The idea is that I’d append the predicted classification from the model to the dataframe where the text originated (so each row would have the raw text with the prediction).

So far I’ve done this:
learner = load_learner('/MyPathBarrettsModel.pkl', "BarrettsModel.pkl")
df_loadNew = pd.read_csv('/MyOriginaalText.csv')
df_loadNew = df_loadNew.iloc[:, :1]
df_loadNew.columns = ['text']
real_dl = learner.dls.test_dl(df_loadNew['text'])

The part I’m getting stuck at is here:
name,pred_class =learner.get_preds(dl=real_dl)

How do I convert the pred_class to a classification label and append to the original dataframe?

muellerzr · April 13, 2021, 2:59pm

Not 100% sure if it’s an index yet or not. If not:

preds, _ = learner.get_preds(dl=real_dl)
preds_max = preds.argmax(dim=1)
names = [learner.dls.vocab[p] for p in preds_max]

preds_max should be a tensor of integers representing class indices, which we get by taking the argmax.

sebastianzeki · April 13, 2021, 3:40pm

This is my pred_class. Forgive my naivety, but how do I index pred_class?

(tensor([[3.0557e-07, 1.0300e-06, 7.3461e-10, 5.9118e-05, 9.9994e-01],
…,
[4.0250e-04, 2.0067e-01, 1.6447e-06, 5.4147e-02, 7.4477e-01]]), None)

muellerzr · April 13, 2021, 3:41pm

Alright so from here we do:

preds, _ = my_tensors_you_showed
pred_class = preds.argmax(1)

and then follow on from there. What you have there are the raw probabilities.
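
As a rough sketch of what the argmax step does, the snippet below picks the index of the largest probability in each row and looks it up in a list of labels. It uses plain Python lists in place of tensors so that it runs without fastai. The two rows are the ones visible in the output above, and the label names are hypothetical placeholders, not the model's real categories.

```python
# The two probability rows visible in the output above (the rest is elided there).
probs = [
    [3.0557e-07, 1.0300e-06, 7.3461e-10, 5.9118e-05, 9.9994e-01],
    [4.0250e-04, 2.0067e-01, 1.6447e-06, 5.4147e-02, 7.4477e-01],
]

# Hypothetical label names; the real ones live in the learner's vocab.
labels = ["class_0", "class_1", "class_2", "class_3", "class_4"]


def argmax(row):
    """Return the index of the largest value in a list of numbers."""
    return max(range(len(row)), key=row.__getitem__)


# One class index per row, then one label per index.
pred_idx = [argmax(row) for row in probs]
names = [labels[i] for i in pred_idx]
print(pred_idx, names)
```

With real tensors, preds.argmax(dim=1) does the same job in a single call.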

sebastianzeki · April 14, 2021, 12:16pm

Thanks for replying, and my apologies for still being a little confused. I'm not sure what my_tensors_you_showed refers to. So the whole code should be like this?

        learner = load_learner('/MyPathBarrettsModel.pkl', "BarrettsModel.pkl")
        df_loadNew = pd.read_csv('/MyOriginaalText.csv')
        df_loadNew = df_loadNew.iloc[:, :1]
        df_loadNew.columns = ['text']
        real_dl=learner.dls.test_dl(df_loadNew['text'])
        name,pred_class =learner.get_preds(dl=real_dl)
        preds_max = name.argmax(dim=1)
        names = [learner.dls.vocab[p] for p in preds_max]

I get the error AttributeError: 'tuple' object has no attribute 'argmax'

I’m undoubtedly making a basic error but I’m just trying to get up and running so I can mess around with the code later. Where am I going wrong?
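
One way this error can arise is when argmax is called on the whole return value of get_preds, which is a (predictions, targets) tuple, rather than on its first element. The snippet below is only an illustration of that situation: it uses a stand-in tuple of plain lists with made-up numbers so that it runs without fastai.

```python
# Stand-in for the (predictions, targets) tuple returned by get_preds.
# Plain lists replace tensors here, and the numbers are made up.
result = ([[0.1, 0.9], [0.8, 0.2]], None)

try:
    result.argmax(dim=1)  # argmax called on the tuple itself
except AttributeError as err:
    message = str(err)
    print(message)

# Unpack first, then work with the predictions only.
preds, targs = result
print(len(preds), targs)
```

With a real learner, preds would be a tensor after unpacking, and preds.argmax(dim=1) would then be valid.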

sebastianzeki · April 14, 2021, 12:41pm

OK, so I think the problem was that
names = [learner.dls.vocab[p] for p in preds_max] should be
names = [learner.dls.vocab[1][p] for p in preds_max]

learner.dls.vocab[1] is a category map with the categories in it.

Thanks for the help.