Prediction is too slow in PyTorch

I am trying to do text classification using fastai.

I created the model on a GPU using fastai.
I then tried to predict on a CPU. A single prediction takes around 600 ms, which I think is too high.
Could anybody guide me on how to move models to the CPU?

Note: learn.predict returns its output in 90 ms, so my approach takes almost 7 times longer than that.

You'll probably get a better response if you post your code, otherwise we're just left guessing at what you may have done or not done. With that being said, one possibility is that you didn't put your model into eval mode. Try model.eval() before doing your prediction.
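
For example, a minimal sketch (assuming model is the classifier you loaded and batch is an already-encoded input tensor); wrapping the forward pass in torch.no_grad() also skips autograd bookkeeping, which helps on CPU:

import torch

model.eval()                        # put dropout/batch-norm layers into inference mode
with torch.no_grad():               # don't build the autograd graph during prediction
    predictions, *_ = model(batch)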

Sorry for that,

Here is my code.

import collections
import os
import pickle

import numpy as np
import torch
from torch.autograd import Variable
# get_rnn_classifier comes from fastai's text API; soft_max and pad are helpers defined elsewhere.

if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')


# remap GPU-saved tensors onto CPU storage when CUDA is unavailable
map_location = lambda storage, loc: storage
if torch.cuda.is_available():
    map_location = None

def load_model(tokenizer_path, model_path, num_classes):

    # these parameters aren't used, but this is the easiest way to get a model
    bptt, em_sz, nh, nl = 70, 400, 1150, 3
    drop_out = np.array([0.4, 0.5, 0.05, 0.3, 0.4]) * 0.5
    drop_mult = 0.5
    dps = drop_out * drop_mult
    ps = [0.1]
    ps = [dps[4]] + ps

    with open(tokenizer_path, "rb") as file:
        tokenizer = pickle.load(file)
    # turn it into a string-to-int mapping
    stoi = collections.defaultdict(lambda: 0, {str(v): int(k) for k, v in enumerate(tokenizer)})

    lin_ftrs = [50]
    layer = [em_sz * 3] + lin_ftrs + [num_classes]

    vs = len(tokenizer)

    model = get_rnn_classifier(bptt, 20 * 70, num_classes, vs, emb_sz=em_sz, n_hid=nh, n_layers=nl, pad_token=1,
                               layers=layer, drops=ps, weight_p=dps[1], embed_p=dps[2], hidden_p=dps[3])

    model.load_state_dict(torch.load(model_path, map_location=map_location))

    model.to(device)
    model.eval()
    return stoi, model


stoi, model = load_model(os.path.join(tok_path, "itos.pkl"),
                         os.path.join(models_path_sent, "model.pth"), 3)

def predict(padded_sentence):

    # do the predictions
    encoded = np.transpose(np.array([[stoi[o] for o in p] for p in padded_sentence]))
    t = torch.from_numpy(encoded).to(device)
    variable = Variable(t)

    predictions, *_ = model(variable)
    scores = [[soft_max(m.data.numpy())[0]] for m in predictions.cpu()]
    classes = ['One', 'Two', 'Three']
    result = [classes[np.argmax(res)] for res in scores]
    return result

My input to the function is

messages = ["I don't see the difference between these bodysuits and the more expensive ones.  Fits my boy just right",
           "Very nice basic clothing.  I think the size is fine.  I really like being able to find these shades of green, though I have decided the lighter shade is really a feminine color.  This is the only brand that I can find these muted greens",
           "I love these socks. They fit great (my 15 month old daughter has thick ankles) and she can zoom around on the kitchen floor and not take a nose dive into things.",
           "These shoes are very comfortable and apparently well-made. My single quibble with the black pair I got here on amazon has to do with the flimsiness of the tongue. I like a nice strong tongue. (And who doesn't?) Whatever they saved on material here may have been a mistake. On the other hand, maybe the sole will wear out first (or it will be a dead heat)."]
messages = [message.lower() for message in messages]
tok = [message.split() for message in messages]
max_len = max([len(s) for s in tok])
padded_sentence = [pad(t, max_len - len(t)) if len(t) < max_len
                   else ['xxbos', 'xxfld', '1'] + t for t in tok]
print(predict(padded_sentence))

If I use the CPU to predict the above messages, it takes about 40 times longer than the GPU prediction time.

If you have an older version of PyTorch, it will not include MKL-DNN and so is not optimised for CPU inference.
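
For what it's worth, you can check your build and CPU-threading setup with something like this (a sketch, assuming a PyTorch version recent enough to expose torch.backends.mkldnn):

import torch

print(torch.__version__)                      # older builds may lack MKL-DNN entirely
print(torch.backends.mkldnn.is_available())   # whether MKL-DNN CPU kernels can be used
print(torch.get_num_threads())                # threads used for intra-op CPU parallelism
# torch.set_num_threads(4)                    # optionally tune the thread count for your machine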

I created a sentiment analysis model in fastai and predictions with it take 100 times longer than a prediction with TextBlob (a prediction on a short sentence with fastai takes around 0.1 s), so I'm also looking for ways to optimize it. However, right now I'm loading the model with fastai, not PyTorch. What is the reason to use PyTorch instead of fastai to load the model? Also, I don't entirely understand your code @ajan1019, so is there any thread/article you could point me to that explains loading fastai text models in PyTorch? I did some googling, but without success so far. Below is my code, maybe I'm doing something wrong.

%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai.text import *
import pandas as pd
from textblob import TextBlob

bs=48

data_lm = load_data('', 'data_lm.pkl', bs=bs)

data_sent = load_data('', 'data_clas.pkl', bs=bs)

learn = text_classifier_learner(data_sent, AWD_LSTM, drop_mult=0.5)
learn.load('first');

def compute_polarity(sentence):
    return TextBlob(sentence).sentiment.polarity

# Examples:

%timeit learn.predict("I really loved that sushi")
# output: 53.4 ms ± 455 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit compute_polarity("I really loved that sushi")
# output: 441 µs ± 3.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Dear community,

I need to add a column to my whole data set (train + validation) with the prediction for each input.
Here is some info plus the code:

  • I have around a 1800-row training set + a 200-row validation set

  • it's a tabular model

  • the prediction is a continuous value, not a categorical one

      data_prep['Prevision'] = 0
      for index, row in data_prep.iterrows():
          data_prep.loc[index, 'Prevision'] = learn.predict(data_prep.iloc[index])[2].numpy()
      data_prep
    

However this code takes way too long (around 5-6 minutes).

Is there a faster way to get all the predictions for each input?

I also checked get_preds, but it's not good for me, because:

  • the preds are ordered based on the batches created, and I instead need to connect each prediction with its input
  • the preds don't have the same size as the input, since some inputs are dropped so the data rounds to the correct batch size.

Thanks in advance

Make sure your model is using the GPU when running predict. Should help some.
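
A minimal sketch of what that looks like, assuming the fastai v1 API used elsewhere in this thread (defaults comes from fastai.torch_core):

import torch
from fastai.torch_core import defaults

defaults.device = torch.device('cuda')           # have fastai put new batches on the GPU
learn.model = learn.model.to(defaults.device)    # move the model's weights there as well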

Hello,
have you tried calling get_preds with the following parameters?

`learn.get_preds(ds_type = DatasetType.Fix, ordered = True)`

This should give you predictions on your "original" training data set (same ordering and no missing items due to batch-size rounding); then you would need to call it for DatasetType.Valid (which is actually the default) too, to have predictions for your whole dataset.

This way you should be able to fill your Prediction column without having to loop through all items… Hope this helps!
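
Something along these lines (a sketch, assuming data_prep holds the training rows followed by the validation rows in the order fastai split them, and that the model outputs a single continuous value):

import torch
from fastai.basic_data import DatasetType

train_preds, _ = learn.get_preds(ds_type=DatasetType.Fix)     # training set, original order
valid_preds, _ = learn.get_preds(ds_type=DatasetType.Valid)   # validation set

# concatenate and write everything back as one column
data_prep['Prevision'] = torch.cat((train_preds, valid_preds), dim=0).squeeze().numpy()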

Hi Isabella,
WOW, your input works perfectly and it's all very fast!! You solved a problem for me, thanks a lot!

There is only one small detail: there is a very small difference in value between predict() and get_preds(), which is a bit misleading. Maybe predict() rounds the values?
eg.
From Predicts: 0.022995
From get_preds: 0.022995
Difference: -1.117587e-08

Just for reference, here is the small function I created.

def add_preds_stat(data, learn, prev_col='Prevision', act_col='Actual', err_col='Err', loss_col='Loss'):
    pred_train = learn.get_preds(ds_type=DatasetType.Fix, with_loss=True)
    pred_valid = learn.get_preds(ds_type=DatasetType.Valid, with_loss=True)
    data[prev_col] = (torch.cat((pred_train[0], pred_valid[0]), dim=0)).max(1).values
    data[act_col] = torch.cat((pred_train[1], pred_valid[1]), dim=0)
    data[err_col] = data[prev_col] - data[act_col]
    data[loss_col] = torch.cat((pred_train[2], pred_valid[2]), dim=0)

Some notes:

  • data is the full dataset (train + validation)
  • I used max because I predict a continuous value; otherwise argmax would do.

Hi Zachary,
Thanks for your input. Sorry, I did not mention that I had already enabled the GPU, but it's still very slow.
I actually solved it by using get_preds(), but it's still a bit strange that predict() is so slow.
Maybe it's because it creates a batch of size 1 for every input. Maybe it would be better to allow predict() to accept a whole dataset instead of just one value?

The goal of predict is one at a time. If you want more you should use the get_preds() method mentioned. And yes, one at a time is indeed slower than batches of data :slight_smile:


I wouldn't worry about the slight difference you are reporting. I guess it is just a numerical precision issue :slight_smile:
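
For a rough sense of scale (a generic sketch, not tied to this model): float32 carries about 7 significant decimal digits, so a ~1e-8 gap on a value around 0.023 is only a few units in the last place, which is exactly what you get when batched and single-item code paths accumulate sums in different orders.

import numpy as np

print(np.finfo(np.float32).eps)         # ≈ 1.19e-07, relative precision of float32
x = 0.022995
print(abs(float(np.float32(x)) - x))    # rounding error from a single float32 cast near 0.023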
