At the end of the course notebook, there is no explanation of how to use the model for inference. So I copy-pasted the code from the following file:

But prediction is slow: on my AWS GPU instance it takes around 3 seconds, depending on the length of the sentence. I also noticed that the code from the file above does not run on the GPU: nvidia-smi shows that the GPU is not used.

So I modified the code a little to call to_gpu(), but it now fails at execution time (I surrounded my change with stars so it stands out):
import collections
import pickle
from pathlib import Path

import numpy as np
import torch
from torch.autograd import Variable

from fastai.text import *   # get_rnn_classifier, Tokenizer, to_gpu, partition_by_cores

def load_model(itos_filename, classifier_filename, num_classes):
    """Load the classifier and int to string mapping

    Args:
        itos_filename (str): The filename of the int to string mapping file (usually called itos.pkl)
        classifier_filename (str): The filename of the trained classifier

    Returns:
        string to int mapping, trained classifier model
    """
    # load the int to string mapping file
    itos = pickle.load(Path(itos_filename).open('rb'))
    # turn it into a string to int mapping (which is what we need)
    stoi = collections.defaultdict(lambda: 0, {str(v): int(k) for k, v in enumerate(itos)})

    # these parameters aren't used, but this is the easiest way to get a model
    bptt, em_sz, nh, nl = 70, 400, 1150, 3
    dps = np.array([0.4, 0.5, 0.05, 0.3, 0.4]) * 0.5
    vs = len(itos)

    model = get_rnn_classifier(bptt, 20*70, num_classes, vs, emb_sz=em_sz, n_hid=nh, n_layers=nl, pad_token=1,
                               layers=[em_sz*3, 50, num_classes], drops=[dps[4], 0.1],
                               dropouti=dps[0], wdrop=dps[1], dropoute=dps[2], dropouth=dps[3])

    # load the trained classifier (map_location keeps the loaded weights on the CPU)
    model.load_state_dict(torch.load(classifier_filename, map_location=lambda storage, loc: storage))

    # put the classifier into evaluation mode
    model.reset()
    model.eval()

    model = **to_gpu**(model)
    return stoi, model
I also corrected the prediction function a little, for more accurate results (again, my change is surrounded by stars):
def predict_text(stoi, model, text):
    """Do the actual prediction on the text using the
    model and mapping files passed
    """
    # prefix text with tokens:
    #   xbos: beginning of sentence
    #   xfld 1: we are using a single field here
    input_str = 'xbos xfld 1 ' + **fixup**(text)   # fixup() is the text-cleaning function from the course notebook

    # predictions are done on arrays of input.
    # We only have a single input, so turn it into a list of one element
    texts = [input_str]

    # tokenize using the fastai wrapper around spacy
    tok = Tokenizer().proc_all_mp(partition_by_cores(texts))

    # turn into integers for each word
    encoded = [stoi[p] for p in tok[0]]

    # we want a [x,1] array where x is the number
    # of words inputted (including the prefix tokens)
    ary = np.reshape(np.array(encoded), (-1, 1))

    # turn this array into a tensor
    tensor = torch.from_numpy(ary)

    # wrap in a torch Variable
    variable = Variable(tensor)

    # do the predictions
    predictions = model(variable)

    # convert back to numpy
    numpy_preds = predictions[0].data.numpy()

    # softmax() here is the numpy helper defined in the file this was copied from
    return softmax(numpy_preds[0])[0]
I load the model this way (no error here, it works perfectly):

my_stoi, my_model = load_model(LM_PATH/'tmp'/'itos.pkl', PATH/'models'/'clas_2.h5', 2)
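And the weights do land on the GPU; a quick sanity check in plain PyTorch (assuming the fastai 0.7 / PyTorch 0.3 setup from the course):

print(torch.cuda.is_available())            # True on this instance
print(next(my_model.parameters()).is_cuda)  # True, so to_gpu() did move the model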
But when I use it for inference:

predict_text(my_stoi, my_model, "More of a character study then a movie")

I get the following error:
TypeError: torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.LongTensor), but expected (torch.cuda.FloatTensor source, int dim, torch.cuda.LongTensor index)
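If I read the message right, the embedding weights are now on the GPU (torch.cuda.FloatTensor) but the token indices I feed in are still a CPU torch.LongTensor, and the embedding lookup (an index_select under the hood) refuses to mix the two. A minimal sketch that reproduces the same kind of mismatch (assuming PyTorch 0.3-style Variables, as in the course):

import torch
from torch.autograd import Variable

emb = torch.nn.Embedding(10, 4).cuda()       # weights become torch.cuda.FloatTensor
idx = Variable(torch.LongTensor([1, 2, 3]))  # indices stay on the CPU
emb(idx)                                     # raises the same index_select TypeError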
I don’t have that error when using the unmodified version of the functions above.
But I would like to use the GPU, because 3 seconds is way too slow for production.
So I tried the following variant (again, my change is surrounded by stars):
def predict_text(stoi, model, text):
    """Do the actual prediction on the text using the
    model and mapping files passed
    """
    # prefix text with tokens:
    #   xbos: beginning of sentence
    #   xfld 1: we are using a single field here
    input_str = 'xbos xfld 1 ' + fixup(text)

    # predictions are done on arrays of input.
    # We only have a single input, so turn it into a list of one element
    texts = [input_str]

    # tokenize using the fastai wrapper around spacy
    tok = Tokenizer().proc_all_mp(partition_by_cores(texts))

    # turn into integers for each word
    encoded = [stoi[p] for p in tok[0]]

    # we want a [x,1] array where x is the number
    # of words inputted (including the prefix tokens)
    ary = np.reshape(np.array(encoded), (-1, 1))

    # turn this array into a tensor
    tensor = torch.from_numpy(ary)

    # wrap in a torch Variable, moved to the GPU this time
    variable = **to_gpu**(Variable(tensor))

    # do the predictions
    predictions = model(variable)

    # convert back to numpy
    numpy_preds = predictions[0].data.numpy()

    return softmax(numpy_preds[0])[0]
But the error changed:

KeyError: 'torch.FloatTensor'
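For reference, here is the pattern I think the end of predict_text needs for GPU inference, with the output copied back to the CPU before the numpy conversion (an untested sketch on my side):

    # input on the GPU, like the model
    variable = to_gpu(Variable(tensor))
    # run the forward pass on the GPU
    predictions = model(variable)
    # copy the result back to the CPU before converting to numpy
    numpy_preds = predictions[0].data.cpu().numpy()
    return softmax(numpy_preds[0])[0]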