ULMFit Inference with GPU

At the end of the course notebook, there is no explanation of how to use the model for inference, so I copy-pasted the code from the following file:

But prediction is slow: on my AWS instance with a GPU, it takes around 3 seconds, depending on the length of the sentence.
I noticed that the code from the file above does not run on the GPU: the command nvidia-smi shows that the GPU is not used.

So I tried to modify the code a little by using the function to_gpu(), but it raises an error at execution (I surrounded my change with asterisks so it stands out):

# assumes the usual fastai 0.7 setup, e.g. from fastai.text import *
def load_model(itos_filename, classifier_filename, num_classes):
    """Load the classifier and int to string mapping

    Args:
        itos_filename (str): The filename of the int to string mapping file (usually called itos.pkl)
        classifier_filename (str): The filename of the trained classifier

    Returns:
        string to int mapping, trained classifier model
    """

    # load the int to string mapping file
    itos = pickle.load(Path(itos_filename).open('rb'))
    # turn it into a string to int mapping (which is what we need)
    stoi = collections.defaultdict(lambda: 0, {str(v): int(k) for k, v in enumerate(itos)})

    # these parameters aren't used, but this is the easiest way to get a model
    bptt, em_sz, nh, nl = 70, 400, 1150, 3
    dps = np.array([0.4, 0.5, 0.05, 0.3, 0.4]) * 0.5
    vs = len(itos)

    model = get_rnn_classifier(bptt, 20*70, num_classes, vs, emb_sz=em_sz, n_hid=nh, n_layers=nl, pad_token=1,
            layers=[em_sz*3, 50, num_classes], drops=[dps[4], 0.1],
            dropouti=dps[0], wdrop=dps[1], dropoute=dps[2], dropouth=dps[3])

    # load the trained classifier
    model.load_state_dict(torch.load(classifier_filename, map_location=lambda storage, loc: storage))

    # put the classifier into evaluation mode
    model.reset()
    model.eval()

    model = **to_gpu**(model)

    return stoi, model
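For context, to_gpu() in the fastai library is, as far as I can tell, just a thin wrapper around .cuda() (this is a paraphrase of fastai/core.py, so treat the exact code as an assumption):

def to_gpu(x, *args, **kwargs):
    # calls .cuda() when a GPU is available; this moves the module's
    # registered parameters and buffers, but not plain Python attributes
    return x.cuda(*args, **kwargs) if USE_GPU else x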

I also corrected that one a little, for more accurate results:

def predict_text(stoi, model, text):
    """Do the actual prediction on the text using the
    model and mapping files passed
    """

    # prefix text with tokens:
    #   xbos: beginning of sentence
    #   xfld 1: we are using a single field here
    input_str = 'xbos xfld 1 ' + **fixup**(text)

    # predictions are done on arrays of input.
    # We only have a single input, so turn it into a 1x1 array
    texts = [input_str]

    # tokenize using the fastai wrapper around spacy
    tok = Tokenizer().proc_all_mp(partition_by_cores(texts))

    # turn into integers for each word
    encoded = [stoi[p] for p in tok[0]]

    # we want a [x,1] array where x is the number
    #  of words inputted (including the prefix tokens)
    ary = np.reshape(np.array(encoded), (-1, 1))

    # turn this array into a tensor
    tensor = torch.from_numpy(ary)

    # wrap in a torch Variable
    variable = Variable(tensor)

    # do the predictions
    predictions = model(variable)

    # convert back to numpy
    numpy_preds = predictions[0].data.numpy()

    return softmax(numpy_preds[0])[0]

I load the model this way (no error here, it works perfectly):
my_stoi, my_model = load_model(LM_PATH/'tmp'/'itos.pkl', PATH/'models'/'clas_2.h5', 2)

But when I use it for inference:
predict_text(my_stoi, my_model, "More of a character study then a movie")

I get the following error:

TypeError: torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.LongTensor), but expected (torch.cuda.FloatTensor source, int dim, torch.cuda.LongTensor index)

I don’t have that error when using the unmodified version of the functions above.
But I would like to use the GPU, because 3 seconds is way too slow for production.
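If I read the TypeError correctly, the mismatch is between the embedding weights, which are now torch.cuda.FloatTensor, and the word-index tensor I pass in, which is still a CPU torch.LongTensor; index_select wants both on the same device. A standalone snippet that, I assume, reproduces the same constraint:

import torch
import torch.nn as nn
from torch.autograd import Variable

emb = nn.Embedding(10, 4).cuda()             # weights become torch.cuda.FloatTensor
idx = Variable(torch.LongTensor([[1, 2]]))   # indices are a CPU torch.LongTensor
# emb(idx)             # raises the same kind of TypeError as above
out = emb(idx.cuda())  # moving the indices to the GPU makes the lookup work

So presumably the input tensor has to be moved to the GPU as well.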

I tried the following variant:

def predict_text(stoi, model, text):
    """Do the actual prediction on the text using the
    model and mapping files passed
    """

    # prefix text with tokens:
    #   xbos: beginning of sentence
    #   xfld 1: we are using a single field here
    input_str = 'xbos xfld 1 ' + fixup(text)

    # predictions are done on arrays of input.
    # We only have a single input, so turn it into a 1x1 array
    texts = [input_str]

    # tokenize using the fastai wrapper around spacy
    tok = Tokenizer().proc_all_mp(partition_by_cores(texts))

    # turn into integers for each word
    encoded = [stoi[p] for p in tok[0]]

    # we want a [x,1] array where x is the number
    #  of words inputted (including the prefix tokens)
    ary = np.reshape(np.array(encoded), (-1, 1))

    # turn this array into a tensor
    tensor = torch.from_numpy(ary)

    # wrap in a torch Variable
    variable = **to_gpu**(Variable(tensor))

    # do the predictions
    predictions = model(variable)

    # convert back to numpy
    numpy_preds = predictions[0].data.numpy()

    return softmax(numpy_preds[0])[0]
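(A side note: once the model actually runs on the GPU, its output will be a CUDA tensor, and .numpy() cannot convert those directly, so the conversion step at the end would also need a .cpu() call:

# convert back to numpy: a CUDA tensor must be moved to the CPU first
numpy_preds = predictions[0].data.cpu().numpy()
)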

But the error changed:
KeyError: 'torch.FloatTensor'

I assume that ary is fed into the neural network, so maybe this will work:
tensor = torch.from_numpy(ary.astype(np.float32))

It does not work, but interestingly, the error changed from:

TypeError: torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.LongTensor), but expected (torch.cuda.FloatTensor source, int dim, torch.cuda.LongTensor index)

To:

TypeError: torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.FloatTensor), but expected (torch.cuda.FloatTensor source, int dim, torch.cuda.LongTensor index)

In conclusion, you are right: the problem comes from the type of the variable ary. An embedding lookup selects rows by integer id, so the index has to be a LongTensor, and a CUDA one since the weights live on the GPU.
I tried all the combinations I could think of to convert it to a CUDA type, but none worked:

tensor = torch.from_numpy(ary)
variable = to_gpu(Variable(tensor))

KeyError: 'torch.FloatTensor'

tensor = to_gpu(torch.from_numpy(ary))
variable = Variable(tensor)

KeyError: 'torch.FloatTensor'

tensor = torch.from_numpy(to_gpu(ary))
variable = Variable(tensor)

AttributeError: 'numpy.ndarray' object has no attribute 'cuda'

Looks like your application expects a LongTensor, so you could do the following. Could you also print ary.shape and ary.dtype to see how ary is laid out?

tensor = torch.from_numpy(ary.astype(np.int32)).long().cuda()
variable = Variable(tensor)

Still the same problem:

KeyError: 'torch.FloatTensor'

I printed the following data types:

print(ary.dtype)
int64

print(type(torch.from_numpy(ary)))
<class 'torch.LongTensor'>

print(type(to_gpu(torch.from_numpy(ary))))
<class 'torch.cuda.LongTensor'>

print(type(torch.from_numpy(ary.astype(np.int32))))
<class 'torch.IntTensor'>

print(type(torch.from_numpy(ary.astype(np.int32)).long()))
<class 'torch.LongTensor'>

print(type(torch.from_numpy(ary.astype(np.int32)).long().cuda()))
<class 'torch.cuda.LongTensor'>

Since the fast.ai library seems to require a torch.cuda.LongTensor, passing in to_gpu(torch.from_numpy(ary)) or torch.from_numpy(ary.astype(np.int32)).long().cuda() seems all right.
But I don't understand how to solve that KeyError: 'torch.FloatTensor' error, since the message is rather cryptic.

Here is the stack trace, if it helps:

   # do the predictions
--> predictions = model(variable)
   # convert back to numpy

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    355             result = self._slow_forward(*input, **kwargs)
    356         else:
--> 357             result = self.forward(*input, **kwargs)
    358         for hook in self._forward_hooks.values():
    359             hook_result = hook(self, input, result)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     65     def forward(self, input):
     66         for module in self._modules.values():
-->  67             input = module(input)
     68         return input
     69

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    355             result = self._slow_forward(*input, **kwargs)
    356         else:
--> 357             result = self.forward(*input, **kwargs)
    358         for hook in self._forward_hooks.values():
    359             hook_result = hook(self, input, result)

~/fastai/courses/dl2/fastai/lm_rnn.py in forward(self, input)
    139         raw_outputs, outputs = [],[]
    140         for i in range(0, sl, self.bptt):
--> 141             r, o = super().forward(input[i: min(i+self.bptt, sl)])
    142             if i>(sl-self.max_seq):
    143                 raw_outputs.append(r)

~/fastai/courses/dl2/fastai/lm_rnn.py in forward(self, input)
    104             with warnings.catch_warnings():
    105                 warnings.simplefilter("ignore")
--> 106                 raw_output, new_h = rnn(raw_output, self.hidden[l])
    107             new_hidden.append(new_h)
    108             raw_outputs.append(raw_output)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    355             result = self._slow_forward(*input, **kwargs)
    356         else:
--> 357             result = self.forward(*input, **kwargs)
    358         for hook in self._forward_hooks.values():
    359             hook_result = hook(self, input, result)

~/fastai/courses/dl2/fastai/rnn_reg.py in forward(self, *args)
    122         """
    123         self._setweights()
--> 124         return self.module.forward(*args)
    125
    126 class EmbeddingDropout(nn.Module):

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
    202             flat_weight=flat_weight
    203         )
--> 204         output, hidden = func(input, self.all_weights, hx)
    205         if is_packed:
    206             output = PackedSequence(output, batch_sizes)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/_functions/rnn.py in forward(input, *fargs, **fkwargs)
    383             return hack_onnx_rnn((input,) + fargs, output, args, kwargs)
    384         else:
--> 385             return func(input, *fargs, **fkwargs)
    386
    387     return forward

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/autograd/function.py in _do_forward(self, *input)
    326         self._nested_input = input
    327         flat_input = tuple(_iter_variables(input))
--> 328         flat_output = super(NestedIOFunction, self)._do_forward(*flat_input)
    329         nested_output = self._nested_output
    330         nested_variables = _unflatten(flat_output, self._nested_output)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/autograd/function.py in forward(self, *args)
    348     def forward(self, *args):
    349         nested_tensors = _map_variable_tensor(self._nested_input)
--> 350         result = self.forward_extended(*nested_tensors)
    351         del self._nested_input
    352         self._nested_output = result

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/_functions/rnn.py in forward_extended(self, input, weight, hx)
    292         hy = tuple(h.new() for h in hx)
    293
--> 294         cudnn.rnn.forward(self, input, hx, weight, output, hy)
    295
    296         self.save_for_backward(input, hx, weight, output)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/backends/cudnn/rnn.py in forward(fn, input, hx, weight, output, hy)
    233         fn.x_descs = cudnn.descriptor(x[0], fn.seq_length)
    234         fn.y_descs = cudnn.descriptor(y[0], fn.seq_length)
--> 235         fn.hx_desc = cudnn.descriptor(hx)
    236         fn.hy_desc = cudnn.descriptor(hx)
    237         fn.cx_desc = cudnn.descriptor(cx) if cx is not None else None

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/backends/cudnn/__init__.py in descriptor(tensor, N)
    336     else:
    337         descriptor = TensorDescriptor()
--> 338         descriptor.set(tensor)
    339     return descriptor
    340

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/backends/cudnn/__init__.py in set(self, tensor)
    137         self._stride = tensor.stride()
    138         check_error(lib.cudnnSetTensorNdDescriptor(
--> 139             self, _typemap[tensor.type()], tensor.dim(),
    140             int_array(tensor.size()), int_array(tensor.stride())))
    141

KeyError: 'torch.FloatTensor'
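Reading the bottom frames, the KeyError itself seems explainable: descriptor.set() looks up _typemap[tensor.type()], and if I read torch/backends/cudnn/__init__.py correctly, that map only contains CUDA tensor types (roughly the following, paraphrased), so any CPU tensor that reaches cuDNN fails exactly like this:

# paraphrase of _typemap in torch/backends/cudnn/__init__.py:
# only CUDA tensor types map to cuDNN data types, so looking up
# 'torch.FloatTensor' (a CPU type) raises KeyError
_typemap = {
    'torch.cuda.HalfTensor':   CUDNN_DATA_HALF,
    'torch.cuda.FloatTensor':  CUDNN_DATA_FLOAT,
    'torch.cuda.DoubleTensor': CUDNN_DATA_DOUBLE,
}

And the tensor that reaches cuDNN as a CPU torch.FloatTensor is not the input at all: the failing call is cudnn.descriptor(hx), where hx comes from self.hidden[l] in lm_rnn.py, i.e. the RNN's hidden state. My guess is that in load_model above, model.reset() runs before to_gpu(model): reset() allocates the hidden state from the weights, which at that point still live on the CPU, and the later .cuda() moves parameters and buffers but not a plain attribute like self.hidden. If that guess is right, reordering the end of load_model should fix it (a minimal sketch, under that assumption):

    # load the trained classifier
    model.load_state_dict(torch.load(classifier_filename, map_location=lambda storage, loc: storage))

    # move the model to the GPU first ...
    model = to_gpu(model)

    # ... and only then create the hidden state (reset), so it is
    # allocated from CUDA weights, and switch to evaluation mode
    model.reset()
    model.eval()

    return stoi, model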