Single Prediction with NLP example

Following https://docs.fast.ai/text.html#text I have created a classifier with my own dataset. What is the easiest way to get a prediction from a single string with the classifier?

I prefer to use the PyTorch model directly: .model(input)

"input" needs to be in vector form (a torch.Tensor). Does anyone have a clean way to apply the text transformations to a single string?

learn.model??

Signature: learn.model(*input, **kwargs)
Type: SequentialRNN
String form:
SequentialRNN(
  (0): MultiBatchRNNCore(
    (encoder): Embedding(12934, 400, padding_idx=1)
    (encoder_dp): EmbeddingDropout(
      (emb): Embedding(12934, 400, padding_idx=1)
    )
    (rnns): ModuleList(
      (0): WeightDropout(
        (module): LSTM(400, 1150)
      )
      (1): WeightDropout(
        (module): LSTM(1150, 1150)
      )
      (2): WeightDropout(
        (module): LSTM(1150, 400)
      )
    )
    (input_dp): RNNDropout()
    (hidden_dps): ModuleList(
      (0): RNNDropout()
      (1): RNNDropout()
      (2): RNNDropout()
    )
  )
  (1): PoolingLinearClassifier(
    (layers): Sequential(
      (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (1): Dropout(p=0.2)
      (2): Linear(in_features=1200, out_features=50, bias=True)
      (3): ReLU(inplace)
      (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): Dropout(p=0.1)
      (6): Linear(in_features=50, out_features=2, bias=True)
    )
  )
)
Length: 2
File: ~/anaconda/envs/fastai/lib/python3.6/site-packages/fastai/text/models.py
Source:
class SequentialRNN(nn.Sequential):
    "A sequential module that passes the reset call to its children."
    def reset(self):
        for c in self.children():
            if hasattr(c, 'reset'): c.reset()

How do you pre-process your input?
For example, do you convert the string to tokens and ids, or anything like that?

I think you are trying to predict on a single sentence. I think the code below will help you do that.

First convert your string into a list of tokens with the tokenizer, then map the tokens to integer ids:

# p: list of tokens for the sentence; self.stoi: token -> id mapping from the vocab
encoded = np.array([self.stoi[o] for o in p])
t = torch.from_numpy(np.reshape(encoded, (-1, 1)))   # shape (seq_len, 1): a batch of one
prediction = model(t)
numpy_preds = prediction[0].data.numpy()             # first element of the output tuple = the logits
scores = self.softmax(numpy_preds[0])[0]

def softmax(self, x):
    '''
    Softmax on numpy
    Source: fastai
    '''
    if x.ndim == 1:
        x = x.reshape((1, -1))
    max_x = np.max(x, axis=1).reshape((-1, 1))
    exp_x = np.exp(x - max_x)
    return exp_x / np.sum(exp_x, axis=1).reshape((-1, 1))

Let me know if it doesn’t work for you.

I’m having trouble with predictions as well. Unfortunately, the above code doesn’t work for me.

x=learn_class.data.valid_ds[0]
y=torch.from_numpy(np.array(x[0]))
learn_class.model(y)

I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-73-2801b4016634> in <module>()
----> 1 learn_class.model(y)

C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\container.py in forward(self, input)
     89     def forward(self, input):
     90         for module in self._modules.values():
---> 91             input = module(input)
     92         return input
     93 

C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\fastai\text\models.py in forward(self, input)
    168 
    169     def forward(self, input:LongTensor)->Tuple[Tensor,Tensor]:
--> 170         sl,bs = input.size()
    171         self.reset()
    172         raw_outputs, outputs = [],[]

ValueError: not enough values to unpack (expected 2, got 1)

Alternatively, I’m also perfectly happy to use get_preds. I’ve thrown data into test_dl in the learner, and it generates predictions for me. However, they appear to be reordered, so I just need to figure out which row of the output matches which row in my original file. I couldn’t find the ordering stored in the learner object; am I just not able to use it?
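
As for the ValueError itself: it comes from sl,bs = input.size() in MultiBatchRNNCore.forward, i.e. the classifier expects a 2-D (sequence length, batch size) tensor, not a 1-D one. A minimal sketch of a workaround, assuming y is a 1-D LongTensor of token ids on the same device as the model:

out = learn_class.model(y[:, None])    # (seq_len,) -> (seq_len, 1): a batch of one
probs = torch.softmax(out[0], dim=-1)  # out[0] holds the classifier logits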

I have an alternative solution. When you create the TextClasDataBunch, make the first column of test_df the index of your source data. When you run get_preds at the end, it will generate two tensors: the first is your predictions and the second is the resorted index. See the code below:

import pandas as pd

preds_raw = learn_class.get_preds(is_test=True)
preds = pd.DataFrame(preds_raw[0].tolist())   # first tensor: the predictions
preds['idx'] = preds_raw[1].tolist()          # second tensor: the resorted index
preds = preds.set_index('idx')

Then you can use pd.concat to relate it to your original data.
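
For example, continuing the snippet above (src_df here is just an illustrative name for your original source DataFrame):

result = pd.concat([src_df, preds], axis=1)   # aligns the two frames on the shared index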

That’s cool, but ideally, instead of having the input read from learn_class.data, a new string could be used as input and preprocessed with the saved tokenizer/stoi/etc., just like you would have to do in production. I will post a working example when I get the time.
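
In the meantime, here is a rough sketch of what that production-style path could look like with fastai v1, assuming a trained learner learn and the TextClasDataBunch data_clas it was built from (names and exact calls are assumptions, not tested code):

import torch
from fastai.text import Tokenizer

text = "a new review to classify"                  # raw, unseen input string

tokens = Tokenizer().process_all([text])[0]        # tokenize with the default fastai rules
ids = data_clas.vocab.numericalize(tokens)         # map tokens to ids via the saved vocab (stoi)

xb = torch.tensor(ids, dtype=torch.long)[:, None]  # shape (seq_len, 1): a batch of one
logits = learn.model.eval()(xb)[0]                 # first element of the output tuple = the logits
probs = torch.softmax(logits, dim=-1)

The [:, None] is there because the encoder does sl,bs = input.size(), as in the traceback above.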

Note that RNNLearner now has a predict method that takes a text.

Cool thanks!

I got this error when I called .predict

learn = text_classifier_learner(data_clas)
learn.load_encoder('enc2')
learn.fit(1, 1e-3)
learn.predict("str")

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-35-f774838615b6> in <module>
----> 1 learn.predict("This article is about the type of website. For other uses, see Wiki (disambiguation).")

~/dev/tmpCheckout/fastai/fastai/text/learner.py in predict(self, text, tokenizer)
    110         self.model.reset()
    111         ds.set_item(ids)
--> 112         res = self.pred_batch()[0]
    113         ds.clear_item()
    114         pred_max = res.argmax()

~/dev/tmpCheckout/fastai/fastai/basic_train.py in pred_batch(self, ds_type, pbar)
    215         nw = dl.num_workers
    216         dl.num_workers = 0
--> 217         preds,_ = self.get_preds(ds_type, with_loss=False, n_batch=1, pbar=pbar)
    218         dl.num_workers = nw
    219         return preds

~/dev/tmpCheckout/fastai/fastai/text/learner.py in get_preds(self, ds_type, with_loss, n_batch, pbar, ordered)
     79             sampler = [i for i in self.dl(ds_type).sampler]
     80             reverse_sampler = np.argsort(sampler)
---> 81             preds[0] = preds[0][reverse_sampler,:] if preds[0].dim() > 1 else preds[0][reverse_sampler]
     82             preds[1] = preds[1][reverse_sampler,:] if preds[1].dim() > 1 else preds[1][reverse_sampler]
     83         return(preds)

RuntimeError: index 156 is out of bounds for dimension 0 with size 64

Hi!

I am getting the same error when using the new predict method which accepts a text= parameter.

As for the training data and process, I am reproducing the IMDB example on https://github.com/fastai/course-v3/, just trying to test the predict method on unseen sentences.

Did you manage to solve it @bobinfo?

In my case:

RuntimeError: index 200 is out of bounds for dimension 0 with size 16

where 200 is the size of the validation data (IMDB_sample) and 16 is the bs of the TextDataBunch.

cc @sgugger

Thanks for all the great work!

Just fixed it in master.

Thanks for the fix @sgugger, it’s working now.