Model.get_preds(): TypeError after preds are created

tasunoro · August 21, 2022, 9:25pm

Hey there!

I hope I am posting this in the right subsection, since it's my first time actually writing to this forum rather than just reading it. I trained a text classifier and saved my results afterwards like so:

learn_cl.save("learned_cl_wb")

Then I re-initialize the learner and load the trained instance:

model = learn_cl.load('learned_cl_wb')

I can now predict single items with the predict method, which works very well already:

model.predict("bla bla")

Now I would like to predict 30,000 sentences, and calling predict on each one individually would take a very long time. If I understand correctly, get_preds() is meant for this case, but there I get an error. My code looks like this:

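import pandas as pd
from fastai.text.all import *

# `dls` is the DataLoaders used for training; `model` is the loaded Learner from above
df = pd.read_csv('test_data.csv', low_memory=False)
# Same DataBlock as for training, but without get_y
dl = DataBlock(
    blocks=(TextBlock.from_df('text', vocab=dls.vocab)),
    get_x=ColReader('text')
).dataloaders(df, bs=128, seq_len=72)
preds = model.get_preds(dl=dl)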

The dataloader is exactly the same as the one I used to train the classifier in the first place, just without the get_y. I can also see that the predictions are actually made (the progress bar runs through all my batches), but then an error occurs and I can't figure out why. Maybe one of you can help? Here is the error log:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_874/1175197425.py in <module>
      1 #dl = model.dls.test_dl(df)
----> 2 preds = model.get_preds(dl=dl)
      3 preds

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in get_preds(self, ds_idx, dl, with_input, with_decoded, with_loss, act, inner, reorder, cbs, **kwargs)
    295                 res[pred_i] = act(res[pred_i])
    296                 if with_decoded: res.insert(pred_i+2, getcallable(self.loss_func, 'decodes')(res[pred_i]))
--> 297             if reorder and hasattr(dl, 'get_idxs'): res = nested_reorder(res, tensor(idxs).argsort())
    298             return tuple(res)
    299         self._end_cleanup()

/opt/conda/lib/python3.7/site-packages/fastai/torch_core.py in tensor(x, *rest, **kwargs)
    150            else as_tensor(x.values, **kwargs) if isinstance(x, (pd.Series, pd.DataFrame))
    151 #            else as_tensor(array(x, **kwargs)) if hasattr(x, '__array__') or is_iter(x)
--> 152            else _array2tensor(array(x), **kwargs))
    153     if res.dtype is torch.float64: return res.float()
    154     return res

/opt/conda/lib/python3.7/site-packages/fastai/torch_core.py in _array2tensor(x)
    136     if sys.platform == "win32":
    137         if x.dtype==np.int: x = x.astype(np.int64)
--> 138     return torch.from_numpy(x)
    139 
    140 # %% ../nbs/00_torch_core.ipynb 40

TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
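
As far as I can tell from the traceback, the failing line only runs after the predictions exist: it takes the indices in which the items were processed (idxs), argsorts them, and uses the result to put the predictions back into the original order. Below is a minimal pure-Python sketch of that idea. The names argsort and reorder are hypothetical stand-ins I made up for illustration, not fastai's actual nested_reorder implementation.

```python
# Hypothetical stand-ins for illustration; NOT fastai's actual implementation.

def argsort(seq):
    """Return the indices that would sort seq in ascending order."""
    return sorted(range(len(seq)), key=seq.__getitem__)

def reorder(items, idxs):
    """Restore original order, where items[k] came from original position idxs[k]."""
    return [items[i] for i in argsort(idxs)]

# Assumed example: the loader visited the original items in this order
# (for instance because it sorted the texts by length before batching).
idxs = [2, 0, 3, 1]
preds_in_batch_order = ["pred_for_2", "pred_for_0", "pred_for_3", "pred_for_1"]

restored = reorder(preds_in_batch_order, idxs)
print(restored)
```

This only works when idxs is a flat list of integers. The error message says that the array built from idxs has dtype object, so my guess is that idxs is ragged or contains something other than plain integers in my case, but I have not been able to confirm that.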

Thank you in advance!