Load_learner on text data - None type not subscriptable

While working on text classification: once I create a classifier/learner and export it, then use load_learner to import it and run prediction, I get the following error trace.

learner = load_learner(path)
prediction = learner.predict(x)  # where x is a text string

File "python3.7/site-packages/fastai/basic_train.py", line 375, in predict
y = ds.y.reconstruct(pred, x) if has_arg(ds.y.reconstruct, 'x') else ds.y.reconstruct(pred)
File "python3.7/site-packages/fastai/data_block.py", line 384, in reconstruct
return Category(t, self.classes[t])
TypeError: 'NoneType' object is not subscriptable

Can someone please help with this? I am using fastai v1.53.

You can't pass a string directly to learner.predict; you need to use the Text class. Besides, it seems your learner.data doesn't have a classes attribute for some reason, which is what raises your error. I don't know if that comes from your learner construction or from the export/load part.

Oh! So two things:

  1. I am able to use predict with a string before I export the learner object (i.e. as soon as I finish training, learner.predict(string) works, but once I export and reload, I get this error).
  2. The way I am exporting is just as in the API: once I have a working learner object, I call learner.export(), which creates a .pkl file. I load the .pkl file using load_learner; but as you say, the loaded learner does not seem to have the classes attribute.

OK, I imagine fastai has taken into account the case where you directly pass a string, which is cool. Apparently, if you're not using the DataBlock API, learner.export() doesn't correctly save the data. In that case, the docs advise recreating your learner at inference time first and then using learner.load, instead of using load_learner directly.
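That recreate-then-load flow might look roughly like this. This is only a sketch for fastai v1; the file names, the CSV-based databunch construction, and the architecture choice are all assumptions, not details from this thread:

```python
# Sketch (fastai v1, assumed setup): rebuild the learner at inference time,
# then load saved weights with learner.load instead of load_learner.
from fastai.text import TextClasDataBunch, text_classifier_learner, AWD_LSTM

# Recreate a databunch the same way it was built at training time
# ('data_dir' and 'train.csv' are placeholders).
data = TextClasDataBunch.from_csv('data_dir', 'train.csv')

# Rebuild the learner with the same architecture used for training ...
learn = text_classifier_learner(data, AWD_LSTM)

# ... and load the weights previously saved with learn.save('my_model').
learn.load('my_model')

pred_class, pred_idx, probs = learn.predict("some text to classify")
```

Because the databunch here carries the classes, learner.data.classes is populated and predict can map the output index back to a label.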

Thanks a lot! Stepping back a bit: I was using learner.load initially and it was working fine. The reason I had to switch to load_learner was that with learner.load, even at inference time, I could view the entire data I used for training once I loaded the TextClasDataBunch, by using

text_data_bunch = TextClasDataBunch(path)

which is not OK for my work (once I train, I do not want a trace of the training data to be stored with the models at inference time). That is why I was trying to switch to load_learner.
I'll have a look at using the DataBlock API. Thanks again!


There may be another workaround that allows you not to reload the training data: for instance, create an empty dataset the same way you created your initial train set, and feed it to the learner at inference time. You'll have the self.classes attribute but no need to actually load the data. But using fastai's API is often easier in the end, as the whole library wasn't really designed to work well with native PyTorch.
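A rough sketch of that empty-dataset idea, for fastai v1. The column names, the dummy row, and the classes list are assumptions for illustration; the point is just that the databunch carries the right classes without holding any real training data:

```python
# Sketch (fastai v1, assumed setup): build a databunch from a dummy one-row
# frame so learner.data has the right classes but none of the training data.
import pandas as pd
from fastai.text import TextList, text_classifier_learner, AWD_LSTM

empty_df = pd.DataFrame({'text': [''], 'label': ['0']})

empty_data = (TextList.from_df(empty_df, '.', cols=['text'])
              .split_none()
              .label_from_df(cols='label', classes=['0', '1'])  # same classes as at training time
              .databunch())

# Feed the "empty" databunch to a freshly built learner, then load weights:
learn = text_classifier_learner(empty_data, AWD_LSTM)
learn.load('my_model')  # placeholder name for the saved weights
```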

Thanks! I will use the APIs. This may need to be a separate thread, but is it possible to create a test-set-only data bunch / data block?
All the APIs talk about passing the test set along with the train and validation sets during data bunch creation, which I'm not able to do: during training I only have access to the train and validation sets, and during inference/testing I have access to strings, which I guess I need to convert into test-set-only data bunches.

Thank you so much !

What you can do is create a train set with your test data and use split_none to avoid splitting off a validation set. You'd also need to create the dataloader manually, because fastai shuffles the train set by default.
Another option is to put everything in the validation set and keep an empty train set. Then every validation method will work normally.
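The second option could be sketched like this in fastai v1. The vocab loading and column names are assumptions; the key trick is split_by_idx with every row index, which sends all the data to the validation split (no shuffling, no dropped batches):

```python
# Sketch (fastai v1, assumed setup): put all test rows in the valid split.
import pandas as pd
from fastai.text import TextList
from fastai.basic_data import DatasetType

test_df = pd.DataFrame({'text': ['first sentence', 'second sentence'],
                        'label': ['0', '0']})  # dummy labels

data = (TextList.from_df(test_df, '.', cols=['text'])  # vocab=... if you saved one
        .split_by_idx(list(range(len(test_df))))       # everything goes to valid
        .label_from_df(cols='label')
        .databunch())

# Then, with a learner rebuilt for inference:
# preds, _ = learn.get_preds(ds_type=DatasetType.Valid)
```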


Thanks a bunch! Here is how I got this working.
Given a sentence x which I want to run inference on with a pre-trained model:

  1. Cast it as a data frame: df = pandas.DataFrame({'text' : [x], 'label' : ['0']}). 'label' is set to a dummy value (the values are wrapped in lists, since pandas needs an index for all-scalar values)
  2. tmplist = TextList.from_df(df, '.', cols=['text'], vocab=vocab) ## where vocab is something I have already saved, and now load using Vocab.load(path)
  3. Convert tmplist to a data bunch using your split_none suggestion:
    db = tmplist.split_none().label_from_df(cols='label').databunch()
  4. db.train_dl = db.train_dl.new(shuffle=False) ## this, I believe, prevents shuffling of the train data, as per How to set shuffle=False of train and val?
  5. classifier = load_learner(path, 'xyz.pkl', test=db) ## where xyz.pkl is the saved trained learner object
  6. preds = classifier.get_preds(ds_type=DatasetType.Test, ordered=True) ## ordered=True is important, otherwise the test data gets shuffled
  7. Also, one caveat: this doesn't work for inference on only one sample, because creating the data bunch object drops the last incomplete batch by default.

Closing this topic.
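One pitfall worth flagging in step 1 above: pandas refuses to build a DataFrame from all-scalar values without an explicit index, so the single string has to be wrapped in a list. This is plain pandas, independent of fastai:

```python
import pandas as pd

x = "this movie was great"

# Building a one-row frame from bare scalars raises ValueError:
#   pd.DataFrame({'text': x, 'label': '0'})
# Wrapping each value in a list gives pandas a length to work with:
df = pd.DataFrame({'text': [x], 'label': ['0']})

print(len(df))            # 1
print(df.loc[0, 'text'])  # this movie was great
```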

You can pass drop_last=False to the databunch method, I believe. Happy you made it work!