Removing training data from a text data bunch at inference time

Hi All,

My first fast.ai forum post!

For NLP applications, when we create a data bunch for the language model (data_lm) or for the classification problem (data_clas), the original training data can be retrieved from the data bunch. For example, data_lm.train_ds[0][0] prints the first data element used for the language model.

I have a requirement that no training data may be stored in the model or data bunch used for inference; only the model parameters may be kept.

Is that possible? (I'm actually not sure why the training data is carried along this way. Is it used anywhere downstream after the model has been trained?)

In other words, when I load a trained classifier for inference, I have to run the following steps:

data_bunch = TextClasDataBunch.load('text_data_bunch_path')
classifier = text_classifier_learner(data_bunch, drop_mult=0.5)
classifier.load('trained_classifer_path')
classifier.predict(x)

In the data_bunch step I load the data bunch, which means the training data used to build the model can be viewed. Can I avoid this step?

Thanks a lot for the help!
Krishna

Export your model with learn.export(), then use load_learner to bring it back for use in production.
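To illustrate the idea behind exporting, here is a minimal sketch in plain Python. It is not fastai's implementation: export_inference_state, load_inference_state, and the toy bag-of-words "model" are hypothetical names made up for this example. The point is only that an export can carry the state prediction needs (vocabulary, class names, parameters) without carrying the training texts themselves.

```python
import pickle

# Hypothetical helpers for illustration only; this is NOT fastai's code.
def export_inference_state(vocab, classes, weights):
    """Serialize only what prediction needs; the training texts are left out."""
    state = {"vocab": list(vocab), "classes": list(classes), "weights": dict(weights)}
    return pickle.dumps(state)

def load_inference_state(blob):
    """Restore the inference state produced by export_inference_state."""
    return pickle.loads(blob)

def predict(text, state):
    """Toy classifier: sum per-token weights and map the sign to a class name."""
    score = sum(state["weights"].get(tok, 0.0) for tok in text.split())
    return state["classes"][1 if score > 0 else 0]

# Toy training data; it is used to build the vocabulary but is never exported.
training_texts = ["a great movie", "a dull movie"]
vocab = sorted({tok for text in training_texts for tok in text.split()})
classes = ["neg", "pos"]
weights = {"great": 1.0, "dull": -1.0}  # stand-in for trained parameters

blob = export_inference_state(vocab, classes, weights)
state = load_inference_state(blob)
prediction = predict("a great film", state)
print(prediction)
```

Note that even in this sketch the vocabulary is derived from the training texts, so individual tokens do survive the export. Whether that is acceptable depends on how strictly "no training data" is defined.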

Thanks! I tried this approach: I exported the learner and loaded it back with load_learner:
learner.export()
learner = load_learner(path)

When I then call learner.predict(x), where x is a string, I get an error with the following trace, whereas the same call works fine before exporting:

File "python3.7/site-packages/fastai/basic_train.py", line 375, in predict
    y = ds.y.reconstruct(pred, x) if has_arg(ds.y.reconstruct, 'x') else ds.y.reconstruct(pred)
File "python3.7/site-packages/fastai/data_block.py", line 384, in reconstruct
    return Category(t, self.classes[t])
TypeError: 'NoneType' object is not subscriptable

Thanks a lot for the help!
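Reading the trace above, the last frame indexes self.classes, so the error suggests that the label classes were None on the loaded learner when reconstruct ran. The sketch below reproduces only that failure mode in plain Python. LabelListSketch is a hypothetical stand-in, not fastai's class, and the sketch does not establish why the classes were missing in this particular case.

```python
# Hypothetical stand-in for a label list; NOT fastai's implementation.
class LabelListSketch:
    def __init__(self, classes=None):
        # classes maps a predicted index to a human-readable label
        self.classes = classes

    def reconstruct(self, t):
        # Same shape as the failing line in the trace: index into self.classes
        return self.classes[t]

# With classes populated, an index maps to its label.
label = LabelListSketch(classes=["neg", "pos"]).reconstruct(1)

# With classes left as None, indexing raises the TypeError from the trace.
try:
    LabelListSketch(classes=None).reconstruct(0)
except TypeError as err:
    message = str(err)

print(label)
print(message)
```

If this is what is happening, a reasonable first check is whether the class list is populated on the original learner before exporting and on the loaded learner afterwards.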