Exporting Tabular Data Loaders

AJAYHAYAGREEVE · December 4, 2020, 3:53pm

Hi,
I trained a tabular model with both categorical and continuous variables and exported its weights. When I try to predict with the model, it asks for dls (the DataLoaders), but I cannot export my tabular DataLoaders. So I tried the following workaround: I took all the categorical variables and collected the unique values of each one. Then I created a new dataframe (with just the max number of categorical values and 0 for the continuous values) and built tabular DataLoaders from that dataset. To predict, I ran this:

test_dl = dls.test_dl(valid_data, with_labels = True)
preds = learn.get_preds(dl = test_dl)
print(metrics.roc_auc_score(valid_data['target'], preds[0][:,0]), len(preds[0][:,0]))

But the score does not match the original one for the same validation dataset, so I think the encoding of the categorical variables is different. Could someone help me with this? Is there a way either to save the DataLoaders or to make sure the same encoding is applied?

deep_derping · December 4, 2020, 7:00pm

You can either save only the weights with learn.save() or export the whole learner with learn.export(). The latter is saved with all transforms, so when you load it again, your encoding should be unchanged.

See chapter 2 of the Book:

Using the Model for Inference

Once you’ve got a model you’re happy with, you need to save it, so that you can then copy it over to a server where you’ll use it in production. Remember that a model consists of two parts: the architecture and the trained parameters. The easiest way to save the model is to save both of these, because that way when you load a model you can be sure that you have the matching architecture and parameters. To save both parts, use the export method.

This method even saves the definition of how to create your DataLoaders. This is important, because otherwise you would have to redefine how to transform your data in order to use your model in production. fastai automatically uses your validation set DataLoader for inference by default, so your data augmentation will not be applied, which is generally what you want.

When you call export, fastai will save a file called “export.pkl”:
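As a rough sketch of the export-and-reload workflow described in this passage: the helper names `export_learner` and `predict_with_exported` are hypothetical, and the code assumes a trained fastai tabular `learn` object and a pandas dataframe `df` already exist. It relies only on `Learner.export`, `load_learner`, `dls.test_dl`, and `get_preds`.

```python
from pathlib import Path


def export_learner(learn, fname="export.pkl"):
    """Export a trained Learner (weights plus preprocessing state).

    fastai writes the file to learn.path / fname; the path is returned
    so it can be copied to wherever inference will run.
    """
    learn.export(fname)
    return Path(learn.path) / fname


def predict_with_exported(pkl_path, df, with_labels=False):
    """Reload an exported Learner and predict on a new dataframe.

    The test DataLoader is built from the reloaded learner's own dls,
    so the preprocessing fitted at training time is reused.
    Set with_labels=True only if df contains the target column.
    """
    # Imported here so the sketch can be read without fastai installed.
    from fastai.tabular.all import load_learner

    learn = load_learner(pkl_path)
    test_dl = learn.dls.test_dl(df, with_labels=with_labels)
    preds, targs = learn.get_preds(dl=test_dl)
    return preds, targs
```

The important detail is that `test_dl` comes from `learn.dls` on the reloaded learner, not from DataLoaders rebuilt on other data.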

AJAYHAYAGREEVE · December 4, 2020, 7:27pm

Yeah, that's what I did. When I loaded the saved model by calling load_learner, it asked for dls. So I created some sample data, built dls from it, and assigned learn.dls = dls. To predict on new data, I used the code I mentioned above. The score of the loaded model is not the same as that of the original model.

Got it. Thanks. I just needed to do
test_dl = learn.dls.test_dl(valid_data, with_labels = True)

And one more request: I also want to do incremental training, i.e. training only on a new batch of data.

muellerzr · December 4, 2020, 7:34pm

You need to export your DataLoaders with it; otherwise you won’t be able to generate your test data based on your training data. Is there a particular reason you can’t? Your original data is never saved, no matter what kind of DataLoader you’re working with. All it saves is the template to generate new data.
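To illustrate the "template" idea, here is a minimal pure-Python sketch. It is not fastai's implementation: `fit_template` and `apply_template` are hypothetical helpers, and the toy columns `city` and `age` are made up. They stand in for the roles that procs such as Categorify and Normalize play, namely a category vocabulary and normalization statistics learned from the training data.

```python
from statistics import mean, pstdev


def fit_template(rows, cat_col, cont_col):
    """Learn encoding state from data: a vocabulary and mean/std."""
    # Index 0 is reserved for unseen categories (illustrative token).
    vocab = ["#na#"] + sorted({r[cat_col] for r in rows})
    values = [r[cont_col] for r in rows]
    # Fall back to 1.0 to avoid dividing by zero on constant columns.
    return {"vocab": vocab, "mean": mean(values), "std": pstdev(values) or 1.0}


def apply_template(template, row, cat_col, cont_col):
    """Encode one row using previously learned state only."""
    index = {c: i for i, c in enumerate(template["vocab"])}
    code = index.get(row[cat_col], 0)
    scaled = (row[cont_col] - template["mean"]) / template["std"]
    return code, scaled


# Toy training data versus a placeholder frame with zeroed continuous values.
train = [
    {"city": "paris", "age": 30.0},
    {"city": "rome", "age": 50.0},
    {"city": "oslo", "age": 40.0},
]
placeholder = [{"city": "paris", "age": 0.0}, {"city": "rome", "age": 0.0}]

trained = fit_template(train, "city", "age")
rebuilt = fit_template(placeholder, "city", "age")

new_row = {"city": "rome", "age": 50.0}
from_trained = apply_template(trained, new_row, "city", "age")
from_rebuilt = apply_template(rebuilt, new_row, "city", "age")
print(from_trained, from_rebuilt)
```

In this toy case the same row is encoded differently by the two templates, because the category index and the scaling both depend on the data the template was fitted on. That is one plausible reason scores can change when dls is rebuilt from placeholder data instead of being taken from the exported learner.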

AJAYHAYAGREEVE · December 4, 2020, 7:41pm

Yeah, got it. When I ran

test_dl = learn.dls.test_dl(valid_data, with_labels = True)
preds = learn.get_preds(dl = test_dl)
print(metrics.roc_auc_score(valid_data['target'], preds[0][:,0]), len(preds[0][:,0]))

it worked.
Thanks,
