Tabular inference

Let’s say I have a TabularPandas with procs like [Categorify, FillMissing, Normalize].

Once my model is trained, I need the same Categorify and Normalize settings if I ever want to use it in production. My categorical variables need to be encoded to the same integer codes, and my continuous variables need to be normalized with the same means and stds.

Does fastai v2 have a tutorial showing inference once you have a trained model? I know about Learner.load, but does it handle the procs from the DataLoaders so that new data is normalized and categorized correctly?



Yes. Check out the tabular tutorial in the docs (towards the bottom it discusses predict/get_preds).


Thanks for the link. What I mean is more about loading back the means and stds stored in Normalize and the dictionaries mapping categories to numbers in Categorify.

Let’s say I have a huge tabular dataset (several GBs) and I fit a model to it, which takes a very long time to train.

Now let’s say I want to use this model in production somewhere. I don’t necessarily want to load this huge dataset into TabularPandas just so that it reconstructs the same means and stds for Normalize and the same dictionaries mapping categories to numbers. I would rather save those means, stds and category dictionaries from when I trained the model, load them back, and then simply do a test_dl. Is this possible?

I just trained a tabular autoencoder. It took 48h. But my TabularPandas had splits=RandomSplitter()(df), so the train/test split was random. The model is trained, but I have restarted my computer since then, so the split won’t be the same (I did not set the seed), and thus my Normalize stats and category dictionaries are not going to be the same. So I am wondering if there’s a way to save those and load them back easily. I don’t want to make this mistake twice :). I saved my learner with SaveModelCallback, but now I am wondering about my Normalize stats and my categorical variable mapping.


Yes, you’d do learn.export(), which exports how the data was all made. You can also extract all of that information yourself if you want, but that’s the easiest way.
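
To make the "extract it yourself" route concrete, here is a minimal, library-independent sketch of the idea in plain Python: fit the category-to-number maps and the means/stds once on the training rows, save them to disk, then reload them and encode a new row without touching the training data. This is not how fastai implements Categorify or Normalize. The helper names (fit_stats, apply_stats), the file name, the toy data, and the choice to reserve code 0 for unseen categories are all assumptions made for illustration.

```python
import json
import math
import os
import tempfile

def fit_stats(rows, cat_names, cont_names):
    """Compute category->index maps and mean/std for each continuous column."""
    state = {"cats": {}, "conts": {}}
    for name in cat_names:
        # Assumption of this sketch: code 0 is reserved for unseen categories.
        values = sorted({r[name] for r in rows})
        state["cats"][name] = {v: i + 1 for i, v in enumerate(values)}
    for name in cont_names:
        xs = [r[name] for r in rows]
        mean = sum(xs) / len(xs)
        std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
        # Fall back to 1.0 so a constant column does not divide by zero.
        state["conts"][name] = {"mean": mean, "std": std or 1.0}
    return state

def apply_stats(row, state):
    """Encode one row using previously fitted (or reloaded) statistics."""
    out = {}
    for name, mapping in state["cats"].items():
        out[name] = mapping.get(row[name], 0)
    for name, s in state["conts"].items():
        out[name] = (row[name] - s["mean"]) / s["std"]
    return out

# Toy "training" data standing in for the large dataset.
train_rows = [
    {"color": "red", "size": 1.0},
    {"color": "blue", "size": 3.0},
    {"color": "red", "size": 5.0},
]
state = fit_stats(train_rows, cat_names=["color"], cont_names=["size"])

# Save the fitted state once, right after training.
path = os.path.join(tempfile.mkdtemp(), "preproc_state.json")
with open(path, "w") as f:
    json.dump(state, f)

# Later (e.g. in production): reload the state and encode new rows,
# without needing the original training data.
with open(path) as f:
    loaded = json.load(f)

new_row = {"color": "blue", "size": 3.0}
encoded = apply_stats(new_row, loaded)
```

With fastai itself you would not normally write this by hand: the file written by learn.export() is usually read back with load_learner, and new rows are then passed through test_dl as mentioned above.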