Thanks for the link. What I mean is more about loading back the means and stds stored in Normalize and the dictionaries mapping categories to numbers in Categorify.
Let’s say I have a huge tabular dataset (several GBs) and I fit a model to it, which takes a very long time to train.
Now let’s say I want to use this model in production somewhere. I don’t necessarily want to load this huge dataset into TabularPandas just so that it reconstructs the same means and stds for Normalize and the same dictionaries mapping categories to numbers. I would rather save the means, stds and category dictionaries from when I trained the model, load them back, and then simply do a test_dl. Is this possible?
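To make it concrete, here is a rough sketch of the kind of thing I have in mind, in plain Python rather than fastai. None of these helpers are fastai API: `fit_stats`, `save_stats`, `load_stats` and `encode_row` are names I made up, the JSON layout is arbitrary, and I am assuming string categories, a population std, and index 0 reserved for unseen categories.

```python
import json
import statistics

def fit_stats(rows, cont_names, cat_names):
    """Compute per-column means/stds and category -> integer mappings."""
    means, stds, vocab = {}, {}, {}
    for c in cont_names:
        values = [r[c] for r in rows]
        means[c] = statistics.fmean(values)
        stds[c] = statistics.pstdev(values)  # population std (an assumption)
    for c in cat_names:
        # Index 0 is reserved for categories never seen during training.
        uniques = sorted({r[c] for r in rows})
        vocab[c] = {v: i + 1 for i, v in enumerate(uniques)}
    return {"means": means, "stds": stds, "vocab": vocab}

def save_stats(stats, path):
    """Write the stats to disk as JSON (category values must be strings)."""
    with open(path, "w") as f:
        json.dump(stats, f)

def load_stats(path):
    """Read the stats back, without needing the training data."""
    with open(path) as f:
        return json.load(f)

def encode_row(row, stats):
    """Apply the saved normalization and category mapping to one new row."""
    conts = [
        (row[c] - stats["means"][c]) / (stats["stds"][c] or 1.0)  # guard against std == 0
        for c in stats["means"]
    ]
    cats = [stats["vocab"][c].get(row[c], 0) for c in stats["vocab"]]
    return conts, cats
```

So at training time I would call `fit_stats` once and `save_stats` next to the model weights, and in production only `load_stats` and `encode_row` on the incoming rows, never touching the original dataset.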
I just trained a tabular data auto-encoder… It took 48h. But my TabularPandas had splits=RandomSplitter()(df), so the splits for train and test were random. My model is now trained, but I have restarted my computer since then, and because I did not set the seed, the train/test split won’t be the same. That means my Normalize stats and category dictionaries won’t be the same either. So I am wondering if there’s a way to save those and load them back easily. I don’t want to make this mistake twice :). I saved my learner with SaveModelCallback, but now I am wondering about my Normalize stats and my categorical variable mapping…