I’m training tabular_learner models and would like to get them into production, but I’m struggling to understand the best way to bring in the data I want to run predictions on, since the tabular dataset objects require parameters such as continuous variables, y targets, etc.
When training these models, the data-prep process can be represented as:
Data -> DataFrame -> categorizing columns (using .astype('category')) -> TabularPandas -> TabDataLoader -> DataLoader
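For context, here is a minimal sketch of what my training-time prep looks like (the column names, file path, and split are placeholders, not my actual schema):

```python
import pandas as pd
from fastai.tabular.all import *

df = pd.read_csv('train.csv')  # placeholder path
df['color'] = df['color'].astype('category')  # categorize object columns

# 'color', 'price', and 'target' stand in for my real columns
to = TabularPandas(
    df,
    procs=[Categorify, FillMissing, Normalize],
    cat_names=['color'],
    cont_names=['price'],
    y_names='target',
    splits=RandomSplitter()(range_of(df)),
)
dls = to.dataloaders(bs=64)
learn = tabular_learner(dls, metrics=accuracy)
learn.fit_one_cycle(3)
```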
Obviously this applies my categoricals, continuous variables, Normalize(), etc., but is this same process required in my production environment? Just to experiment, I have been saving my models out, importing them back in, and trying to run predictions on my test set with learn.predict, which just yields:
AttributeError: 'DataFrame' object has no attribute 'conts'
when using learn.predict(test_dl.iloc[0]), for example. Running learn.predict(test_dl) instead just yields:

AttributeError: to_frame
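Concretely, here is roughly what I have been attempting (the paths and file names are placeholders, not my actual code):

```python
import pandas as pd
from fastai.tabular.all import load_learner

learn = load_learner('export.pkl')  # model exported after training
test_dl = pd.read_csv('test.csv')   # despite the name, this is a raw DataFrame

learn.predict(test_dl.iloc[0])  # -> AttributeError: 'DataFrame' object has no attribute 'conts'
learn.predict(test_dl)          # -> AttributeError: to_frame
```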
This test set works when I run learn.validate(dl=dls.test_dl) to check my accuracy while training models, but that won’t work in my production environment, where I don’t have the ground-truth labels for future predictions.
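For reference, this is the pattern that does work for me during training, when the test data still includes the target column (a minimal sketch; test_df is a placeholder DataFrame, and I believe with_labels=True is what keeps the labels around for validation):

```python
# test_df still contains the target column here, so the dataloader can
# apply the same Categorify/FillMissing/Normalize procs the model was
# trained with and compare predictions against the labels
test_dl = learn.dls.test_dl(test_df, with_labels=True)
print(learn.validate(dl=test_dl))  # [loss, accuracy]
```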
Does anybody have some code they could share for a tabular_learner showing how they prep their data before running predictions in a more production-like environment?
I can of course share the full tracebacks, but this is more of a general best-practices question than an “I believe I’m doing this correctly, so what is wrong with my code?” one. As a new user who has gone through the course, I’m still quite confused by DataLoaders, DataBunches, and all of the transforms and data-type changes we have to apply between loading data and running training or predictions.