I am unsure of how to call my trained model with new data.
The training and saving of my model:
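# Imports assumed from the names used below
import pickle
from fastai.tabular.all import *

# Preprocessing steps and continuous/categorical column split (btrain is a pandas DataFrame)
procs = [Categorify, FillMissing]
cont,cat = cont_cat_split(btrain)
splits = RandomSplitter(valid_pct=0.5)(range_of(btrain))
btrain = TabularPandas(btrain, procs, cat, cont, y_names='15R', splits=splits)
dls = btrain.dataloaders(1024)
learn = tabular_learner(dls, layers=[500,250])
learn.fit_one_cycle(5, 1e-2)
# Save the trained learner, closing the file handle afterwards
with open(filename, "wb") as f:
    pickle.dump(learn, f)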
Attempting to use my model on test data:
with open(filename, 'rb') as f:
    learn = pickle.load(f)
procs = [Categorify, FillMissing]
cont,cat = cont_cat_split(btest)
btest = TabularPandas(btest, procs, cat, cont)
dls = learn.dls.test_dl(btest)
preds = learn.get_preds(dl=dls, with_targs=False)
On the line dls = ..., I get the following error:
KeyError: "['15R'] not in index"
If I include the y-variable I am predicting, the predictions have an R-squared of 0.998, which is high enough that I suspect the target is leaking into the inputs. At deployment, the 15R column will not be available. I believe I am making a simple error in my understanding of either the dataloader or the learner classes.
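One possible reading of both symptoms is that 15R ended up in the feature list at training time. The sketch below uses plain pandas, not fastai, to show the mechanism: if the feature columns are derived from the training frame without excluding the target, selecting those columns from a frame that lacks 15R raises a KeyError of the same form. The frames and the x1/x2 column names are invented for illustration. In fastai, cont_cat_split accepts a dep_var argument that leaves the named column out of both returned lists, which may be the relevant knob here.

```python
import pandas as pd

TARGET = "15R"

# Hypothetical stand-ins for btrain and btest; x1 and x2 are invented names.
train_df = pd.DataFrame(
    {"x1": [0.1, 0.2, 0.3], "x2": [1.0, 2.0, 3.0], TARGET: [1.2, 2.4, 3.6]}
)
test_df = pd.DataFrame({"x1": [0.4, 0.5], "x2": [4.0, 5.0]})  # no target column

# Deriving the feature list from every training column keeps the target in it.
feature_cols = list(train_df.columns)
target_is_feature = TARGET in feature_cols

# Selecting those columns from a frame without the target fails.
error_message = None
try:
    test_df[feature_cols]
except KeyError as err:
    error_message = str(err)
print(error_message)

# Excluding the target when building the feature list avoids the lookup.
safe_cols = [c for c in train_df.columns if c != TARGET]
selected = test_df[safe_cols]
print(list(selected.columns))
```

If the target really is among the inputs, that would also be consistent with the near-perfect R-squared seen when 15R is left in the test data.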
The docs (fastai - Tabular training) provide the following: "To get prediction on a new dataframe, you can use the test_dl method of the DataLoaders. That dataframe does not need to have the dependent variable in its column."