I have trained a model on a tabular training dataset, and now I want to use it to predict on new data. How should I do that?
I tried reading the unseen data with test = pd.read_csv(), but calling model.predict(test) doesn't seem to work.
Then I tried a for-loop, res = [learn.predict(test.iloc[i])[2] for i in range(test.shape[0])],
which runs correctly, but it is far too SLOW!!
I have more than 70,000 unseen rows to predict, and the for-loop takes more than six hours.
So how can I predict on a large amount of unseen data quickly?
Please help me, thanks!
I assume there is a proper way to do this, probably involving something like .add_test and making sure the results aren't shuffled, but I ended up writing my own functions for it.
You can get predictions with get_cust_preds() from there.
The only major thing is that you should split the creation of the data object into two phases (otherwise it's impossible to get the normalisation parameters that were used).
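The two phases matter because the normalisation statistics have to be computed once on the training data and then reused, unchanged, on the unseen data. A minimal pandas sketch of that idea, assuming plain mean/std scaling with pandas' sample standard deviation; `fit_normalizer` and `apply_normalizer` are hypothetical helpers for illustration, not the poster's functions and not fastai's `Normalize` proc.

```python
import pandas as pd

def fit_normalizer(train, cols):
    """Phase 1: compute per-column mean and (sample) std on the training data only."""
    return {c: (train[c].mean(), train[c].std()) for c in cols}

def apply_normalizer(df, stats):
    """Phase 2: normalise any DataFrame (train, valid, or unseen) with the stored stats."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for c, (mean, std) in stats.items():
        out[c] = (out[c] - mean) / std
    return out

train = pd.DataFrame({"age": [20.0, 30.0, 40.0]})
test = pd.DataFrame({"age": [30.0, 50.0]})
stats = fit_normalizer(train, ["age"])           # mean 30, std 10, from train only
print(apply_normalizer(test, stats)["age"].tolist())  # [0.0, 2.0]
```

Recomputing the statistics on the test rows instead would give different scaled values than the model saw during training, which is why the parameters from the first phase need to be kept around.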
Thank you very much!
I had the same issue, and here is an elegant solution.
Sometimes you cannot build the test dataset in advance because you receive the test data through an API, as in the Kaggle Riid competition.
I ended up writing a small function that predicts in batches:
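import types
import numpy as np
def predict_batch(self, df):
    # Test DataLoader that applies the same preprocessing (procs) as the training data
    dl = self.dls.test_dl(df)
    # Cast continuous columns to float32 to match the model's weights
    dl.dataset.conts = dl.dataset.conts.astype(np.float32)
    # Runs the whole DataLoader batch by batch; returns (preds, targets)
    preds, _ = self.get_preds(dl=dl)
    return preds.numpy()
# Bind to this learner so it can be called as learn.predict_batch(df)
learn.predict_batch = types.MethodType(predict_batch, learn)
This function can be used like this:
%%time
from sklearn.metrics import roc_auc_score
sample_size = 2_000_000
preds = learn.predict_batch(X[features].iloc[:sample_size])
# If preds has one column per class, score with preds[:, 1] (positive class) instead
roc_auc_score(X[target][:sample_size].values, preds)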
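For a frame with millions of rows, one test DataLoader over everything may need a lot of memory. A minimal sketch, assuming you wrap the `predict_batch` function above in a callable that takes a DataFrame slice and returns a NumPy array: split the rows into fixed-size slices and concatenate the outputs. `predict_in_chunks` and `fake_predict` are hypothetical names, and `fake_predict` only stands in for a real model so the example runs on its own.

```python
import numpy as np
import pandas as pd

def predict_in_chunks(predict_fn, df, chunk_size=100_000):
    """Run predict_fn on consecutive row slices of df and stack the outputs in row order."""
    if len(df) == 0:
        raise ValueError("df must contain at least one row")
    parts = [
        predict_fn(df.iloc[start:start + chunk_size])  # one slice of at most chunk_size rows
        for start in range(0, len(df), chunk_size)
    ]
    return np.concatenate(parts, axis=0)

# Hypothetical stand-in for a real model: one score per row
def fake_predict(chunk):
    return chunk["x"].to_numpy() * 0.5

df = pd.DataFrame({"x": np.arange(10, dtype=np.float64)})
preds = predict_in_chunks(fake_predict, df, chunk_size=3)
print(preds.shape)  # (10,)
```

Because the slices are taken in order with `iloc` and concatenated in the same order, the output rows line up with the input rows; the chunk size trades peak memory against per-call overhead.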
I just wonder why such a function is not in the fast.ai API. Keras, for example, supports this out of the box.