I have trained a model on a tabular training data set, and now I want to use it to predict on new data. How should I do that?
I tried reading the unseen data with test = pd.read_csv(), but calling model.predict(test) doesn't seem to work.
I then tried a for-loop: res = [learn.predict(test.iloc[i])[2] for i in range(test.shape[0])].
This runs correctly, but it is far too SLOW!
I have more than seventy thousand unseen rows to predict, and the for-loop takes more than six hours.
So how can I predict on a large amount of unseen data quickly?
Please help me, thanks!
I assume that there is a proper way to do that, probably involving something like .add_test and making sure the results aren't shuffled, but I ended up writing my own functions for it.
You can get predictions with get_cust_preds() from there.
The only major thing is that you should split the creation of the data object into 2 phases (as otherwise it's impossible to get the normalisation parameters that were used).
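To illustrate why the two phases matter, here is a minimal, library-agnostic sketch in plain Python. It is not fastai's implementation and not the code behind get_cust_preds(). The helper names fit_norm_stats and apply_norm, the column names, and the choice of population standard deviation are all assumptions made for the example. The point is only that the statistics are computed once on the training rows and then reused unchanged on unseen rows.

```python
from statistics import mean, pstdev

def fit_norm_stats(rows, cont_names):
    """Phase 1 (hypothetical helper): per-column mean/std from TRAINING rows only."""
    stats = {}
    for name in cont_names:
        values = [row[name] for row in rows]
        std = pstdev(values)
        # Guard against constant columns to avoid dividing by zero
        stats[name] = (mean(values), std if std > 0 else 1.0)
    return stats

def apply_norm(rows, stats):
    """Phase 2 (hypothetical helper): normalise any rows with the stored stats."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for name, (mu, sd) in stats.items():
            new_row[name] = (row[name] - mu) / sd
        out.append(new_row)
    return out

# Made-up toy data, for illustration only
train = [{"age": 20.0, "fare": 10.0}, {"age": 40.0, "fare": 30.0}]
unseen = [{"age": 30.0, "fare": 50.0}]

stats = fit_norm_stats(train, ["age", "fare"])  # computed once, on train
unseen_norm = apply_norm(unseen, stats)         # reused on unseen data
print(unseen_norm)
```

If the unseen rows were normalised with their own mean and standard deviation instead, the model would see inputs on a different scale from the one it was trained on.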
thank you very much!
I have had the same issue, here is an elegant solution
Sometimes you cannot build the test dataset in advance, because you are getting the test data via an API, like in the Kaggle Riid competition.
I ended up creating a small function which predicts batches:
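import numpy as np

def predict_batch(self, df):
    # Build a test DataLoader that reuses the training preprocessing
    dl = self.dls.test_dl(df)
    dl.dataset.conts = dl.dataset.conts.astype(np.float32)
    inp, preds, _, dec_preds = self.get_preds(dl=dl, with_input=True, with_decoded=True)
    return preds.numpy()

# Attached to the instance, so the function is not bound: pass learn explicitly
setattr(learn, 'predict_batch', predict_batch)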
This function can be used like this:
%%time
sample_size = 2_000_000
preds = learn.predict_batch(learn, X[features].iloc[:sample_size])
roc_auc_score(X[target][:sample_size].values, preds)
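When the rows arrive in pieces (for example from an API), the same idea can be wrapped in a small chunking loop. The following is only a sketch in plain Python: iter_chunks, predict_in_chunks and fake_predict_batch are hypothetical names, and fake_predict_batch is a stand-in for a real batch predictor such as the predict_batch function above. It shows one way to keep predictions in the same order as the input rows.

```python
from itertools import islice

def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows, preserving order."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

def predict_in_chunks(predict_batch_fn, rows, chunk_size=1024):
    """Call a batch predictor once per chunk and concatenate the results in order."""
    preds = []
    for chunk in iter_chunks(rows, chunk_size):
        preds.extend(predict_batch_fn(chunk))
    return preds

def fake_predict_batch(chunk):
    # Hypothetical stand-in for a real model: doubles each input value
    return [2 * x for x in chunk]

result = predict_in_chunks(fake_predict_batch, range(10), chunk_size=4)
print(result)
```

The model is called once per chunk rather than once per row, which is where the speed-up over the row-by-row for-loop comes from.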
I just wonder why such a function is not in the fast.ai API. Keras, for example, supports this functionality out of the box.