Chapter 9: Random Forest predictions

dabeeper · April 23, 2022, 9:00pm

Newbie question here. I am following Chapter 9 of fastbook, which covers tabular data. I can follow the guide and everything seems to work. I am not interested in the neural-network portion. Instead, I would like to take my random forest model and predict on a "test.csv" file, much like you would with a Kaggle competition's test set.

I am not sure I am doing this right, so please tell me if my flow is wrong:
1. Read the csv file into a new dataframe.
2. Run the cont/cat split on the new dataframe, but without the train/valid split.
3. Create the TabularPandas object, which we shall call to2.
4. Remove the less important columns that my last random forest identified.
5. I am then left with a to2 dataframe of the correct size.
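The flow above hinges on one thing: the test data must be encoded with the same category mapping, and have the same columns in the same order, as the training data. A random forest does not use embeddings, but it does depend on consistent category codes. Below is a minimal pandas-only sketch of that idea, assuming hypothetical column names (`ProductSize`, `YearMade`) and toy data rather than the real files. It is not fastai's implementation, although fastai's `Categorify` works along similar lines, including reserving a slot for missing values.

```python
import pandas as pd

# Hypothetical toy data standing in for train.csv and test.csv
train_df = pd.DataFrame({
    "ProductSize": ["Small", "Large", "Medium", "Large"],
    "YearMade": [2001, 1998, 2005, 2010],
})
test_df = pd.DataFrame({
    "ProductSize": ["Large", "Small", "Compact"],  # "Compact" never appears in training
    "YearMade": [2003, 1999, 2012],
})

cat_names = ["ProductSize"]
# Hypothetical: the columns that survived the feature-importance pruning, in training order
keep_cols = ["YearMade", "ProductSize"]


def fit_categories(df, cat_names):
    """Learn each categorical column's vocabulary from the training data only."""
    return {c: pd.Categorical(df[c]).categories for c in cat_names}


def apply_categories(df, vocab, keep_cols):
    """Encode categoricals with the training vocabulary, then select the training columns."""
    out = df.copy()
    for c, cats in vocab.items():
        # Values not in `cats` get code -1; adding 1 makes 0 mean "unknown/missing"
        out[c] = pd.Categorical(out[c], categories=cats).codes + 1
    return out[keep_cols]


vocab = fit_categories(train_df, cat_names)
train_xs = apply_categories(train_df, vocab, keep_cols)  # fit the forest on this
test_xs = apply_categories(test_df, vocab, keep_cols)    # predict on this
print(test_xs)
```

The vocabulary is learned once, from the training data. Re-running the encoding from scratch on test.csv alone would assign codes based on whatever values happen to appear there, so the same code could end up meaning a different category.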

So now it is time to run the prediction:
y=m.predict(to2)
I get back a single column of floats rather than the human-readable text labels.

So how do I get the "categories" to appear so that I can submit my data correctly?
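A single column of floats is what a `RandomForestRegressor` returns by design; that is the model Chapter 9 fits, since sale price is continuous. If the target is actually categorical, here is a sketch of one common fix, assuming scikit-learn and a hypothetical toy dataset: train a `RandomForestClassifier` on the integer codes, then map the predicted codes back to text through the categories kept from training.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Hypothetical toy training data with a categorical target
xs = pd.DataFrame({"feat": [0, 1, 2, 3, 10, 11, 12, 13]})
y_text = pd.Series(["cat", "cat", "cat", "cat", "dog", "dog", "dog", "dog"])

# Encode the text target as integer codes and keep the mapping
y_cat = pd.Categorical(y_text)
y_codes = y_cat.codes  # "cat" -> 0, "dog" -> 1

# A regressor averages the codes across trees, so it returns floats
reg = RandomForestRegressor(n_estimators=20, random_state=0).fit(xs, y_codes)
print(reg.predict(xs.head(2)))

# A classifier predicts one of the class codes it was trained on
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(xs, y_codes)
pred_codes = clf.predict(pd.DataFrame({"feat": [1, 12]}))

# Map the integer codes back to the original text labels
labels = y_cat.categories[pred_codes]
print(list(labels))
```

scikit-learn classifiers also accept string labels directly (`clf.fit(xs, y_text)`), in which case `predict` returns the strings and no mapping step is needed.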

My second question: is my process flow correct, or are my embeddings all messed up now?