I took the simple Titanic dataset on Kaggle and tried to solve it based on the first lesson. Using RandomForestRegressor, I got an accuracy of 0.78. (Titanic Notebook) It would be great if someone could review it and point out the mistakes / explain a better way to do it. Thanks!
All outputs are broken. Maybe you want to fix it first?
My bad. I just assumed that if it works on my machine, it would work on Kaggle. This is the updated link.
My 2 cents:
- Run apply_cats, proc_df and predict on test.csv and submit it to Kaggle. It will almost certainly score worse than your validation set.
- The data size is very small (891 rows), so I wouldn't split it into train/valid sets. Make use of oob_score_ instead.
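A minimal sketch of the oob_score_ suggestion above, using plain pandas/sklearn instead of fastai's apply_cats/proc_df. The DataFrame here is synthetic stand-in data with made-up Titanic-like columns, not the real train.csv:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the Titanic training data (891 rows, like the
# real train.csv); the column names mirror the dataset but the values
# are made up for illustration.
rng = np.random.default_rng(0)
n = 891
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.choice(["male", "female"], n),
    "Age": rng.uniform(1, 80, n),
    "Fare": rng.uniform(5, 250, n),
})
df["Survived"] = ((df["Sex"] == "female") | (df["Fare"] > 100)).astype(int)

# Encode the categorical column numerically (roughly what
# apply_cats/proc_df handle for you in fastai 0.7).
df["Sex"] = df["Sex"].astype("category").cat.codes

X, y = df.drop(columns="Survived"), df["Survived"]

# oob_score=True scores each tree on the bootstrap rows it never saw,
# so the whole 891 rows can be used for training with no held-out split.
m = RandomForestRegressor(
    n_estimators=100, oob_score=True, n_jobs=-1, random_state=42
)
m.fit(X, y)
print(f"OOB score: {m.oob_score_:.3f}")
```

One caveat worth knowing: for RandomForestRegressor, oob_score_ is R², not classification accuracy, so it is not directly comparable to an accuracy figure like 0.78.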
Thank you! I got a score of 0.64 by training on the entire dataset and using the OOB score. Has this model overfitted?