First Kernel

I’ve just uploaded my first kernel for Porto Seguro’s insurance competition. It involves no feature engineering and scores 0.25 on the LB, which isn’t great, since it relies on plain sklearn with no extensive exploration. A good way to start: https://www.kaggle.com/keremt/logistic-regression-random-forest-tutorial.
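The kernel itself isn’t shown here, so as a rough idea of what a no-feature-engineering sklearn baseline like that looks like, here’s a minimal sketch on synthetic stand-in data (the real kernel would load Porto Seguro’s train.csv instead):

```python
# Minimal baseline sketch: logistic regression vs random forest,
# no feature engineering. Synthetic data stands in for the real CSV.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                         # stand-in feature matrix
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)   # stand-in binary target

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=100, random_state=0)):
    # 5-fold cross-validated AUC gives a quick local estimate before submitting
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(type(model).__name__, scores.mean().round(3))
```

Cross-validating locally like this is usually faster than submitting to the LB for every change.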


Thanks! I think you can make this kernel much more compelling without much work by:

  • Moving the intro stuff right to the top. Make it the first thing people see! You want people nodding along from the start :slight_smile:
  • Getting rid of all the errors - make it so you can read through your kernel top to bottom on Kaggle and see results at each step
  • Trying to tell a story - which means, amongst other things, you should add a little text to each cell saying what you’re doing, and what you learnt from the previous cell’s result

HTH!


I should probably delete this one and start from scratch with your recommendations. Thanks for the tips!

I believe you can simply create new versions; Kaggle has a git-like versioning system built in.

I guess that gives us two options:

    1. Upload a new ipynb, which I think will overwrite the existing one
    2. Make changes on Kaggle, but that’s really slow due to my connection

I am anyway not satisfied with my LB :slight_smile: So, using your tips, I will play with the data and try to spend more time on it before building more models. In the meantime I am reading other kernels and will add what has worked for other people as well. But some people’s discussions conflict with each other…

For example, someone says one-hot encoding helped them with XGBoost, whereas another says XGBoost worked better for them with label encoding. Would you mind giving some tips on how to parse information from the Kaggle forums correctly, and how to iterate using that information?

Thank you!

The only way I know is to experiment. Try and see! Frankly, there is a real dearth of practical (and correct) machine learning information around. We’ll talk a bit about one-hot vs ordered encodings next week, but we’ll learn that there is no known a priori way to tell which will work better without trying both…

(and, of course, lean towards information from people that have good leaderboard positions in the competition!)
