First Kernel

I’ve just uploaded my first kernel for Porto Seguro’s insurance competition. It involves no feature engineering and scores 0.25 on the LB, which isn’t great, since it relies on plain sklearn with no extensive exploration. A good way to start: https://www.kaggle.com/keremt/logistic-regression-random-forest-tutorial.
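The kernel itself isn’t shown here, so as a rough idea of what a no-feature-engineering sklearn baseline like that looks like, here’s a minimal sketch on synthetic stand-in data (the real kernel would load Porto Seguro’s train.csv instead):

```python
# Minimal baseline sketch: logistic regression vs random forest,
# no feature engineering. Synthetic data stands in for the real CSV.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                         # stand-in feature matrix
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)   # stand-in binary target

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=100, random_state=0)):
    # 5-fold cross-validated AUC gives a quick local estimate before submitting
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(type(model).__name__, scores.mean().round(3))
```

Cross-validating locally like this is usually faster than submitting to the LB for every change.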


Thanks! I think you can make this kernel much more compelling without much work by:

  • Moving the intro stuff right to the top. Make it the first thing people see! You want people nodding along from the start :slight_smile:
  • Getting rid of all the errors - make it so you can read through your kernel top to bottom on Kaggle and see results at each step
  • Trying to tell a story - which means, amongst other things, you should add a little text to each cell saying what you’re doing, and what you learnt from the previous cell’s result

HTH!


I should probably delete this one and start from scratch with your recommendations. Thanks for the tips!

I believe you can simply create new versions; Kaggle has a git-like versioning system built in.

I guess that gives us two options:

    1. Upload a new ipynb, which I think will overwrite the existing one
    2. Make changes on Kaggle, but that’s really slow due to my connection

I am anyway not satisfied with my LB :slight_smile: So, using your tips, I will play with the data and try to spend more time on it before building more models. In the meantime I am reading other kernels and will add what has worked for other people as well. But some people’s discussions conflict with each other…

For example, someone says one-hot encoding helped them with XGBoost, whereas another says XGBoost worked better for them with label encoding. Would you mind giving some tips on how to parse information from the Kaggle forums correctly, and how to iterate using that information?

Thank you!

The only way I know is to experiment. Try and see! Frankly, there is a real dearth of practical (and correct) machine learning information around. We’ll talk a bit about one-hot vs ordered encodings next week, but we’ll learn that there is no known a priori way to tell which will work better without trying both…

(and, of course, lean towards information from people that have good leaderboard positions in the competition!)
