This is a forum wiki thread, so you all can edit this post to add/change/organize info to help make it better! To edit, click on the little pencil icon at the bottom of this post.
<<< Wiki: Lesson 10 | Wiki: Lesson 12 >>>
Lesson resources
Reflections and Observations
Learning from Kaggle Competitions after the competition closes
Some of the best learning opportunities with Kaggle competitions come after the competition closes.
Firstly, you only know what worked in your own data processing and analysis pipeline after the fact, so this is when you can iron out the inefficiencies you ran into.
Secondly, you can learn what the winners tried. Kaggle competition winners often write blog posts, publish kernels, release their code on GitHub, or present at meetups, so you can compare your pipeline to theirs. Looking at a winner’s blog posts, kernels, or GitHub repositories reveals their thinking process, along with solid code examples that have evidence of being successful.
Reading the winners’ blogs and presentations also lets you reflect on whether your code was buggy in a critical place. And perhaps you had a specific strategy you wanted to apply but didn’t have time to follow through on; if the winning team tried that strategy, your idea was validated in a way, and next time you can try it out.
Lastly, you’re able to review different perspectives on the same problem. By the end of the competition, the experience is almost like working with a team of data scientists: you all share an understanding of the problem space and are working with the same vocabulary and the same libraries. However, not all approaches perform equally well, and winning solutions often tackle the problem in a way you may not have imagined. Learning about the winners’ approach expands your own problem-solving abilities.
By working on a Kaggle problem, you’ve joined a community with similar interests and experience. You can probably reach out to the winners to learn more, and most are more than happy to share their experience and expertise with you.
Review of Logistic Regression model-building
Introduction to Naive Bayes
Introduction to Natural Language Processing (NLP)
Working with Word Vectors
NLP with PyTorch and fastai library
Introduction to Word Embeddings
Feature Engineering with an Embedding Matrix
As long as you have enough data, keeping features categorical where possible is a good idea.
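A minimal sketch of why this helps, using PyTorch’s `nn.Embedding`: a categorical feature is mapped to a learned dense vector rather than a single continuous number, letting the model discover relationships between categories. The category count and embedding size below are illustrative, not from the lesson.

```python
import torch
import torch.nn as nn

# Hypothetical example: treat "day of week" as a categorical feature
# and learn a dense embedding for it, instead of encoding it as 0..6
# on a single continuous axis.
n_categories = 7    # e.g. days of the week
embedding_dim = 4   # size of the learned vector per category

emb = nn.Embedding(n_categories, embedding_dim)

# A batch of category codes, as produced by e.g. pandas' category dtype
codes = torch.tensor([0, 3, 6])
vectors = emb(codes)  # one learned 4-d vector per input code
print(vectors.shape)  # torch.Size([3, 4])
```

The embedding weights are trained along with the rest of the model, so categories that behave similarly (say, Saturday and Sunday) can end up with similar vectors, which a single scalar encoding cannot express.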