Summary of the 12/07/2019 Kaggle meetup
We had an interesting and wide-ranging discussion. There were 12 attendees. Here is my recollection; please correct me or add anything important that I missed. Apologies in advance if I mis-attributed anyone’s comments or contributions!
Yuri spoke about a kernel he has been working with, by Anna Novikova: https://github.com/anovik/Kaggle-Aptos-2019-Blindness-Detection
Mehul spoke about the EfficientNetB4 FastAI - Blindness Detection kernel: https://www.kaggle.com/hmendonca/efficientnetb4-fastai-blindness-detection
We talked a bit about using SMOTE (Synthetic Minority Over-sampling Technique, implemented in the imbalanced-learn package) for imbalanced data sets; see the sketch just below.
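For anyone who wants to try it, here is a minimal sketch of SMOTE on a synthetic tabular data set. The data set, class weights, and random seeds are illustrative assumptions, not anything we ran at the meetup; note also that SMOTE operates on feature vectors, so for raw images you would typically oversample or augment instead.

```python
# Minimal SMOTE sketch using imbalanced-learn on a synthetic data set.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Build an artificial 2-class data set with a 9:1 class imbalance
# (the sizes and imbalance ratio here are illustrative).
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    weights=[0.9, 0.1],
    random_state=42,
)
print("Before:", Counter(y))  # roughly {0: 900, 1: 100}

# SMOTE synthesizes new minority-class samples by interpolating
# between a minority sample and its nearest minority-class neighbors.
X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X, y)
print("After:", Counter(y_resampled))  # classes are now balanced
```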
Srinivas mentioned a very similar Kaggle competition from the past that had 30,000 images, compared to the 3,000 in this competition, and suggested that we look at that competition for ideas applicable to this one.
Someone asked why we normalize to the ImageNet statistics rather than to the training data itself. I said that this is because we are doing transfer learning with weights pretrained on ImageNet, but that it would be interesting to try both ways to verify.
Mehul had the idea to compile a list of experiments that we could do.
Experiment #1: Normalize with ImageNet statistics vs. normalize with training-set statistics (see the sketch below).
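Here is a rough sketch of what Experiment #1 could look like with torchvision. The ImageNet mean/std values are the standard ones published for ImageNet-pretrained models; `train_images` is a hypothetical tensor standing in for our training data, not something from the competition kernels.

```python
# Sketch of Experiment #1: ImageNet normalization vs. training-set normalization.
import torch
from torchvision import transforms

# Option A: normalize with ImageNet statistics (matches the pretrained weights).
imagenet_norm = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225],
)

# Option B: normalize with statistics computed from our own training set.
def dataset_stats(train_images: torch.Tensor):
    # Per-channel mean/std over all images and pixels;
    # expects a (N, 3, H, W) tensor scaled to [0, 1].
    mean = train_images.mean(dim=(0, 2, 3))
    std = train_images.std(dim=(0, 2, 3))
    return mean.tolist(), std.tolist()

# mean, std = dataset_stats(train_images)  # train_images is hypothetical
# trainset_norm = transforms.Normalize(mean=mean, std=std)
```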
Srinivas spoke about Focal Loss, an idea developed at Facebook AI Research in 2017: https://arxiv.org/abs/1708.02002. Focal Loss is a modified cross-entropy loss that ‘focuses’ the model on hard examples by dynamically down-weighting the loss for easy examples; here, hard/easy refer to examples the model classifies with low/high confidence. Focal Loss is helpful for class-imbalanced problems.
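As a concrete illustration (not code from the meetup), here is a minimal PyTorch sketch of focal loss for multi-class classification. The alpha class-weighting term from the paper is omitted for brevity, and gamma=2 is the paper’s default.

```python
# Minimal focal loss sketch: FL(p_t) = -(1 - p_t)**gamma * log(p_t)
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0):
    """logits: (N, C) raw scores; targets: (N,) integer class labels."""
    # Per-example cross-entropy is -log(p_t), the negative log-probability
    # of the true class.
    ce = F.cross_entropy(logits, targets, reduction="none")
    # Recover p_t, the model's probability for the true class.
    p_t = torch.exp(-ce)
    # Down-weight easy examples (p_t near 1) by the factor (1 - p_t)**gamma.
    return ((1.0 - p_t) ** gamma * ce).mean()

# A confident correct prediction contributes almost nothing to the loss,
# while an uncertain one dominates it.
logits = torch.tensor([[4.0, 0.0, 0.0], [0.1, 0.2, 0.0]])
targets = torch.tensor([0, 2])
print(focal_loss(logits, targets))
```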
Srinivas mentioned a kernel called “be careful what you train on” https://www.kaggle.com/taindow/be-careful-what-you-train-on which shows how you can fool yourself by believing a model that gets good results for the wrong reasons.
We then talked about SHAP values, a principled method (originating in game theory) for quantifying how much to “blame” each feature for a classification decision. It is similar in spirit to feature “importance”, but much better: it works per-prediction, and each prediction’s feature contributions sum to the difference between that prediction and the baseline.
Here’s the original paper: https://arxiv.org/abs/1705.07874
Here’s a tutorial on Kaggle: https://www.kaggle.com/dansbecker/shap-values
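In the spirit of that tutorial, here is a minimal sketch using the shap package with a tree-based model. The data set and model are illustrative stand-ins, not the competition data.

```python
# Minimal SHAP sketch: explain a random forest's predictions.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y = data.data, data.target
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes exact SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])

# Each row assigns every feature a signed contribution ("blame") toward
# that prediction; the summary plot aggregates these across examples.
shap.summary_plot(shap_values, X[:100], feature_names=data.feature_names)
```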