My first Kaggle medal 🥈!

ilovescience · September 12, 2019, 12:41am

I got my first medal in the APTOS 2019 Blindness Detection competition this past weekend!

I describe my experience here:

I started Kaggle 3 years ago. The only course I had taken was Andrew Ng’s Machine Learning course (which is still a great course for understanding the theory behind machine learning algorithms). However, I had no practical experience doing machine learning. I would just play around with existing kernels and I was struggling to do well in competitions. Eventually, I kind of gave up and just checked in on the Kaggle competitions once in a while to see what was going on in the community.

Then, I saw a kernel using fastai on Kaggle. I had heard about fastai earlier and was interested in taking the course, so I decided to finally start listening to the lectures. Being quite busy, I only listened to a couple of lectures before the next iteration of the course was released in January. Within a couple of months I finished the course. I turned to Kaggle to practice my skills. The skills I learned in fastai immediately helped me improve my Kaggle game. In my first competition while taking fastai, the Histopathologic Cancer Detection competition, I finished within the top 14%, which was my best Kaggle result at the time.

Flash-forward to now, and the skills and techniques I learned helped me get my first medal in a Kaggle competition!

Thanks to @jeremy and the fast.ai team for developing an amazing course and library. And thanks to the community for their support and guidance!

radikubwa · September 13, 2019, 5:47am

congratulations

heye0507 · September 14, 2019, 1:59am

Me too.
That is my first silver too. I was going to write up my solution, but I got tied up in a program called Insight, which has taken all my time this past week. I was super surprised that on the first day, when I said I had a Kaggle medal, half of the team came to meet me during lunch.

I found some interesting things about EfficientNet with concat pooling that I still want to dig into.

But congrats on your silver!

And I really appreciate the fastai community. Without all the help, none of this would have happened for me.

Since I’m now in SF, I hope I can accidentally meet Jeremy sometime

Thank you all!

ilovescience · September 14, 2019, 2:01am

By Insight do you mean this program? If so, how is this program?

Congratulations to you too! Thanks for your contributions to the competition as well!

heye0507 · September 14, 2019, 5:55am

Yes, I’m in the AI session right now.

It is a very intense program, as far as I can tell.

But you also meet a lot of people here, which is fun.

ilovescience · September 14, 2019, 6:16am

Are you learning new techniques you otherwise wouldn’t learn from fastai, forums, or keeping up with the SOTA? What kind of things do you learn (if you are allowed to say)?

I have seen some of the posts from the fellows and they are quite interesting. Please share the project that you work on!

heye0507 · September 14, 2019, 7:16am

You learn all sorts of things, especially building an end-to-end solution. You can’t really show people Jupyter notebook code; you have to deliver things that run as scripts, so people can just go to your repo, fork it, and run it. That’s very new to me… and you don’t have a Kaggle LB to pull scores from to figure out how your model is doing…

Also, believe it or not, 9 out of 10 people who hear that I code in Jupyter notebooks are like, "You are not using an IDE?" (Even when I tell them I will turn the code into modules using the Fire library shown in part 2, they still don’t believe me…)

Project-wise, I am currently working on one, so I don’t have a solid solution yet (it’s my week 1 here). I have been placed on a consulting project with a company, so I am not sure I can share much…

But basically it is building a sentiment analysis classifier for a small and highly imbalanced dataset. I am very surprised that they had never heard of ULMFiT; these days people are using BERT for all sorts of downstream tasks. So when I proposed ULMFiT, they were surprised that a traditional LSTM model can do the job (so I am the one right now training a language model using AWD_LSTM to convince them next week). But based on my understanding from Jigsaw, BERT is beating them all…

If I fail… I will probably just go back to BERT, then probably fine-tune the BERT language model on the corpus and see if the result can be improved…

I actually feel very good about the program, where you have mentors to ask questions…
Also, I think of the project as another sort of Kaggle competition, where you are competing against yourself.

Accuracy is not the top priority now; a 0.1% improvement from stacking a couple of models doesn’t count for much here… You always get questions like why traditional ML can’t solve the problem…

So you always have to check your run time… Believe it or not, people compare a classic ML solution that runs on CPU and gets an OK result vs. a model that runs on a T4 GPU and gets a good result. They will really think about how much accuracy loss they can tolerate.

Learning something new every day

ilovescience · September 14, 2019, 7:27am

Yes! Hopefully your colleagues are learning from you a new way to do data science! I use a combination of IDE and Jupyter Notebook and they obviously each have their benefits.

Yes, ULMFiT is not well known in the NLP community. Unfortunately, ULMFiT is not as flexible as other models as it is a unidirectional model, but for a sentiment classifier, it is probably fine. I wish you luck.

You probably already know, but huggingface has their amazing pytorch-transformers repository. There is some discussion here and here for using the repository with the fastai library.

Kaggle is very different in this respect. In Kaggle you try your best to improve your model, even by 0.1%, even if it requires 8 GPUs LOL. But based on your experience it seems Insight is focused on developing solutions that can be implemented and used in the wild!

Good luck on your project! Thanks for sharing your experience!

heye0507 · September 14, 2019, 7:31am

I heard there’s some issue with BERT in fastai, where you have to change the source code to use output[0]… I think Jeremy also commented that you can solve it with a callback, but I think in that post they still reported that it didn’t work…

Well, I am about to find out…
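For readers following along, here is a minimal sketch of the idea behind the output[0] workaround mentioned above. It is not the actual fastai or pytorch-transformers code: `FirstOutput` and `fake_transformer` are hypothetical names, and the stand-in model simply returns a tuple the way the transformer models do. In a real setup the wrapper would presumably be a PyTorch module around the real model.

```python
class FirstOutput:
    """Wrap a callable model and return only the first element of a tuple output.

    Hypothetical helper for illustration; not part of fastai or pytorch-transformers.
    """
    def __init__(self, model):
        self.model = model

    def __call__(self, *args, **kwargs):
        out = self.model(*args, **kwargs)
        # Models that return (logits, extras...) get reduced to just the logits.
        return out[0] if isinstance(out, (tuple, list)) else out


def fake_transformer(token_ids):
    """Hypothetical stand-in for a model that returns a tuple (logits, hidden_states)."""
    logits = [float(len(token_ids)), 0.0]
    hidden_states = [[0.0] * 4 for _ in token_ids]
    return logits, hidden_states


wrapped = FirstOutput(fake_transformer)
print(wrapped([101, 2023, 102]))  # only the logits reach the caller
```

The point of wrapping rather than editing library source is that the training loop only ever sees a single output, so a loss function that expects one tensor keeps working.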

ilovescience · September 14, 2019, 7:33am

Unfortunately, I have not tried using BERT with fastai so I wouldn’t know. I would love to try it in the future though. I started playing around with one of the examples in the huggingface repository but I started getting busy with Kaggle competitions.

Good luck to you!

theDudeHimself · September 20, 2019, 12:08pm

I don’t get it, were people surprised in a good way or a bad way when you mentioned that you use Jupyter notebooks?

DrHB · September 20, 2019, 1:34pm

Hi! I am also thinking of applying… How was your experience with the application process?

heye0507 · September 20, 2019, 3:11pm

People said I would have a hard time because of all the Jupyter notebook problems (not an IDE, out-of-order execution, no debugger, etc.)

And they were shocked when I used Fire.

Because the beginning of the .py file has Jeremy’s famous quote, ‘Auto generated’, they thought I was using some AutoML.

heye0507 · September 20, 2019, 3:24pm

Applying to Insight?

If you want, I can recommend you to the program director since I’m here. I think you would be the first Kaggle gold medal winner in Insight.

My interview for Insight was very straightforward. In July, after I submitted my application, they said I had been selected for a phone interview. The interview takes about 30 minutes: they ask you to introduce yourself to start the conversation, then to present a project you have done, and finally to discuss a research paper they send you a week before the interview. My paper was Focal Loss.

Guess what, I had been working on porting the 2018 SSD to V1 for a month, so I changed the SSD structure to a RetinaNet structure and implemented focal loss. When they asked me to discuss my project during the interview, I showed them my implementation. I guess I impressed them a lot, because I got my offer letter 5 hours after my interview…

So far it is pretty good here. I don’t have a PhD, and 80% of the people in my group have one (I’m in the AI fellow group; the 2 data science groups are all PhDs…). Working with smart people is very intense… Be aware of that; otherwise, it’s a very good program.
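For anyone who has not seen the focal loss mentioned above, here is a minimal sketch of the binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). This is not the implementation from the post: the function name `binary_focal_loss` and the `eps` clamp are assumptions made for illustration, and it works on a single predicted probability in plain Python rather than on tensors.

```python
import math


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss for one example.

    p: predicted probability of the positive class, y: label (0 or 1).
    alpha and gamma default to commonly used values; eps is an assumed
    clamp that keeps log() away from zero.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p               # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)   # confident and correct
hard = binary_focal_loss(0.1, 1)   # confident and wrong
print(easy, hard)
```

With `gamma=0` the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which is a handy sanity check when porting it into a detector.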

DrHB · September 20, 2019, 3:38pm

Ha! I might ask for your recommendation in a few months =) I am currently finishing up a project in my research area. Thanks for the info =) I will be really curious to know how things go for you, and hopefully you can land a very nice job! =)

P.S. Do you have LinkedIn or something? =)

heye0507 · September 20, 2019, 3:46pm

https://www.linkedin.com/in/hao-he-b677a26a/

Thanks, I will let you know how the project and interview week go here.

theDudeHimself · September 20, 2019, 5:00pm

thanks! maybe I didn’t get to it in the course yet, but what’s this fire library you are referencing and what does it do? thanks again!

heye0507 · September 20, 2019, 8:54pm

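Fire itself builds a command-line interface from a Python function. The notebook2script.py script from part 2 uses it, and that script turns the exported Jupyter notebook cells into a .py script.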

github.com/fastai/course-v3/blob/master/nbs/dl2/notebook2script.py

#!/usr/bin/env python

import json,fire,re
from pathlib import Path

def is_export(cell):
    if cell['cell_type'] != 'code': return False
    src = cell['source']
    if len(src) == 0 or len(src[0]) < 7: return False
    #import pdb; pdb.set_trace()
    return re.match(r'^\s*#\s*export\s*$', src[0], re.IGNORECASE) is not None

def getSortedFiles(allFiles, upTo=None):
    '''Returns all the notebook files sorted by name.
       allFiles = True : returns all files
                = '*_*.ipynb' : returns this pattern
       upTo = None : no upper limit
            = filter : returns all files up to 'filter' included
       The sorting option is important to ensure that the notebooks are executed in correct order.
    '''

This file has been truncated.

Here is what I meant.
We used it a lot when building fastai from scratch in part 2. I didn’t manage to find time to build everything, so my progress has stopped at mixup… I have to pick it up again.
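To make the export idea concrete, here is a minimal sketch that reuses the `is_export` check from the excerpt above on an in-memory notebook. It is not the truncated remainder of notebook2script.py: `exported_source` and the sample `notebook` dict are hypothetical, and the sketch skips Fire and file handling entirely.

```python
import re


def is_export(cell):
    """Same check as in the excerpt: a code cell whose first line is '#export'."""
    if cell['cell_type'] != 'code': return False
    src = cell['source']
    if len(src) == 0 or len(src[0]) < 7: return False
    return re.match(r'^\s*#\s*export\s*$', src[0], re.IGNORECASE) is not None


def exported_source(notebook):
    """Hypothetical helper: join the bodies of all export cells, dropping the marker line."""
    chunks = []
    for cell in notebook['cells']:
        if is_export(cell):
            chunks.append(''.join(cell['source'][1:]))
    return '\n\n'.join(chunks)


# A tiny notebook in the same shape as the JSON inside an .ipynb file.
notebook = {
    'cells': [
        {'cell_type': 'markdown', 'source': ['# Notes\n']},
        {'cell_type': 'code', 'source': ['#export\n', 'def add(a, b):\n', '    return a + b\n']},
        {'cell_type': 'code', 'source': ['add(1, 2)\n']},  # scratch cell, not exported
    ]
}

script = exported_source(notebook)
print(script)
```

Only the cell tagged with `#export` ends up in `script`, so exploratory cells stay in the notebook while the reusable code lands in a module.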