FastChai and Kaggle: Group-Based Projects

Note: This is a wiki! Please edit the table below: add your name to a team if something interests you, or start a new team if nothing does. Teams are capped at 5, so if a team is full, start a second one :slight_smile:

Note 2: To contact your teammates, find their usernames in the table and start a group message with them via the forums. You can then pick a platform of your choice after the first few messages :slight_smile:

Note by Sanyam: I’ve removed @ from usernames since there’s a limit on the number of mentions. Please just add your username to the table below and don’t use an @.

Good Luck!

9th Jan '21 - 23rd January '21 Sprint

| Leader Name | Comp Link | Teammates (Max 3 = Leader + 2) | Comments/Writeup/Etc |
|---|---|---|---|
| kevinh | Global Wheat Detection | Isaac.Flath, harshasatyavardhan | Bounding boxes |
| Romandovega (David Dirring) | Fraud Detection | ~ | Use 1D-CNN approach from MOA Comp on Fraud Comp data |
| thetj09 | Plant classification | | Image Classification |
| thetj09 | Fraud Detection | | Tabular |
| Mani Sarkar (neomatrix369) | Tweet Sentiment Extraction | Ramshankar (thedatacrack) | NLP: one or more words leading to the “sentiment” - our progress. |

Archived/Older Sprints

19th Dec '20 - 9th January '21 Sprint

| Leader Name | Comp Link | Teammates (Max 3 = Leader + 2) | Comments/Writeup/Etc |
|---|---|---|---|
| rghosh | Dogs vs. Cats | [SOLO] (Don’t add name) | Image Classification |
| bam098 | Plant Pathology | rgosh, misza222 | Image Classification |
| kevinh | Global Wheat Detection | Isaac.Flath, userr2232 | Bounding boxes |
| | Plant classification | filtercoffee, thetj09 | Image Classification |
| | Fraud Detection | filtercoffee, thetj09 | Tabular |

5th Dec-19 Dec Sprint

| Leader Name | Comp Link | Teammates (Max 3 = Leader + 2) | Comments/Writeup/Etc |
|---|---|---|---|
| bam098 | Plant Pathology | marshath, rgosh | Image Classification |
| Can someone with experience in DL/Kaggle please join? | Plant classification | filtercoffee, thetj09 | Image Classification |
| rghosh | Cornell Birdcall Identification | [1-SPOT-RESVD], priyank7n | Audio Classification |
| Isaac.Flath | Plant Pathology | ramesh, vinaykudari, priyank7n | Image Classification |
| Isaac Flath | Fraud Detection | David Dirring (Romandovega) | Tabular |
| Placeholder | Placeholder | Placeholder | Placeholder |

21st Nov Sprint-5th Dec

| Leader Name | Comp Link | Teammates (Max 5) | Comments/Writeup/Etc |
|---|---|---|---|
| init_27 | QuickDraw | rghosh, vinaykudari, bismillahkani, aditya1601, uchralganbat | Team Masala Chai / Image Classification Comp |
| cataluna84 | Research (Self-Supervised) | alsombra, shikha, 0tist, Mrityunjay, radiangle | Comparison of Semi-Supervised contrastive objectives |
| j.laute | Cornell Birdcall Identification | saiteja123, Nubbinsonfire, apsal, rghosh, noel | Audio classification |
| rajeshkumarkarra | NFL Big Data Bowl | - | Team Oolong / Tabular Data Competition |
| bam098 | Plant Pathology | misza222, vtgvipul, crazydiv, quincy, marshath | Multilabel Classification |
| kevinh | Plant Pathology | Dinesh, Isaac.Flath, Sonu, Elijah | (second plant pathology group since bam098’s group is full) |
| ramesh | Plant Pathology | Partha, Sashi, Sudharsan, Vijay, priyank7n | Fab 5 |
| pardeep0019 | Toxic Comment Classification Challenge | bluesky314, felixark8, sdasari625, goutham794, saharhash | NLP Classification |
| Shivansh | Google Quest Challenge | tlaurent, shikha, shyamsh, priyank7n | Question-Answering NLP |
| Mani Sarkar (neomatrix369) | Tweet Sentiment Extraction | Ramshankar (thedatacrack), Sahil (smehla), anishjain, Abhishek (abhishek) | NLP: one or more words leading to the “sentiment” |
| shyamsh | Multilingual Toxic Comment Classification | Ved | Multilingual NLP Classification |
| fraserGlasgow | Kuzushiji Recognition | Aditya, Ankit (dhaniyapatta), Yudhishthir, durgaamma2005, akhilgod | Image recognition and classification |
| mdmanurung | PANDA Challenge | Aneesh, Joan, priyank7n, Biswajit_123, igrek | Image Classification. Discord link here |
| David Dirring (romandovega) | Fraud Detection | Isaac.Flath, init_27, ramesh, amiyo, harish3110, manasagiddalapati | Tabular, Classifier |
| Butch Landingin (butchland) | West Nile Virus Prediction | tyoc213, harish3110, manasagiddalapati | Tabular, Classifier, GIS/Spatial, Time Series |
| Neil MacAloney (neilmac) | Airbus Ship Detection | igrek, Ricky (SisengCo), kurianbenoy, priyank7n | Image Classification. Discord link |

Links to Write-ups/Demos

Please share your solutions! As a reminder, always give due credit if you’re using scripts/notebooks from Kaggle :slight_smile:

| Name | Link | Comp Link | Comments + Credit |
|---|---|---|---|
| Isaac.Flath | Approach | Fraud Detection | Researched from Fastbook, various kernels (especially several by Chris Deotte), discussion posts |
| kevinh | Write-up | Plant pathology | Reached top 12%. Hard to build a reliable CV, mislabeled images, noisy labels |
| Isaac.Flath | Write-up | Plant pathology | Reached top 10%. Hard to build a reliable CV, mislabeled images, noisy labels, 1 class very imbalanced |

As we reach the final chapters of the book in our reading groups, I wanted to propose the next steps and project-based learning.

Schedule:

  • If you sign up, you’ll be notified, but ideally please also keep an eye out on this wiki

Starting 21st November:

Goal:

The goal of these groups is to stimulate smaller groups and force us to learn by applying.

The suggested structure will be:

  • Hosting two-week-long Kaggle sprints
  • Every 15 days, a leader will pick a suggested competition of their choice
  • Groups of max 5 members will be formed (edit: the suggested limit is now 3, but it’s up to the teams to set the final size)
  • A person can join multiple teams
  • The goal will be to land in the Top 10% of the LB in 15 days, using fastai as much as possible or PyTorch when needed
  • You’re allowed to copy, fork, translate TF code, or try any ideas, but the deadline is 15 days
  • At the end of 15 days, every group will present on a demo day: the complete business problem, their solution walkthrough, results, possible improvements, and things that didn’t work

Any suggestions are welcome. Ideally, this will serve as a wiki for multiple groups, and everyone will assign themselves to a team. This will also allow everyone to work on problems they find motivating.

There will be 2 calls, one via MLT and the second via fastai discord.

Thanks and Regards,
Sanyam

PS: For any questions, please @ me in this thread, kindly avoid DMs as that might cause an overflow and might delay my responses.


This is really cool! Would it be possible to do the Kaggle competition part without coming on Saturdays? I am completely unavailable on Saturdays. I am currently unable to carve out the time, but may be able to join in December.


Definitely, no gate-keeping whatsoever. That’s why I’d like to create mini groups of a maximum of 5; the groups may coordinate among themselves.

I’ll also open a poll a week before; in case more people agree, we can shift to Sundays.

Nevertheless, do participate online/offline and share your work in this thread :slight_smile:

https://www.kaggle.com/aleksandradeis/globalwheatdetection-eda

The above post also contains links to the project ideas for inspiration.


I’m interested to join a group but my ability and time are limited.

I’ve only done a little bit of Kaggle, the Titanic thing mostly a couple years ago.
Any tips for how to prepare for joining a team?
:slight_smile:

Drink a lot of chai and don’t give up till you get to top 10% :slight_smile:


I’m keen to get involved with this!


Hi… Nice initiative… How does one go about joining this? Also, I am more of a pure PyTorch and TF/Keras guy… Will I still be welcome?


Cool idea, I’m game.

Can we have a shared Excel doc for this? It could contain a week-by-week schedule of problems based on interested participants.

I am in!


Yes, there is no gate-keeping

Thanks, will add a table to the wiki above this week

I am keen on participating in this. How and where do I enroll?


Up for it.

It will be fun.
I’m in.

I would be interested in one of the following competitions :slight_smile:


Nice list! I like the plant pathology one (seems like a multi-label classification problem) and the facial keypoints detection (seems similar to the book’s example with the Biwi Kinect Head Pose dataset)


Ah nice! :slight_smile: Yes, I think the plant pathology one should be a multi-label classification problem. This might make it even a bit more interesting. Regarding the keypoint detection: yes, I also think it’s similar to the Kinect dataset. I wonder if keypoint detection could be used as a first step for other applications, e.g. anonymizing faces in video streams (or a webcam feed) by putting something over the faces. Here, keypoints might help to do that.

Great idea! I would be interested in participating in most of the proposed competitions. Moreover, I am very interested in working on this one: