FastChai and Kaggle: Group based Projects

Romandovega · November 20, 2020, 8:54pm

This is great. I am interested in any tabular competition, in particular the Walmart unit sales forecasting one.

I will join the Saturday 6pm PT Zoom call.

bam098 · November 21, 2020, 12:11am

This is really interesting, too! However, the dataset is quite large. I use Google Colab, which has limited storage space, so I probably can’t do this one.

nishant_g · November 21, 2020, 12:25am

Anyone interested in https://www.kaggle.com/c/riiid-test-answer-prediction?

tyoc213 · November 21, 2020, 2:47am

That one is still running. I wonder how predicting “whether students are able to answer their next questions correctly” can really help students. I mean, the problem statement (“In 2018, 260 million children weren’t attending school…equity gaps in every country could grow wider…”) is what is interesting to solve.

Just saying: even if we know with 95% accuracy whether the student can answer, what comes next?

Romandovega · November 21, 2020, 3:02am

I imagine Sanyam said archived/completed competitions so that we don’t run the risk of breaking any Kaggle in-competition rules.

nishant_g · November 21, 2020, 3:07am

I think they are offering adaptive learning solutions to school kids. If you can predict which question a student can answer and which he/she can’t, then you can speed up the learning curve by tailoring a course to the student’s current knowledge and rate of learning.

nishant_g · November 21, 2020, 3:10am

I am more interested in the problem than the competition. In fact, I am busy for the next 15 days. I don’t mind if we do this after the competition ends.

tyoc213 · November 21, 2020, 3:20am

I see. I wasn’t able to see “why” it would be that helpful; other eyes see things differently, thanks!

Still, I would like to try something, even if it is only one submission, because this year the gap from children not being properly taken care of will indeed be huge.

kevinh · November 21, 2020, 3:58am

Here’s my Kaggle starter for the plant pathology problem.

The dataset is small, which is convenient for iterating quickly: there are 1821 images in each of the training and test sets. The images are quite big (most are 2048x1365).

From this starter, I imagine some steps:

improve the training (e.g. the number of epochs and the learning rates)
progressively train with bigger image sizes
define the same metric as the competition (AUROC averaged over the columns) and use it at training time
train a deeper model (I used resnet34)
use cross-validation
test different augmentations and TTA
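
For the step of tracking the competition metric at training time, here is a minimal pure-Python sketch of AUROC averaged over columns. It assumes labels and predictions are given as one row per image and one column per class, computes each column's AUROC from rank sums (the Mann-Whitney U statistic, with tied scores sharing an average rank), and averages the columns. The function names `roc_auc` and `mean_column_auroc` are hypothetical; in practice `sklearn.metrics.roc_auc_score` computes the same per-column values.

```python
from typing import List, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC for one binary column, via the Mann-Whitney U statistic."""
    n = len(y_true)
    # Sort indices by score, then give tied scores the average of their 1-based ranks.
    order = sorted(range(n), key=lambda i: y_score[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC is undefined when a column contains only one class")

    # U for the positive class, normalised to [0, 1] by the number of pos/neg pairs.
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def mean_column_auroc(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Average the per-column AUROC over all label columns (assumed one-hot rows)."""
    n_cols = len(y_true[0])
    aucs = []
    for c in range(n_cols):
        col_true = [row[c] for row in y_true]
        col_score = [row[c] for row in y_score]
        aucs.append(roc_auc(col_true, col_score))
    return sum(aucs) / n_cols


# Two columns: the first is ranked with one mistake (0.75), the second perfectly (1.0).
labels = [[0, 1], [0, 0], [1, 1], [1, 0]]
preds = [[0.1, 0.9], [0.4, 0.2], [0.35, 0.8], [0.8, 0.1]]
print(mean_column_auroc(labels, preds))  # 0.875
```

Monitoring this on the validation set, rather than only the loss or accuracy, gives a number that is measured the same way as the leaderboard score.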

I’d be interested in working more on this problem.
This version gives 0.91984 on the private LB and 0.92766 on the public LB. (The exact same notebook, V3, gave me 0.92311/0.93321, which shows that the random seed matters…)

There are 1317 teams in this challenge, so the top 10% means a position better than 132. The private LB score at position 132 is 0.97360 (public LB at position 132: 0.98124). So there is plenty of room to improve, and the target seems achievable.

init_27 · November 21, 2020, 7:53am

@bam098, @kevinh, @Joan, @Romandovega
Please add your competitions to the wiki, so that anyone who is interested can join you.

init_27 · November 21, 2020, 7:55am

David is correct: we will only work on archived competitions. According to Kaggle rules, “private sharing” is not allowed, and this is very tough to follow strictly in a large group. For that reason, as a group, we’ll only work on archived competitions.

If you find a great teammate and decide to team up for an active competition, we wish you gold, but please don’t post the materials here, to avoid any negative repercussions for your profile.

imNitin · November 21, 2020, 8:58am

I want to work on a project that has elements of Sustainability + Deep Learning to it.

bam098 · November 21, 2020, 9:10am

Ah nice! I want to take a look at it later today. I think I will go for that competition as well, and I will put myself in the table above. If you want, feel free to join.

Shivansh · November 21, 2020, 9:54am

@init_27 There is a limit of 20 mentioned users per post here. How do we add further names?

marshath · November 21, 2020, 10:06am

@init_27 I would like to join the plant pathology group by @bam098 (posting here since I can’t add myself to the wiki due to the mention limit).

neomatrix369 · November 21, 2020, 10:08am

If you would like to join the fast.ai Discord channel, here it is.

For those who are new to Kaggle competitions, or to competitions in general, have a look at this:

https://github.com/neomatrix369/awesome-ai-ml-dl/blob/master/competitions.md#general

# Competitions: AI, ML, DL, DS

## General

- [Kaggle competitions](https://www.kaggle.com/competitions)
    - [Closed DS Competition: CareerVillage](https://www.kaggle.com/c/data-science-for-good-careervillage)
    - [Closed DS Competition: City of Los Angeles](https://www.kaggle.com/c/data-science-for-good-city-of-los-angeles)
    - [Closed ML Competition: Titanic: Machine Learning from Disaster](https://www.kaggle.com/c/titanic)
    - [Closed ML Competition: Dogs vs. Cats](https://www.kaggle.com/c/dogs-vs-cats/)
    - [Active DS Competition: 2019 Data Science Bowl](https://www.kaggle.com/c/data-science-bowl-2019)
    - [Active ML Competition: RSNA Intracranial Hemorrhage Detection](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection)
    - [Active ML Competition: ASHRAE - Great Energy Predictor III](https://www.kaggle.com/c/ashrae-energy-prediction)
    - [Active DS Competition: NFL Big Data Bowl](https://www.kaggle.com/c/nfl-big-data-bowl-2020)
    - [Closed NLP Competition: Tweet Sentiment Extraction](https://www.kaggle.com/c/tweet-sentiment-extraction/)
        + [Useful NLP and competition resources](https://www.kaggle.com/c/tweet-sentiment-extraction/discussion/159520)
        + [My popular discussions](https://www.kaggle.com/c/tweet-sentiment-extraction/discussion/159361)
- [Bitgrit competition platform](https://competition.bitgrit.net/)
- [CrowdAI](https://www.crowdai.org/challenges?challenge_filter=active)
- [Crowd Analytix](https://www.crowdanalytix.com/community)
- [CodaLab](http://codalab.org/)

This file has been truncated.

Additional datasets can be found at https://github.com/neomatrix369/awesome-ai-ml-dl/blob/master/data/datasets.md

Please do not use this as a silver bullet; use it as a guideline, and also do your own exploration.

init_27 · November 21, 2020, 10:47am

I wasn’t aware of the mention limit. Please just add your username without the @; that should do it.

smehla · November 21, 2020, 11:11am

@init_27 I’m not able to add more than 20 users to the post.

benihime91 · November 21, 2020, 11:15am

Hello, I missed the meeting because of the India time difference; I joined half an hour late. Is the meeting over, @init_27?
I can’t seem to join now using the link that was shared.

tehnick · November 21, 2020, 11:32am

I cannot edit the table to add myself (@tehnick) to the plant pathology group.
I get a warning about not being able to mention more than 20 users in a post.