# FastChai and Kaggle: Group based Projects

This is great. I am interested in any tabular competition, in particular this unit sales forecast on Walmart data.

I will join the Saturday 6pm PT zoom call.
2 Likes

This is really interesting, too! However, the dataset is huge. I use Google Colab, which has limited storage space, so I probably can’t do this one.

2 Likes

That one is still running. I wonder how “you will predict whether students are able to answer their next questions correctly” can really help students. I mean, the problem statement “In 2018, 260 million children weren’t attending school…equity gaps in every country could grow wider…” is the part that is interesting to solve.

Just saying: even if you know with 95% certainty whether the student can answer, what would come next?

I imagine Sanyam said archived/completed competitions so that we don’t run the risk of breaking any Kaggle in-competition rules.

1 Like

I think they are offering adaptive learning solutions to school kids. If you can predict which question a student can answer and which he/she can’t, then you can speed up the learning curve by tailoring a course to the student’s current knowledge and rate of learning.

1 Like

I am more interested in the problem than the competition. In fact, I am busy for the next 15 days. I don’t mind if we do this after the competition ends.

1 Like

I see. I wasn’t able to see why that would be helpful; other eyes see things differently. Thanks!

Still, I would like to try something, even if it is only one submission, because the gap from children not being properly taken care of this year will indeed be huge.

Here’s my Kaggle starter for the plant pathology problem.

The dataset is small, which is convenient for iterating quickly. There are 1821 images in each of the training and test sets. The images are quite big (most are 2048x1365).

From this starter, I imagine some steps:

• improve the training (e.g. the number of epochs and the learning rates)
• progressively train with bigger image sizes
• define the same metric as the competition (AUROC mean over columns) and use it at training time
• train deeper (I used resnet34)
• use cross validation
• test different augmentations and TTA
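The competition metric mentioned in the list above (AUROC averaged over the label columns) can be sketched in plain Python. This is only an illustration, not the starter notebook's code: the function names `auroc` and `mean_columnwise_auroc` are made up here, and the pairwise computation is quadratic, which is acceptable for a small validation set. In practice, `sklearn.metrics.roc_auc_score` with `average="macro"` should give the same number on multilabel input.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC for one binary column: the fraction of (positive, negative)
    pairs where the positive gets the higher score (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_columnwise_auroc(
    y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]
) -> float:
    """Average the per-column AUROC over all label columns."""
    n_cols = len(y_true[0])
    per_column = [
        auroc([row[c] for row in y_true], [row[c] for row in y_score])
        for c in range(n_cols)
    ]
    return sum(per_column) / n_cols


# Toy example with two label columns (values are invented for illustration).
toy_true = [[1, 0], [0, 1], [1, 0], [0, 1]]
toy_score = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.7], [0.3, 0.6]]
print(mean_columnwise_auroc(toy_true, toy_score))
```

Tracking this value on a validation split during training makes local scores directly comparable to the leaderboard.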

I’d be interested in working more on this problem.
This version gives 0.91984 on the private LB and 0.92766 on the public LB. (The exact same notebook, V3, gave me 0.92311/0.93321, which shows that the seed is a parameter…)

There are 1317 teams in this challenge, so top 10% means position 132 or better. The private LB score at that position is 0.97360 (public LB, position 132: 0.98124). So there is plenty of room to improve, and the target seems achievable.
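As a quick check of the numbers above, here is the arithmetic written out. The variable names are only illustrative, and rounding the cutoff up to a whole position is an assumption that matches the figure quoted in the post.

```python
import math

n_teams = 1317
# 10% of 1317 is 131.7; rounding up gives the position quoted above.
top_10_percent_position = math.ceil(n_teams / 10)

# Scores quoted in the post (private LB, public LB).
starter_private, starter_public = 0.91984, 0.92766
target_private, target_public = 0.97360, 0.98124

# Gap between the starter notebook and the top-10% threshold.
private_gap = round(target_private - starter_private, 5)
public_gap = round(target_public - starter_public, 5)

print(top_10_percent_position, private_gap, public_gap)
```

So the starter sits roughly 0.054 AUROC below the top-10% threshold on both leaderboards.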

5 Likes

@bam098, @kevinh, @Joan, @Romandovega

2 Likes

David is correct: we will only work on archived competitions. According to Kaggle rules, “private sharing” is not allowed, and that is very tough to follow strictly in a large group. For that reason, as a group, we’ll only work on archived comps.

If you find a great teammate and you decide to team up for an active comp, we wish you gold, but please don’t post the materials here, to avoid any negative repercussions to your profile.

I want to work on a project that has elements of Sustainability + Deep Learning to it.

Ah nice! I want to take a look at it later today. I think I will go for that competition as well. I will put myself in the table above. If you want, feel free to join.

@init_27 There is a limit of 20 people max here. How do we mention further names?

1 Like

@init_27 I would like to join plant pathology by @bam098 (posting here since I can’t add myself to the wiki due to the mention limit).

If you’d like to join the fast.ai Discord channel, here it is

For those who are new to Kaggle competitions, or to competitions in general, have a look at this

Additional datasets can be found at https://github.com/neomatrix369/awesome-ai-ml-dl/blob/master/data/datasets.md

Please do not use this as a silver bullet; use it as a guideline, and also do your own thing.

1 Like