Dog Breed Identification challenge

sermakarevich · November 7, 2017, 2:29am

Let's apply our skills to some other datasets. One of the image recognition competitions currently running on Kaggle is Dog Breed Identification, which we can use as our sandbox. It should be a good competition to start with: its classes belong to ImageNet categories, so building a CNN from scratch should not be required.

It's a pity, but it looks like this competition, even though it is a playground, has some rules:

Privately sharing code or data outside of teams is not permitted. It’s okay to share code if made available to all participants on the forums.

So I am not sure how we can collaborate on the forum without violating this rule. Maybe @jeremy can help us with advice on this? I thought we could form a team of all fastai students and, after that, share everything on the forum. Or maybe the fastai forums would count as "the forums" in Kaggle's sense?

One more thing to consider:

Submission Limits
You may submit a maximum of 5 entries per day.

So we have two choices:

to create more than one team
to rely more on our internal validation
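If we lean on internal validation, one common approach is a holdout set that keeps roughly the same share of each breed as the training data. Below is a minimal stdlib sketch, assuming `labels.csv` has `id` and `breed` columns (an assumption; check the actual file). The function names `read_labels` and `stratified_split` are hypothetical helpers, not part of fastai or any Kaggle tool.

```python
import csv
import random
from collections import defaultdict


def read_labels(path):
    """Read labels.csv into {image_id: breed}.

    Assumes columns named 'id' and 'breed' (hypothetical; verify the header).
    """
    with open(path, newline="") as f:
        return {row["id"]: row["breed"] for row in csv.DictReader(f)}


def stratified_split(labels, valid_frac=0.2, seed=42):
    """Split image ids into train/valid, holding out ~valid_frac of each breed.

    Breeds with a single image stay entirely in train; breeds with 2+ images
    contribute at least one image to valid. Returns two sorted lists.
    """
    by_breed = defaultdict(list)
    for img_id, breed in labels.items():
        by_breed[breed].append(img_id)

    rng = random.Random(seed)  # fixed seed so every team member gets the same split
    train_ids, valid_ids = [], []
    for breed in sorted(by_breed):
        ids = sorted(by_breed[breed])
        rng.shuffle(ids)
        n_valid = max(1, round(len(ids) * valid_frac)) if len(ids) > 1 else 0
        valid_ids.extend(ids[:n_valid])
        train_ids.extend(ids[n_valid:])
    return sorted(train_ids), sorted(valid_ids)
```

Fixing the seed matters if several people on a team compare scores: everyone then evaluates on the same held-out images.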

What do you think about it?

ar_ai · November 7, 2017, 2:34am

One idea is to work on the problem individually or in small teams for 3-4 weeks, and later merge teams and compare our predictions to build a good stacked model out of the uncorrelated predictions from different teams.
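Before stacking, one rough way to check how uncorrelated two teams' outputs are is the Pearson correlation of their flattened per-image class probabilities. This is a sketch of a quick diagnostic only (not stacking itself, which would train a second-level model on these predictions); it assumes both teams list images and classes in the same order.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def prediction_correlation(preds_a, preds_b):
    """Correlate two prediction matrices (rows = images, columns = breeds).

    Assumes identical row (image) and column (class) order in both matrices.
    """
    flat_a = [p for row in preds_a for p in row]
    flat_b = [p for row in preds_b for p in row]
    return pearson(flat_a, flat_b)
```

A value close to 1.0 suggests the two sets of predictions mostly agree, so blending them is likely to help less than blending more diverse ones.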

KevinB · November 7, 2017, 2:35am

Is there a limit to team sizes? That would be my only hesitation, otherwise that would be very interesting!

sermakarevich · November 7, 2017, 2:35am

No, there is not

Team Limits
There is no maximum team size

KevinB · November 7, 2017, 2:36am

I think that would be pretty cool. We could still break into smaller groups as suggested by @ar_ai.

rishubhkhurana · November 7, 2017, 2:39am

Yeah. I guess breaking into smaller groups would be nice.

jamesrequa · November 7, 2017, 4:16am

I agree it makes the most sense to start in smaller teams so we can maximize our submission counts per day leading up to the merger deadline at which point we could elect to join forces. As @ar_ai mentioned we could improve our scores a lot just by ensembling/averaging all of our predictions together. Just keep in mind we wouldn’t be able to share our code outside teams.
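Averaging predictions from several teams can be as simple as a (optionally weighted) mean of per-image class probabilities. A minimal sketch, assuming every team submits probabilities for the same image ids in the same breed order; `average_predictions` is a hypothetical helper name.

```python
def average_predictions(prediction_sets, weights=None):
    """Blend per-image class probabilities from several models or teams.

    prediction_sets: list of dicts {image_id: [p_class0, p_class1, ...]},
        all with the same image ids and the same class order.
    weights: optional non-negative weights, one per prediction set.
    Returns {image_id: blended probabilities}, renormalised to sum to 1.
    """
    if weights is None:
        weights = [1.0] * len(prediction_sets)
    if len(weights) != len(prediction_sets):
        raise ValueError("need one weight per prediction set")
    total = sum(weights)

    blended = {}
    for img_id in prediction_sets[0]:
        # KeyError here means a team is missing predictions for this image.
        rows = [preds[img_id] for preds in prediction_sets]
        avg = [sum(w * row[k] for w, row in zip(weights, rows)) / total
               for k in range(len(rows[0]))]
        s = sum(avg)  # guard against rounding drift in the inputs
        blended[img_id] = [p / s for p in avg]
    return blended
```

With equal weights, blending `[0.8, 0.2]` and `[0.4, 0.6]` for one image gives roughly `[0.6, 0.4]`.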

sermakarevich · November 7, 2017, 4:17am

Right, we can start a private forum chat.

KevinB · November 7, 2017, 4:22am

I didn’t think about the max submissions per team per day. That might change things. Maybe start in smaller groups at least then.

ar_ai · November 7, 2017, 4:48am

After a merger, the team's total number of submissions can't exceed the number of days the competition has been running multiplied by 5. So we can't form big teams if individual participants have already submitted a lot; we have to keep that in mind too. We may have to depend on internal validation a lot.
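The merge rule as described above is simple arithmetic. Here is a sketch that encodes it as paraphrased in this thread (not taken from Kaggle's official rules text, which may differ in details), with a default daily limit of 5 from the competition's submission limits.

```python
def merged_submission_budget(days_running, daily_limit=5):
    """Maximum total submissions a merged team may have used so far."""
    return days_running * daily_limit


def can_merge(submissions_per_member, days_running, daily_limit=5):
    """True if the combined submissions fit within the merged-team budget.

    submissions_per_member: submissions already made by each person or team
    that wants to merge.
    """
    used = sum(submissions_per_member)
    return used <= merged_submission_budget(days_running, daily_limit)
```

For example, 10 days in the budget is 50, so members with 20, 15 and 10 submissions (45 total) could merge, while 20, 20 and 15 (55 total) could not.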

anandsaha · November 7, 2017, 5:53am

Guys, where do you keep your code for a Kaggle competition? Otherwise, keeping it in a public git repo is equivalent to sharing it…

sermakarevich · November 7, 2017, 5:59am

Are you the police?

jamesrequa · November 7, 2017, 5:59am

I either keep it on my machine locally or use a private GitHub repo so it can be shared amongst my team members, and release it publicly after the comp ends.

anandsaha · November 7, 2017, 6:07am

Haha…


Private GitHub repos are paid.

Just noticed that Bitbucket provides free private repos.


jamesrequa · November 7, 2017, 6:08am

Often in teams one team member can host the repo and invite the others as contributors, so not everyone needs a paid GitHub plan. Or, like you said, Bitbucket is another good option for free private repos.

Sree · November 7, 2017, 7:57am

Hi. How do I get the Dog Breed Identification dataset from Kaggle into the fastai/data folder on AWS? Could anyone list the steps needed to do it?

jamesrequa · November 7, 2017, 8:03am

There are a couple of different Kaggle command-line tools you can use to download the datasets to AWS, etc. Below are two options that I know of that work well.
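As one possible sketch with kaggle-cli, here is a small Python wrapper that runs its `kg download -u <username> -p <password> -c <competition>` command and then unzips the archives. Assumptions: `kg` is installed and on the PATH, the competition slug is `dog-breed-identification` (taken from the competition URL), kaggle-cli writes files into the current working directory, and the `fastai/data/dogbreed` target folder is a hypothetical choice. You may also need to accept the competition rules on the Kaggle website before downloads work.

```python
import subprocess
import zipfile
from pathlib import Path

COMPETITION = "dog-breed-identification"  # assumed slug from the competition URL


def kg_download_cmd(username, password, competition=COMPETITION):
    """Build the kaggle-cli download command as an argument list."""
    return ["kg", "download", "-u", username, "-p", password, "-c", competition]


def download_and_extract(username, password, data_dir="fastai/data/dogbreed"):
    """Download the competition files with kaggle-cli, then unzip them.

    data_dir is a hypothetical target folder; adjust it to your setup.
    Note: passing a password on the command line exposes it in the process list.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    # Run kg inside data_dir, assuming it saves files to the working directory.
    subprocess.run(kg_download_cmd(username, password), cwd=data_dir, check=True)
    for archive in data_dir.glob("*.zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
```

Calling `download_and_extract("your_user", "your_password")` on the AWS instance would then leave the extracted files next to the original zip archives.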

Sree · November 7, 2017, 8:11am

Looking into Kaggle-cli! Thanks

divyansh · November 7, 2017, 8:37am

Hey! I’m up. If somebody decides to build a team, count me in.

atul8 · November 7, 2017, 11:20am

You can also use gitlab.com for hosting private repos. GitLab has no limits on the number of contributors or the number of private repos.