Dog Breed Identification challenge

(sergii makarevych) #1

Let's apply our skills to some other datasets. One of the image recognition competitions currently running on Kaggle is Dog Breed Identification, which we can use as our sandbox. It should be a good competition to start with, as these classes belong to ImageNet categories, so building a CNN from scratch should not be required.

It's a pity, but it looks like this competition, even though it is a playground, has some rules:

Privately sharing code or data outside of teams is not permitted. It’s okay to share code if made available to all participants on the forums.

So I am not sure how to collaborate on the forum without violating this rule. Maybe @jeremy can help us with advice on this? I thought we could form a team of all fastai students, and after that we could share everything on the forum :slight_smile: Or maybe Kaggle means the fastai forums :wink:

One more thing to consider is

Submission Limits
You may submit a maximum of 5 entries per day.

So we have two choices:

  1. to create more than one team
  2. to rely more on our internal validation

What do you think about it?
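For the second choice, here is a rough sketch of what an internal validation split could look like. It holds out a fixed share of each breed so the validation set mirrors the class balance. The function name `stratified_split`, the 20% fraction, and the toy labels are all assumptions for illustration, not anything from the competition data.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.2, seed=42):
    """Split sample indices into train/validation, holding out a share of every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one validation sample per class
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Hypothetical labels, one per training image
labels = ["beagle"] * 10 + ["pug"] * 5 + ["boxer"] * 5
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

Scoring every model on the same held-out indices means we only need to spend a daily submission on the ones that already look good locally.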


One idea is to work on the problem individually or in small teams for 3-4 weeks and later we can merge together and compare the predictions to get a good stacked model out of uncorrelated predictions from different teams.
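To see whether two teams' predictions are actually uncorrelated before stacking them, something like the sketch below could work. It flattens each team's probability table and computes the Pearson correlation. The prediction values are made up, and `pearson` and `flatten` are just helper names chosen for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def flatten(preds):
    """Turn a table of per-image class probabilities into one flat list."""
    return [p for row in preds for p in row]

# Hypothetical predictions: rows are images, columns are breed probabilities
team_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
team_b = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]

corr = pearson(flatten(team_a), flatten(team_b))
print(f"correlation between team A and team B: {corr:.3f}")
```

A value close to 1 suggests the two models make nearly the same mistakes, so combining them is unlikely to help much; lower values are more promising for stacking.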

(Kevin Bird) #3

Is there a limit to team sizes? That would be my only hesitation, otherwise that would be very interesting!

(sergii makarevych) #4

No, there is not

Team Limits
There is no maximum team size

(Kevin Bird) #5

I think that would be pretty cool. We could still break into smaller groups as suggested by @ar_ai.

(Rishubh Khurana) #6

Yeah. I guess breaking into smaller groups would be nice.

(James Requa) #7

I agree it makes the most sense to start in smaller teams so we can maximize our submission counts per day leading up to the merger deadline at which point we could elect to join forces. As @ar_ai mentioned we could improve our scores a lot just by ensembling/averaging all of our predictions together. Just keep in mind we wouldn’t be able to share our code outside teams.
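As a minimal sketch of the averaging idea, assuming each team produces a table of class probabilities with the same row and column order (the numbers and the name `average_predictions` are hypothetical):

```python
def average_predictions(all_preds):
    """Average several prediction tables element-wise.

    all_preds: list of tables, each a list of rows of class probabilities.
    All tables must share the same image order and class order.
    """
    n_models = len(all_preds)
    n_rows = len(all_preds[0])
    n_classes = len(all_preds[0][0])
    return [
        [sum(table[i][j] for table in all_preds) / n_models for j in range(n_classes)]
        for i in range(n_rows)
    ]

# Hypothetical predictions from two teams for two images and three breeds
team_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
team_b = [[0.6, 0.3, 0.1], [0.3, 0.4, 0.3]]

averaged = average_predictions([team_a, team_b])
for row in averaged:
    print([round(p, 3) for p in row])
```

Because every input row sums to 1, each averaged row does too, so the result can be submitted as probabilities without renormalizing.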

(sergii makarevych) #8

Right, we can start a private forum chat.

(Kevin Bird) #9

I didn’t think about the max submissions per team per day. That might change things. Maybe start in smaller groups at least then.


After the merger, the team's total submissions can't exceed the number of days the competition has been running multiplied by 5. So we can't form big teams if individual participants have already submitted a lot. We have to keep that in mind as well, and we may have to depend heavily on internal validation.
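The arithmetic is simple enough to check before anyone proposes a merger. Here is a small sketch of it; the day count and per-member submission counts are made-up numbers, and `can_merge` is just an illustrative name.

```python
SUBMISSIONS_PER_DAY = 5  # daily limit quoted from the competition rules

def can_merge(days_running, submissions_per_member):
    """Return True if the members' combined submissions fit under the merged-team cap."""
    cap = days_running * SUBMISSIONS_PER_DAY
    return sum(submissions_per_member) <= cap

# Hypothetical situation: the competition has been running for 20 days (cap = 100)
days_running = 20
print(can_merge(days_running, [40, 35, 20]))  # 95 submissions in total
print(can_merge(days_running, [60, 50]))      # 110 submissions in total
```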

(Anand Saha) #11

Guys, where do you save your codebase for Kaggle competitions? Keeping it in a public git repo is equivalent to sharing it …

(sergii makarevych) #12

Are you the police? :slight_smile:

(James Requa) #13

I either keep it on my machine locally or use a private GitHub repo so it can be shared among my team members, and then release it publicly after the comp ends.

(Anand Saha) #14


Private GitHub repos are paid :disappointed_relieved:

Just noticed that Bitbucket provides free private repos.

(James Requa) #15

Often in teams, one member hosts the repo and invites the others as contributors, so not everyone needs a paid GitHub plan. Or, like you said, Bitbucket is another good option for free private repos.

(K Sreelakshmi) #16

Hi. How do I get the Dog Breed challenge dataset from Kaggle into the AWS fastai/data folder? Could anyone give the steps or things needed to do it?

(James Requa) #17

There are a couple of different Kaggle command-line tools you can use to download the datasets to AWS, etc. Below are two options that I know of that work well.

(K Sreelakshmi) #18

Looking into Kaggle-cli! Thanks

(Divyansh Jha) #19

Hey! I’m up. If somebody decides to build a team, count me in.

(Atul Aggarwal) #20

You can also use GitLab for hosting private repos. There are no limits on the number of contributors or the number of private repos on GitLab.