Dog Breed Identification challenge

Let's apply our skills to some other datasets. One of the image recognition competitions currently running on Kaggle is Dog Breed Identification, which we can use as our sandbox. It should be a good competition to start with, as these classes belong to the ImageNet categories, so building a CNN from scratch should not be required.

It's a pity, but it looks like this competition, even though it is a playground, has some rules:

Privately sharing code or data outside of teams is not permitted. It’s okay to share code if made available to all participants on the forums.

So I am not sure how we can collaborate on the forum without violating this rule. Maybe @jeremy can advise us on this? I thought we could form a team of all fastai students, and after that we could share everything on the forum :slight_smile: Or maybe Kaggle means the fastai forums :wink:

One more thing to consider is

Submission Limits
You may submit a maximum of 5 entries per day.

So we have two choices:

  1. to create more than one team
  2. to rely more on our internal validation
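For the second option, here is a minimal sketch of what an internal validation split could look like: hold out a fixed fraction of each breed so local scores can stand in for leaderboard submissions. The function name `stratified_split`, the 20% fraction, and the toy labels are assumptions for illustration, not anything prescribed by the competition.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.2, seed=42):
    """Split image ids into train/validation lists, breed by breed.

    labels: dict mapping image id -> breed name (hypothetical layout).
    """
    rng = random.Random(seed)
    by_breed = defaultdict(list)
    for image_id, breed in labels.items():
        by_breed[breed].append(image_id)

    train_ids, val_ids = [], []
    for breed in sorted(by_breed):
        ids = sorted(by_breed[breed])
        rng.shuffle(ids)
        # Keep at least one validation image per breed when there is more than one image.
        n_val = max(1, round(len(ids) * val_fraction)) if len(ids) > 1 else 0
        val_ids.extend(ids[:n_val])
        train_ids.extend(ids[n_val:])
    return train_ids, val_ids

# Toy data: 10 images each of two breeds.
labels = {f"img{i}": ("beagle" if i % 2 else "pug") for i in range(20)}
train_ids, val_ids = stratified_split(labels)
```

Fixing the seed keeps the split identical between runs, so scores from different experiments stay comparable.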

What do you think about it?

One idea is to work on the problem individually or in small teams for 3-4 weeks; later we can merge and combine the predictions to get a good stacked model out of the uncorrelated predictions from different teams.

Is there a limit to team sizes? That would be my only hesitation, otherwise that would be very interesting!

No, there is not:

Team Limits
There is no maximum team size

I think that would be pretty cool. We could still break into smaller groups as suggested by @ar_ai.

Yeah. I guess breaking into smaller groups would be nice.

I agree it makes the most sense to start in smaller teams so we can maximize our submission counts per day leading up to the merger deadline, at which point we could elect to join forces. As @ar_ai mentioned, we could improve our scores a lot just by ensembling/averaging all of our predictions together. Just keep in mind we wouldn’t be able to share our code outside teams.
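To make the averaging idea concrete, here is a rough sketch of blending per-breed probabilities from several models by taking a plain mean. The dict layout (image id mapped to a list of class probabilities) and the name `average_predictions` are assumptions for illustration; real submissions would be read from CSV files first.

```python
from typing import Dict, List

# Hypothetical layout: image id -> list of per-breed probabilities.
Predictions = Dict[str, List[float]]

def average_predictions(all_preds: List[Predictions]) -> Predictions:
    """Average class probabilities across models, image by image."""
    if not all_preds:
        raise ValueError("need at least one set of predictions")
    ids = set(all_preds[0])
    for preds in all_preds[1:]:
        if set(preds) != ids:
            raise ValueError("all prediction sets must cover the same image ids")

    averaged = {}
    for image_id in all_preds[0]:
        rows = [preds[image_id] for preds in all_preds]
        if any(len(row) != len(rows[0]) for row in rows):
            raise ValueError("all models must predict the same number of classes")
        # zip(*rows) groups the probabilities for each class across models.
        averaged[image_id] = [sum(col) / len(rows) for col in zip(*rows)]
    return averaged

# Toy predictions from two teams over three breeds.
team_a = {"img1": [0.7, 0.2, 0.1], "img2": [0.1, 0.8, 0.1]}
team_b = {"img1": [0.5, 0.4, 0.1], "img2": [0.3, 0.4, 0.3]}
blended = average_predictions([team_a, team_b])
```

A simple mean like this tends to help most when the models make different kinds of mistakes, which is the point about uncorrelated predictions made earlier in the thread.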

Right, we can start a private forum chat.

I didn’t think about the max submissions per team per day. That might change things. Maybe start in smaller groups at least then.

After the merger, the total number of submissions can’t exceed the number of days the competition has been running multiplied by 5. So we can’t form big teams if individual participants have already submitted a lot. We have to keep that in mind as well. We may have to depend on internal validation a lot.
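As a quick sanity check on that rule, the arithmetic can be sketched like this. The function names and the example numbers are made up for illustration; the cap of 5 per day is the one quoted from the competition rules above.

```python
def submission_cap(days_running, per_day=5):
    """Maximum combined submissions a merged team may have made so far."""
    return days_running * per_day

def can_merge(days_running, prior_submissions, per_day=5):
    """True if the members' combined submission count fits under the cap.

    prior_submissions: list with each prospective member's (or sub-team's)
    submission count so far.
    """
    return sum(prior_submissions) <= submission_cap(days_running, per_day)

# Example: 20 days into the competition the cap is 20 * 5 = 100.
cap = submission_cap(20)
ok_small = can_merge(20, [40, 35])      # 75 combined submissions
ok_large = can_merge(20, [40, 35, 30])  # 105 combined submissions
```

In this toy case the two-member merge fits under the cap, while adding the third member pushes the combined count over it.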

Guys, where do you keep your codebase for Kaggle competitions? Keeping it in a public git repo is equivalent to sharing it …

Are you the police? :slight_smile:

I either keep it on my machine locally or use a private GitHub repo so it can be shared among my team members, and release it publicly after the comp ends.

Haha…

Private GitHub repos are paid :disappointed_relieved:

Just noticed that Bitbucket provides free private repos.

Often in teams, one member can host the repo and invite the others as contributors, so not everyone needs a paid GitHub plan. Or, like you said, Bitbucket is another good option for free private repos.

Hi. How do I get the Dog Breed challenge dataset from Kaggle into the aws fastai/data folder? Could anyone give the steps or the things needed to do it?

There are a couple of different Kaggle command-line tools you can use to download the datasets to AWS etc. Below are two options that I know of that work well.


Looking into Kaggle-cli! Thanks

Hey! I’m up. If somebody decides to build a team, count me in.

You can also use gitlab.com for hosting private repos. There are absolutely no limits on the number of contributors or the number of private repos on GitLab.
