How Do I Use Kaggle Data?

richardreeze · August 1, 2018, 12:13am

I just want to preface this by saying I’m a complete beginner.
I finished Part 1 Lecture 2 and want to join a few Kaggle competitions for practice.

I found these two that compare different images:

Additionally, I found this really cool one that I want to join

But I don’t understand how to use their data to train my model.

In the lecture, Jeremy’s “train” and “valid” directories have subdirectories for each class you want to recognize (in his example, “cats” and “dogs” directories).

But in these competitions, all the images are in the same directory. The only label information I get is a .csv file that tells me the type of each image, rather than one directory per class.

How do I train my model if this is the case? Or am I just looking at the wrong competitions? Please help.

KarlH · August 1, 2018, 12:44am

Keep working through the course. The lesson 2 notebook (planet dataset) and the lesson 1-breeds notebook (dog breeds) show how to train a model using a csv. IIRC you’ll see those in the next lecture video.
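
To make the idea concrete, here is a minimal standard-library sketch of what "labels in a csv" means in practice. It is not the course's method (the notebooks mentioned above use the library's own csv loader); it only shows how a csv can be turned back into the one-directory-per-class layout from the lecture. The two-column `id,label` layout, the `.jpg` suffix, and the function names are assumptions made for illustration, so check them against the actual competition files.

```python
import csv
import shutil
from collections import defaultdict
from pathlib import Path


def read_labels(csv_path):
    """Map each label to the list of image ids that carry it.

    Assumes a header row followed by rows of exactly two columns:
    image id, then label (hypothetical layout).
    """
    by_label = defaultdict(list)
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for image_id, label in reader:
            by_label[label].append(image_id)
    return dict(by_label)


def copy_into_class_dirs(by_label, src_dir, dst_dir, suffix=".jpg"):
    """Copy src_dir/<id><suffix> to dst_dir/<label>/<id><suffix>."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    for label, image_ids in by_label.items():
        class_dir = dst_dir / label
        class_dir.mkdir(parents=True, exist_ok=True)
        for image_id in image_ids:
            name = image_id + suffix
            shutil.copy(src_dir / name, class_dir / name)
```

Calling `copy_into_class_dirs(read_labels("labels.csv"), "train", "train_by_class")` (file and folder names hypothetical) would produce one subdirectory per label. This only works when each image has a single label; for multi-label data such as the planet dataset, a directory layout does not fit and reading the csv directly is the better route.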

richardreeze · August 1, 2018, 1:35am

Thanks, Karl!

pulkits · August 1, 2018, 7:05am

Hi all,

I was trying to use the pretrained resnet model for image classification. I used the following steps:

learn = ConvLearner.pretrained(arch, data)
learn.fit(0.01, 5)

Each epoch is taking a long time (around 3-4 minutes). Is there a way to reduce this time without hurting accuracy? Any help on this will be appreciated.

Regards,
Pulkit

keratin · August 3, 2018, 6:28pm

Hey @pulkits.

3-4 minutes is actually pretty good for an epoch. A resnet is a big model with a lot of parameters, and every image has to pass through all of its layers on each epoch. In later classes, epochs will take a lot longer, so you'll need to get used to that.
To answer your question: the ways I see to reduce this time are to play around with your hyperparameters (mostly batch size) or to get a faster GPU to train on. Another option is half-precision floating point, but I feel that is too advanced at this point.
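
As a rough illustration of why batch size matters, the number of iterations in an epoch is the dataset size divided by the batch size, rounded up. The sketch below is plain arithmetic with a hypothetical dataset size of 23,000 images. Fewer iterations does not guarantee a proportionally shorter epoch: a larger batch needs more GPU memory, and the learning rate may need retuning to keep accuracy.

```python
import math


def batches_per_epoch(n_images, batch_size):
    """Number of iterations needed to see every image once."""
    return math.ceil(n_images / batch_size)


# Hypothetical dataset of 23,000 training images.
print(batches_per_epoch(23000, 64))   # 360
print(batches_per_epoch(23000, 128))  # 180
```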

Hope this helps!