It takes me 5 hours to run one epoch with a Kaggle GPU on the Melanoma Classification challenge

edkahara · June 20, 2020, 8:05am

I’m competing in Kaggle’s Melanoma Classification Challenge. I just used the normal stuff I’ve learned so far (up to lesson 3). Here is my notebook. As you can see, it took 5 hours to run one epoch, with the GPU on. I basically can’t run more than one epoch without exceeding Kaggle’s notebook session quota (9 hours). So what did I do wrong, or what should I do differently?

vferrer · June 21, 2020, 7:15am

I don’t see any problem in the notebook. You may be training on the CPU (you need to activate the GPU on Kaggle). Try:

import torch
torch.cuda.is_available()

It returns True if CUDA support is enabled.

Another possibility is that the original images are very big. From the data description, this may be the case: “Images are also provided in JPEG and TFRecord format (in the jpeg and tfrecords directories, respectively). Images in TFRecord format have been resized to a uniform 1024x1024.” You have 32000 images, but all the data is about 100GB!! So, try preprocessing the .jpeg images down to 1024x1024. Fastai2 has http://dev.fast.ai/vision.utils#resize_images . You could adapt it to fastai1.
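
For fastai1, a rough sketch of the resizing step could look like the code below. It assumes Pillow is installed. `fit_within` and `resize_folder` are illustrative names I made up, not fastai functions, and the paths in the usage comment are placeholders. Note that it keeps the aspect ratio and caps the longer side at 1024 instead of forcing an exact 1024x1024 square. At about 100GB for 32000 images, each file averages roughly 3MB, so decoding full-size JPEGs on every epoch is probably a large part of the 5 hours.

```python
from pathlib import Path


def fit_within(width, height, max_side=1024):
    """Return (w, h) scaled down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_folder(src_dir, dst_dir, max_side=1024, quality=90):
    """Write a downscaled JPEG copy of every .jpg in src_dir into dst_dir.

    Returns the number of images written.
    """
    # Pillow is imported here so that fit_within has no extra dependency.
    from PIL import Image

    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    count = 0
    for path in sorted(src_dir.glob("*.jpg")):
        with Image.open(path) as img:
            img = img.convert("RGB")
            new_size = fit_within(*img.size, max_side=max_side)
            if new_size != img.size:
                img = img.resize(new_size, Image.BILINEAR)
            img.save(dst_dir / path.name, "JPEG", quality=quality)
        count += 1
    return count


# Example usage (placeholder paths, adjust to your notebook):
# resize_folder("path/to/jpeg/train", "path/to/train_resized", max_side=1024)
```

Run it once, save the output folder as a dataset, and point the training notebook at the resized copies so that each epoch reads small files.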

edkahara · June 22, 2020, 7:23am

The GPU was on; that’s why I was asking this question. I got some advice from Kaggle here and here. I’m working on resizing the images.