Slow Training

Hey! I have a dataset with around 150k images that I’m training on Google Colab.
My code is the following:


from fastai.vision.all import *
import os

path = '/content/drive/MyDrive/try_images'

# DataBlock version (defined but never actually used below -- the
# DataLoaders come from ImageDataLoaders.from_folder instead)
xray = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=GrandparentSplitter(train_name='train', valid_name='val'),
                 get_y=parent_label,
                 item_tfms=Resize(512))  # 512

categ = os.listdir(os.path.join(path, 'train'))

path_anno = path + '/annotations'

def path_helper():
    # Build a comma-separated string of the per-category image folders
    pre_path_img = [path + '/' + category for category in categ]
    return ', '.join(pre_path_img)

path_img = path_helper()

# from_folder takes train= and valid= folder names; a test set can be
# loaded later with dls.test_dl if needed
dls = ImageDataLoaders.from_folder(path, train='train', valid='val',
                                   bs=64, num_workers=0, item_tfms=Resize(224))

#dls.batch_size = 1

dls.valid.show_batch(max_n=1, nrows=1)

learn = cnn_learner(dls, resnet50, metrics=error_rate)


The learn.fine_tune(1) call for a single epoch is taking more than 13 hours. Does anyone have any idea why? My images are RGB (I thought about making them grayscale, but I’m not sure how), and I’ve resized them to 224.
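On the grayscale point, a minimal sketch with Pillow that writes single-channel copies of a dataset might look like this (the src/dest folder names are placeholders, not from the code above):

```python
from pathlib import Path
from PIL import Image

def convert_to_grayscale(src_dir, dest_dir):
    """Copy every image under src_dir into dest_dir as single-channel ("L" mode) grayscale,
    mirroring the folder layout."""
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    for img_path in src_dir.rglob("*"):
        if img_path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue  # skip directories and non-image files
        out_path = dest_dir / img_path.relative_to(src_dir)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        Image.open(img_path).convert("L").save(out_path)
```

One caveat: a pretrained resnet expects 3-channel input, so going grayscale on its own won’t help much unless the learner is also told to expect one channel (cnn_learner has an n_in parameter for this, if I remember correctly).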

Thanks to everyone

150,000 samples is a considerable dataset, but 13+ hours for a single epoch still seems like a lot. Is the GPU on?

You can check by going to Runtime → Change runtime type and selecting GPU
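To confirm from inside the notebook, torch.cuda.is_available() is the usual check; a stdlib-only sketch that just looks for a working nvidia-smi would be:

```python
import shutil
import subprocess

def gpu_visible():
    """Return True if an NVIDIA GPU is visible to this runtime,
    i.e. nvidia-smi is on PATH and exits cleanly."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0
    except OSError:
        return False

print("GPU visible:", gpu_visible())
```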

Hi Joao!

How big are the images before you resize them to 512 x 512?

I’ve had issues where resizing big images on the fly was a major bottleneck (with a 2k-image dataset, about 4 minutes per epoch). I got around it by creating a resized copy of the dataset and training directly on that (down to about 20 seconds per epoch). Maybe something like that is going on for you too?
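That pre-resizing step could be sketched like this with Pillow (folder names are placeholders; fastai also ships a resize_images utility that does something similar):

```python
from pathlib import Path
from PIL import Image

def resize_dataset(src_dir, dest_dir, size=224):
    """Write a size x size copy of every image under src_dir into dest_dir,
    mirroring the folder layout, so training never resizes on the fly."""
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    for img_path in src_dir.rglob("*"):
        if img_path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue  # skip directories and non-image files
        out_path = dest_dir / img_path.relative_to(src_dir)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        Image.open(img_path).resize((size, size)).save(out_path)
```

You would then point ImageDataLoaders.from_folder at the resized copy instead of the originals.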

Also, you could check whether the GPU is actually being used with nvidia-smi (admittedly, this is easier with Colab Pro, where you can open a terminal and run watch nvidia-smi to see how it evolves during training).