I am training two resnet34 models. Both use datasets loaded with
size = 224 and
bs = 64. I can't understand why the model trained on
data_two takes three times as long per epoch as the model trained on
data_one. I am using Google Cloud Platform with the default configuration (as described here).
Below are the batch_stats outputs for the two models.
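Since size and bs are the same for both models, the GPU work per batch should be nearly identical, so one thing worth checking is whether the difference comes from data loading (image decoding/resizing) rather than the GPU. Here is a minimal, hedged sketch of how you could time the first few batches of each dataloader; `data_one.train_dl` and `data_two.train_dl` are assumptions based on the fastai objects mentioned in the thread:

```python
import time

def time_batches(dl, n_batches=50):
    """Iterate over the first n_batches of a dataloader and return the
    average wall-clock seconds per batch. This captures image decode,
    resize, and collation time done by the loader's iterator."""
    start = time.perf_counter()
    count = 0
    for _batch in dl:
        count += 1
        if count >= n_batches:
            break
    elapsed = time.perf_counter() - start
    # Guard against an empty loader so we never divide by zero.
    return elapsed / max(count, 1)

# Hypothetical usage with the two datasets from this thread
# (requires fastai; attribute names assumed):
# print(time_batches(data_one.train_dl))
# print(time_batches(data_two.train_dl))
```

If data_two's loader is much slower per batch, the bottleneck is likely on the CPU/disk side (e.g. larger source images taking longer to decode), not the model itself.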
There's not much you can do about it. I think it's mostly GPU kernel overhead. It's possible that the test set in data_two is causing it; you could try removing it completely during training. You could also try running the models on data_two and data_one in two different notebooks.
@PalaashAgrawal Thanks. I tried that. Removing the test data does not make any difference, and I am already running them in two different notebooks. I forgot to mention that the images come from different domains - one is a cat-vs-dog image set and the other is a set of skin images.
I don't think the fact that one of them is a cat-vs-dog dataset and the other is a skin image dataset would affect training time much, if at all. But again, there's not much you can do, really! Try training the model on the CPU and see if it's any faster (by calling
learn.model.cpu()); sometimes, when there is a lot of GPU overhead, the CPU can actually be faster. Other than that, I really am out of ideas.