Wanted to share an observation.
I am running an image classifier on Google Colab, following Jeremy’s Lessons 1 and 2. I have about 150 images and set the image size to 400 when creating the DataBunch.
On the GPU, ‘fit’ with 2 epochs runs in 56 seconds with an error rate of 0.00.
On the TPU, ‘fit’ with the same images and 2 epochs runs in 6 min 6 sec, with an error rate of 0.027.
I thought TPUs were faster than GPUs, as Google claims, but that does not seem to be the case here. Is it because fastai is not optimized for TPUs, or am I missing something?
Also, I got a 0.0 error rate with the GPU, which seems too good to be true. Does it indicate overfitting?
If all you did was select the “TPU” runtime option, you are probably not actually running on a TPU. There is some setup code involved (https://www.dlology.com/blog/how-to-train-keras-model-x20-times-faster-with-tpu-for-free/) to get things working. I think you default to running on the CPU if the setup code isn’t added, which might explain your slow performance. Also, I don’t believe PyTorch is officially supported on the TPU yet (I think they are working on it, though).
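One quick way to sanity-check which accelerator the runtime actually exposes is to look at the environment variables Colab sets. This is a minimal sketch under stated assumptions: `COLAB_TPU_ADDR` is what Colab’s TPU runtime sets, CUDA runtimes typically expose `CUDA_VISIBLE_DEVICES`, and the helper name here is my own invention:

```python
import os

def detect_accelerator(environ=None):
    """Rough guess at which accelerator a Colab runtime exposes.

    Checks environment variables only: 'COLAB_TPU_ADDR' is set by
    Colab's TPU runtime, and CUDA runtimes usually set
    'CUDA_VISIBLE_DEVICES'. Falls back to 'cpu'.
    """
    env = os.environ if environ is None else environ
    if env.get("COLAB_TPU_ADDR"):
        return "tpu"
    if env.get("CUDA_VISIBLE_DEVICES") or env.get("NVIDIA_VISIBLE_DEVICES"):
        return "gpu"
    return "cpu"

# In a notebook you would call detect_accelerator() with no argument;
# passing a dict here just makes the behavior easy to demonstrate.
print(detect_accelerator({"COLAB_TPU_ADDR": "10.0.0.2:8470"}))  # tpu
print(detect_accelerator({}))  # cpu
```

If this reports “cpu” on your TPU runtime, the extra TPU setup code never ran and the timings are really CPU timings.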
This is not directly related to your GPU vs. TPU question, but I would not compare those error_rates as an indicator of computational performance when the train_loss is so much higher than the valid_loss.
My limited experience with Image Classification using fastai tells me that, when the train_loss is that much higher than the valid_loss, the model is underfitting – either the model needs to run for more epochs and/or the learning rate is too low.
And even besides that (someone correct me if I’m wrong), error_rate is not an indicator of the hardware’s computational performance but of the model’s predictive performance.
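To make that concrete, fastai’s error_rate is just 1 − accuracy on the validation set, so it measures predictions, not speed. Here is a rough stand-alone sketch of that idea in plain Python (not fastai’s actual implementation, which operates on tensors):

```python
def error_rate(preds, targets):
    """Fraction of misclassified examples, i.e. 1 - accuracy."""
    assert len(preds) == len(targets) and len(targets) > 0
    wrong = sum(p != t for p, t in zip(preds, targets))
    return wrong / len(targets)

# Identical model outputs give an identical error_rate regardless of
# whether they were computed on a CPU, GPU, or TPU -- only the
# wall-clock training time differs between devices.
print(error_rate([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.25
```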
Thanks Andrew, but I was not relating the error rate to performance. My question was whether an error rate of zero indicates overfitting or some other issue. Whenever something is perfect, it arouses my suspicion.
Your insights on the other numbers are very helpful too. I will have to test predictions on a large volume of new data to really know how good the model is.
Ah, I see. That makes sense.