Google revealed more details about its Tensor Processing Units (TPUs):
- This first generation of TPUs is built for inference, not training, so it is optimized for response time (latency) rather than throughput.
- The TPU is 15x to 30x faster at inference than the contemporary CPU and GPU it was benchmarked against (a Haswell server CPU and an NVIDIA K80; the K80 was almost as slow as the Haswell at inference).
- It is far more energy efficient, delivering roughly 30x to 80x better performance per watt than those chips.
- It runs TensorFlow code compiled into TPU instructions (an illustrative graph of this kind appears after this list).
- It is currently used mostly for MLP and LSTM networks, which make up the bulk of Google's datacenter inference workload.
- The guiding philosophy of the TPU microarchitecture is to keep its matrix multiply unit, a 256x256 systolic array of 8-bit multiply-accumulate cells, busy; the sketch below illustrates that dataflow.
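To make the last point concrete, here is a minimal Python sketch (not Google's implementation) of a weight-stationary matrix multiply in the spirit of the TPU's systolic array: the weight tile is loaded once and stays resident while activation rows stream through, which is what keeps the multiply-accumulate units busy. The `TILE` constant and the `systolic_matmul` helper are illustrative names of my own.

```python
import numpy as np

TILE = 256  # the TPU's matrix unit is a 256x256 array of 8-bit MACs

def systolic_matmul(activations: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Multiply streamed activation rows by a resident 256x256 weight tile.

    Illustrative model only: in the real hardware one activation row enters
    the array per cycle and a row of 32-bit partial sums exits the far edge.
    """
    assert weights.shape == (TILE, TILE)
    w = weights.astype(np.int32)          # 8-bit operands, 32-bit accumulation
    out = np.zeros((activations.shape[0], TILE), dtype=np.int32)
    for i, row in enumerate(activations.astype(np.int32)):
        out[i] = row @ w                  # one "wave" of activations through the array
    return out

rng = np.random.default_rng(0)
acts = rng.integers(-128, 127, size=(8, TILE), dtype=np.int8)    # int8 activations
wts = rng.integers(-128, 127, size=(TILE, TILE), dtype=np.int8)  # int8 weights
print(systolic_matmul(acts, wts).shape)  # (8, 256)
```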
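And to illustrate what "compiled TensorFlow code" means in practice, below is a small TensorFlow 1.x-style MLP inference graph of the kind the paper says dominates the workload; the host would compile such a graph into TPU instructions and ship them to the accelerator. The layer sizes and variable names are made up for the example, and this is not the actual TPU programming interface, which was not public.

```python
import tensorflow as tf  # 1.x graph-mode API, current when the TPU details appeared

# A toy two-layer MLP classifier; shapes and names are illustrative only.
x = tf.placeholder(tf.float32, shape=[None, 784], name="input")
w1 = tf.Variable(tf.random_normal([784, 256]))
b1 = tf.Variable(tf.zeros([256]))
hidden = tf.nn.relu(tf.matmul(x, w1) + b1)   # matmuls are what the TPU accelerates
w2 = tf.Variable(tf.random_normal([256, 10]))
b2 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(hidden, w2) + b2          # inference output
```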