Google's Tensor Processing Units (TPUs): Beyond GPUs

Google has disclosed more details about its Tensor Processing Units (TPUs):

Key Points:

  • This first generation of TPUs is built for inference, not training, so it is optimized for response time over throughput.
  • The TPU is 15x to 30x faster than contemporary GPUs and CPUs on inference (the tested GPU, an Nvidia K80, was almost as slow as a Haswell CPU at inference).
  • It is much more energy efficient than the GPUs and CPUs it was compared against.
  • It runs compiled TensorFlow code.
  • It is currently used mostly for MLP and LSTM networks.
  • The philosophy of the TPU microarchitecture is to keep the matrix multiply unit busy.
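
To see why keeping the matrix multiply unit busy matters for the MLP workloads mentioned above, here is a rough back-of-the-envelope sketch in plain Python. It counts the multiply-accumulate operations in a dense MLP forward pass against the remaining bias and activation work. The layer widths and the function name mlp_op_counts are made up for illustration and do not come from Google's write-up; the cost model (one bias add and one activation per output) is a simplifying assumption.

```python
# Hypothetical layer widths, chosen only for illustration (not from the source).
LAYER_SIZES = [256, 512, 512, 10]


def mlp_op_counts(layer_sizes, batch_size=1):
    """Count operations in a dense MLP forward pass.

    Returns (matmul_macs, other_ops): multiply-accumulates spent in the
    matrix multiplies versus bias additions plus activation evaluations.
    Assumes one bias add and one activation per output value.
    """
    matmul_macs = 0
    other_ops = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        matmul_macs += batch_size * n_in * n_out  # weight matrix times input vector
        other_ops += 2 * batch_size * n_out       # bias add + activation per output
    return matmul_macs, other_ops


macs, other = mlp_op_counts(LAYER_SIZES)
share = macs / (macs + other)
print(f"matmul MACs: {macs}, other ops: {other}, matmul share: {share:.3f}")
```

Under these assumptions nearly all of the arithmetic sits in the matrix multiplies, which is consistent with a design that tries to keep that one unit fed.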

The article is misleading: the TPU is not used for training, and it is compared against older GPUs.

