Google's Tensor Processing Units (TPUs): Beyond GPUs

Google revealed more details about its Tensor Processing Unit (TPU):
Announcement: https://cloudplatform.googleblog.com/2017/04/quantifying-the-performance-of-the-TPU-our-first-machine-learning-chip.html
Paper: https://drive.google.com/file/d/0Bx4hafXDDq2EMzRNcy1vSUxtcEk/view

Key Points:

  • This first generation of TPUs is for inference, not training, so it is optimized for response time (latency) rather than throughput.
  • The TPU is 15x to 30x faster than contemporary GPUs and CPUs on inference (the K80 GPU tested was almost as slow as a Haswell CPU at inference).
  • Much more energy-efficient than contemporary CPUs and GPUs
  • Runs compiled TensorFlow code
  • Currently mostly used for MLP and LSTM networks
  • The philosophy of the TPU microarchitecture is to keep the matrix multiply unit busy.
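To make the last point concrete: the TPU's matrix multiply unit is a systolic array in which weights stay resident in the processing elements while activations stream through and partial sums accumulate across the array. Here is a minimal, hedged sketch of that weight-stationary dataflow in Python with NumPy; it is illustrative (one wavefront per input row, not cycle-accurate), and the function name `systolic_matmul` is my own, not from the paper.

```python
import numpy as np

def systolic_matmul(A, W):
    """Weight-stationary systolic-array sketch of C = A @ W.

    Each processing element (k, n) holds the stationary weight W[k, n].
    Activations A[m, :] stream across the array; each PE multiplies the
    incoming activation by its weight and adds it to the partial sum
    flowing down its column. Illustrative only, not cycle-accurate.
    """
    M, K = A.shape
    K2, N = W.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N))
    for m in range(M):                 # one input row (wavefront) at a time
        partial = np.zeros(N)          # partial sums moving down the columns
        for k in range(K):             # PE row k: all N PEs fire in parallel
            partial += A[m, k] * W[k]  # multiply-accumulate against W[k, :]
        C[m] = partial                 # sums exit the bottom of the array
    return C
```

The point of the design is that once the weights are loaded, the matrix unit performs a multiply-accumulate every cycle without touching memory, which is exactly the "keep the matrix multiply unit busy" philosophy.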

The article is misleading: the TPU isn't used for training, and it is compared against older GPUs.
