Google revealed more details about its Tensor Processing Units (TPUs):
- This first generation of TPUs is built for inference, not training, so it is optimized for response time (latency) rather than throughput.
- The TPU is 15x to 30x faster at inference than the contemporary CPU and GPU it was benchmarked against (a Haswell server CPU and an NVIDIA K80; the K80 was almost as slow as the Haswell at inference).
- It is far more energy efficient, delivering roughly 30x to 80x better performance per watt than those chips.
- It runs TensorFlow code compiled into TPU instructions (an illustrative graph of this kind appears after this list).
- It is currently used mostly for MLP and LSTM networks, which make up the bulk of Google's datacenter inference workload.
- The guiding philosophy of the TPU microarchitecture is to keep its matrix multiply unit, a 256x256 systolic array of 8-bit multiply-accumulate cells, busy; the sketch below illustrates that dataflow.
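To make the last point concrete, here is a minimal Python sketch (not Google's implementation) of a weight-stationary matrix multiply in the spirit of the TPU's systolic array: the weight tile is loaded once and stays resident while activation rows stream through, which is what keeps the multiply-accumulate units busy. The `TILE` constant and the `systolic_matmul` helper are illustrative names of my own.

```python
import numpy as np

TILE = 256  # the TPU's matrix unit is a 256x256 array of 8-bit MACs

def systolic_matmul(activations: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Multiply streamed activation rows by a resident 256x256 weight tile.

    Illustrative model only: in the real hardware one activation row enters
    the array per cycle and a row of 32-bit partial sums exits the far edge.
    """
    assert weights.shape == (TILE, TILE)
    w = weights.astype(np.int32)          # 8-bit operands, 32-bit accumulation
    out = np.zeros((activations.shape[0], TILE), dtype=np.int32)
    for i, row in enumerate(activations.astype(np.int32)):
        out[i] = row @ w                  # one "wave" of activations through the array
    return out

rng = np.random.default_rng(0)
acts = rng.integers(-128, 127, size=(8, TILE), dtype=np.int8)    # int8 activations
wts = rng.integers(-128, 127, size=(TILE, TILE), dtype=np.int8)  # int8 weights
print(systolic_matmul(acts, wts).shape)  # (8, 256)
```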
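And to illustrate what "compiled TensorFlow code" means in practice, below is a small TensorFlow 1.x-style MLP inference graph of the kind the paper says dominates the workload; the host would compile such a graph into TPU instructions and ship them to the accelerator. The layer sizes and variable names are made up for the example, and this is not the actual TPU programming interface, which was not public.

```python
import tensorflow as tf  # 1.x graph-mode API, current when the TPU details appeared

# A toy two-layer MLP classifier; shapes and names are illustrative only.
x = tf.placeholder(tf.float32, shape=[None, 784], name="input")
w1 = tf.Variable(tf.random_normal([784, 256]))
b1 = tf.Variable(tf.zeros([256]))
hidden = tf.nn.relu(tf.matmul(x, w1) + b1)   # matmuls are what the TPU accelerates
w2 = tf.Variable(tf.random_normal([256, 10]))
b2 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(hidden, w2) + b2          # inference output
```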