Google revealed more details about its Tensor Processing Units (TPUs):
- This first generation of TPUs is built for inference, not training, so it is optimized for response time (latency) rather than throughput.
- The TPU is 15x to 30x faster at inference than the contemporary CPU and GPU it was benchmarked against (a Haswell server CPU and an NVIDIA K80; the K80 was almost as slow as the Haswell at inference).
- It is far more energy efficient, delivering roughly 30x to 80x better performance per watt than those chips.
- It runs TensorFlow code compiled into TPU instructions (an illustrative graph of this kind appears after this list).
- It is currently used mostly for MLP and LSTM networks, which make up the bulk of Google's datacenter inference workload.
- The guiding philosophy of the TPU microarchitecture is to keep its matrix multiply unit, a 256x256 systolic array of 8-bit multiply-accumulate cells, busy; the sketch below illustrates that dataflow.
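To make the last point concrete, here is a minimal Python sketch (not Google's implementation) of a weight-stationary matrix multiply in the spirit of the TPU's systolic array: the weight tile is loaded once and stays resident while activation rows stream through, which is what keeps the multiply-accumulate units busy. The `TILE` constant and the `systolic_matmul` helper are illustrative names of my own.

```python
import numpy as np

TILE = 256  # the TPU's matrix unit is a 256x256 array of 8-bit MACs

def systolic_matmul(activations: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Multiply streamed activation rows by a resident 256x256 weight tile.

    Illustrative model only: in the real hardware one activation row enters
    the array per cycle and a row of 32-bit partial sums exits the far edge.
    """
    assert weights.shape == (TILE, TILE)
    w = weights.astype(np.int32)          # 8-bit operands, 32-bit accumulation
    out = np.zeros((activations.shape[0], TILE), dtype=np.int32)
    for i, row in enumerate(activations.astype(np.int32)):
        out[i] = row @ w                  # one "wave" of activations through the array
    return out

rng = np.random.default_rng(0)
acts = rng.integers(-128, 127, size=(8, TILE), dtype=np.int8)    # int8 activations
wts = rng.integers(-128, 127, size=(TILE, TILE), dtype=np.int8)  # int8 weights
print(systolic_matmul(acts, wts).shape)  # (8, 256)
```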
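And to illustrate what "compiled TensorFlow code" means in practice, below is a small TensorFlow 1.x-style MLP inference graph of the kind the paper says dominates the workload; the host would compile such a graph into TPU instructions and ship them to the accelerator. The layer sizes and variable names are made up for the example, and this is not the actual TPU programming interface, which was not public.

```python
import tensorflow as tf  # 1.x graph-mode API, current when the TPU details appeared

# A toy two-layer MLP classifier; shapes and names are illustrative only.
x = tf.placeholder(tf.float32, shape=[None, 784], name="input")
w1 = tf.Variable(tf.random_normal([784, 256]))
b1 = tf.Variable(tf.zeros([256]))
hidden = tf.nn.relu(tf.matmul(x, w1) + b1)   # matmuls are what the TPU accelerates
w2 = tf.Variable(tf.random_normal([256, 10]))
b2 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(hidden, w2) + b2          # inference output
```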