How to Think About TPUs
This section is all about how TPUs work, how they're networked together to enable multi-chip training and inference, and how this affects the performance of our favorite algorithms. There's even some good stuff for GPU users too!

What Is a TPU?

A TPU is basically a compute core that specializes in matrix multiplication (called a TensorCore) attached to a stack of fast memory (called high-bandwidth memory or HBM) [1]. Here's a diagram:

Figure: the basic components of a TPU chip. The TensorCore is the gray left-hand box,…
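To make this concrete, here is a minimal JAX sketch (the shapes and dtype are illustrative, not taken from the text) that runs a jit-compiled matrix multiplication on whatever accelerator JAX finds. On a TPU, the input and output arrays live in HBM and the matmul itself executes on the TensorCore:

```python
import jax
import jax.numpy as jnp

# List the attached accelerators (on a TPU VM these are TpuDevice objects).
print(jax.devices())

# Two illustrative matrices; on a TPU these buffers are allocated in HBM.
key = jax.random.PRNGKey(0)
a = jax.random.normal(key, (1024, 1024), dtype=jnp.bfloat16)
b = jax.random.normal(key, (1024, 1024), dtype=jnp.bfloat16)

# jit-compile the matmul; XLA lowers jnp.dot to run on the TensorCore.
matmul = jax.jit(jnp.dot)
c = matmul(a, b)
print(c.shape, c.dtype)
```

The same snippet runs unchanged on a GPU or CPU backend; only the device that JAX dispatches the compiled matmul to changes.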