Is a tensor just a complex, multi-layered image?

I’m trying to wrap my head around what a tensor is based on Lesson 3 and the beginning of Lesson 4.

Regarding the Excel demonstration at the beginning of the Lesson 4 video: why did each of the layer 2 filters have two matrices? Why did each image representation from the previous layer need its own 3x3 matrix in each layer 2 filter, rather than all of them being run against a single 3x3 matrix?
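
To make the setup concrete, here is a minimal sketch of what I understand the spreadsheet to be doing. The function names (`conv2d_valid`, `apply_filter`) and all of the numbers are made up for illustration, not taken from the lesson, and I'm assuming each layer 2 filter applies one 3x3 matrix to each layer 1 output and then adds the results into a single output.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over one 2D image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def apply_filter(channels, filter_kernels):
    """Apply one filter: one kernel per input channel, results summed."""
    per_channel = [conv2d_valid(c, k) for c, k in zip(channels, filter_kernels)]
    rows, cols = len(per_channel[0]), len(per_channel[0][0])
    return [
        [sum(m[i][j] for m in per_channel) for j in range(cols)]
        for i in range(rows)
    ]


# Two hypothetical 4x4 outputs from layer 1 (made-up values).
layer1_out = [
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]],
]

# One hypothetical layer 2 filter: two 3x3 matrices, one per layer 1 output.
layer2_filter = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[0, 0, 0], [0, 2, 0], [0, 0, 0]],
]

# The two per-channel results are added into a single 2x2 output.
layer2_out = apply_filter(layer1_out, layer2_filter)
print(layer2_out)  # [[20, 21], [30, 35]]
```

In this sketch the two 3x3 matrices together form a single 2x3x3 block of numbers, and the filter still produces only one output image.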

I suspect the answer lies in thinking of a filter as just a layered image, but I’m not sure I’m on the right track.

Thanks -wg