In the fastai book/course chapter <MNIST_Basics> there is a conversion of a rank-3 tensor to a rank-2 tensor (a list of matrices to a list of vectors).
To be more precise:
From —> shape `[num_imgs, pixel_x, pixel_y]`
to —> shape `[num_imgs, pixel_x * pixel_y]`
Is there any particular reason to do this? Sure, the data can be represented as a rank-2 tensor, but do we gain a performance boost or any other benefit from doing so, or is it just a convention that practitioners use?
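For concreteness, the reshape in question can be sketched in PyTorch like this (assuming 28x28 MNIST-style images; the tensor and sizes here are illustrative, not taken from the book's code):

```python
import torch

# Hypothetical batch: 100 images of 28x28 pixels -> rank-3 tensor
imgs = torch.rand(100, 28, 28)

# Flatten each image into a 784-element vector -> rank-2 tensor.
# view() only changes the shape metadata; no pixel data is copied.
flat = imgs.view(-1, 28 * 28)

print(flat.shape)  # torch.Size([100, 784])
```

Since `view` returns a view of the same storage rather than a copy, the reshape itself is essentially free.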
It’s just to make the model simpler (pretty sure this is before convolutions are introduced). Most, if not all, real-world image-processing models will deal with it as a rank-3 tensor.
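As a sketch of why the rank-3 shape matters later: convolutional layers consume the spatial dimensions directly. In PyTorch, `nn.Conv2d` expects a batch shaped `[batch, channels, height, width]`, so each image stays a (channel-stacked) matrix rather than a flat vector (the layer sizes below are just illustrative):

```python
import torch
import torch.nn as nn

# Conv layers take input shaped [batch, channels, height, width] (rank 4 with
# the batch dimension) -- the 2D layout of each image is preserved.
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

batch = torch.rand(100, 1, 28, 28)  # 100 grayscale 28x28 images
out = conv(batch)

print(out.shape)  # torch.Size([100, 8, 28, 28])
```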
Yes, at this point convolutions have not been tackled yet.
Could you maybe elaborate a bit further, please?
Why would you deal with it as a rank-3 tensor? I mean, having a single vector containing all the pixels seems more convenient than dealing with a vector of vectors…
Treating the image as a rank-3 tensor lets you take advantage of the spatial proximity of the pixels. When you reshape to a rank-2 tensor, you lose the information that some pixels sit directly below other pixels.
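A tiny sketch of what gets lost (using a made-up 3x3 "image"): in the rank-2 view, the pixel directly below position `(r, c)` is no longer an adjacent entry but sits a full row-width away in the flat index:

```python
import torch

img = torch.arange(9).view(3, 3)  # tiny 3x3 "image" with values 0..8
flat = img.view(-1)               # flattened: [0, 1, 2, ..., 8]

# In the 2D view, img[r+1, c] is the immediate vertical neighbour of
# img[r, c]. In the flat view, that same pixel lives at index
# (r+1)*width + c -- a whole row-width (3 here, 28 for MNIST) away.
r, c = 0, 1
width = 3
assert flat[(r + 1) * width + c] == img[r + 1, c]
```

A fully connected layer sees only the flat indices, so it has to learn any vertical relationships from scratch, whereas a convolution gets them from the tensor's shape for free.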