My understanding is that the 32x32 images were probably downscaled from higher-resolution originals, so any more complicated augmentations would lead to a loss of image context.
I don't think it is about the location of objects in the image, since this is not an object detection or localisation problem.
If you have good 32x32 images, maybe we can apply more transforms. It's about what we might lose by applying them. My understanding rests entirely on the problem not being a detection/localisation problem, and hence we can apply rotations/flips to good images of 32x32 size.
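
To make the "what we might lose" point concrete, here is a minimal sketch in plain Python, assuming a made-up 32x32 grayscale image stored as a list of rows. The helper names (`hflip`, `rot90`, `random_flip_rot`) are hypothetical and only for illustration. It shows that flips and 90-degree rotations just rearrange pixels, so nothing is interpolated or cropped away. Rotations by arbitrary angles are a different story, since they need interpolation and can cut off corners.

```python
import random

def hflip(img):
    """Mirror the image left to right (each row reversed)."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def random_flip_rot(img, rng):
    """Apply a random horizontal flip, then 0 to 3 clockwise quarter turns."""
    if rng.random() < 0.5:
        img = hflip(img)
    for _ in range(rng.randrange(4)):
        img = rot90(img)
    return img

# Hypothetical 32x32 image with a unique value per pixel (an assumption,
# used only so that any lost or duplicated pixel would be easy to spot).
image = [[r * 32 + c for c in range(32)] for r in range(32)]
augmented = random_flip_rot(image, random.Random(0))

# The augmented image holds exactly the same pixel values, only rearranged.
same_pixels = (sorted(v for row in augmented for v in row)
               == sorted(v for row in image for v in row))
print(len(augmented), len(augmented[0]), same_pixels)
```

In practice you would probably use the augmentation utilities of whatever framework you train with; this is only meant to show why flips and quarter-turn rotations are safe on small images as long as object location does not matter for the task.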