Sorry if this is a dumb question, but I’m still getting started with my AI journey.
I just finished Lesson 3 / Chapter 4 of the book, MNIST Basics, which was a doozy. I was wondering, though: when we initially load our MNIST images into tensors of floats, why do we divide them by 255?
I understand that dividing by 255 restricts the range of the resulting floating-point values to between 0 and 1, but why? The book just says: “Generally when images are floats, the pixel values are expected to be between 0 and 1, so we will also divide by 255 here:”
If I don’t divide by 255, it screws up training later on, so it must be important. Can someone elaborate?
I might be wrong, but I think it has to do with normalization. When calculating gradients, some gradients can be very large and overpower other values. Also, floating-point numbers lose precision when they get too big. That’s why we start with values between 0 and 1.
Let me know if I am wrong.
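To make that concrete, here is a toy sketch (my own example, not from the book) of how the gradient of a squared-error loss scales with the size of the input:

```python
# For a one-parameter linear model with loss (w*x - t)**2, the gradient
# with respect to w is 2*(w*x - t)*x, so it grows with the input x.
def grad_w(w, x, t):
    return 2 * (w * x - t) * x

w, t = 0.5, 1.0
raw = grad_w(w, 255.0, t)   # raw pixel value -> huge gradient
scaled = grad_w(w, 1.0, t)  # normalized pixel -> modest gradient
print(raw, scaled)  # 64515.0 -1.0
```

Feeding in raw 0–255 pixels makes the gradient tens of thousands of times larger than with normalized inputs, which is exactly the kind of imbalance that destabilizes training.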
Traditional gray-scale (grey-scale) pictures use 8 bits to represent black through white; RGB uses 24 bits, and RGB with an alpha (transparency) channel uses 32 bits. Those 8 bits represent a whole number between 0 and 255. We call this an integer, and integer arithmetic truncates: 1 + 2 = 3, but 3 / 2 = 1, not 1.5. Machine learning works with floating-point numbers, so 1.1 * 1.1 = 1.21 and 1.0 / 9.0 = 0.111111…. It is therefore necessary to convert those 8 bits into a floating-point number, and floating-point arithmetic is most accurate when values sit in a small range around 1. Dividing by 255.0 makes the calculation sit in that narrow range, which is often called normalization. It is a bit like Goldilocks: values that are neither too small nor too large make the mathematics work better.
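A minimal sketch of that conversion (assuming NumPy, which the imaging libraries use under the hood for 8-bit pixel data):

```python
import numpy as np

# 8-bit grayscale pixel values, stored as unsigned integers.
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

# Integer division truncates, as described above.
assert 3 // 2 == 1

# Cast to float, then divide by 255.0 to normalize into [0.0, 1.0].
normalized = pixels.astype(np.float32) / 255.0
print(normalized.min(), normalized.max())  # 0.0 1.0
```

Note the cast to float *before* (or via) the division: dividing the `uint8` array by an integer 255 with integer semantics would collapse almost everything to 0.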
Digital information is stored in units called bytes (1 byte = 8 bits). Each bit can be either 0 or 1, so 8 bits give 2^8 = 256 possible values, i.e. the integers 0 through 255. Hence we divide the pixel values by 255 to convert them into floats between 0 and 1.
Hope this helps.
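A quick sanity check of the byte arithmetic above (just illustrative, using NumPy's integer type info):

```python
import numpy as np

# 8 bits -> 2**8 = 256 distinct values.
assert 2 ** 8 == 256

# uint8 is the dtype image libraries use for 8-bit pixels: range 0..255.
info = np.iinfo(np.uint8)
print(info.min, info.max)  # 0 255
```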