It’s mentioned in the book that white pixels are 0 and black pixels are 255, and that is what the NumPy arrays, tensors, DataFrames, etc. show. But when I simply display the image with PIL or the default Photos app, it looks like a negative (a white digit on a black background).
Here you can see that evaluating im3 gives an image with a black background, while running the DataFrame statement gives an image with a white background.
This is a bit confusing to me. Would appreciate an explanation.
When MNIST was published, 0 and 255 were intended to represent white and black respectively, that is, black foreground and white background. However, most imaging packages like PIL assume 0 to be black and 255 to be white, hence the disparity you are observing.
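A minimal sketch of the two conventions, assuming Pillow and NumPy, with a tiny hypothetical 3×3 array standing in for a real MNIST file (the names `digit`, `as_stored` and `inverted` are made up for illustration). The stored values never change. Only their interpretation does, and `255 - x` (or `PIL.ImageOps.invert`) flips them to the black-on-white look MNIST originally intended. If the DataFrame statement is the book's `background_gradient('Greys')` styling, that colour map also draws 0 as white and 255 as black, which would explain the white background there.

```python
import numpy as np
from PIL import Image, ImageOps

# Hypothetical 3x3 "digit" stored the MNIST way: 255 = ink, 0 = background.
digit = np.array([[0, 255, 0],
                  [0, 255, 0],
                  [0, 255, 0]], dtype=np.uint8)

# A 2D uint8 array becomes a single-channel "L" image, where PIL treats
# 0 as black and 255 as white: a white stroke on a black background.
as_stored = Image.fromarray(digit)

# Flipping the values (255 - x) gives a black stroke on a white background.
inverted = ImageOps.invert(as_stored)

# Converting back to NumPy returns the raw values unchanged; nothing is reversed.
print(np.array(as_stored))
print(np.array(inverted))
print(np.array_equal(np.array(inverted), 255 - digit))  # True
```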
Does that clear up your confusion?
Are grayscale images treated differently from colored ones by PIL? I tried inspecting a few PNGs, and I think each pixel has RGB values (x, y, z), where x, y and z range from 0 to 255, with black being (0, 0, 0) and white (255, 255, 255). So I assume what my Photos app and PIL show is indeed the correct representation for humans to view. But I don’t understand why converting the image to a NumPy array makes it the reverse, or how three values (x, y, z) became a single value. Is it taking the average of R, G and B, since gray shades always have all three equal?
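Again, when MNIST was published, the digits were meant to be black on a white background, but PIL follows the reverse convention (0 is black, 255 is white), so what it displays is the inverse of the original MNIST look. Converting to a NumPy array doesn’t reverse anything: the array holds exactly the stored values, and only the way they are rendered differs.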
Grayscale uses a single value per pixel indicating brightness, but coloured images generally use 3 values per pixel for red, green, and blue (RGB).
In grayscale, black is 0 and white is 255. In RGB, black is (0, 0, 0) and each channel reaches full intensity at 255, so white is (255, 255, 255). For instance, (255, 0, 0) is pure red because the red value is 255 while green and blue are 0. As another example, (10, 100, 210) is a little bit of red, a decent amount of green, and quite a bit of blue.
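On how three values become one: the MNIST PNGs already store one value per pixel, so no conversion happens when they are loaded. When an RGB image is converted to grayscale in PIL, though, Pillow's documentation for `Image.convert` describes the ITU-R 601-2 luma transform, L = R × 299/1000 + G × 587/1000 + B × 114/1000. That is a weighted sum, not a plain average, although both give the same result for grey pixels where all three channels are equal. A small sketch, assuming Pillow and NumPy and using a hypothetical 1×3 RGB image:

```python
import numpy as np
from PIL import Image

# Hypothetical 1x3 RGB image: mid-grey, pure red, and the (10, 100, 210) example.
rgb = np.array([[[128, 128, 128],
                 [255, 0, 0],
                 [10, 100, 210]]], dtype=np.uint8)
img = Image.fromarray(rgb)               # 3 channels -> mode "RGB"
print(img.mode, np.array(img).shape)     # RGB (1, 3, 3)

gray = img.convert("L")                  # single channel -> mode "L"
print(gray.mode, np.array(gray).shape)   # L (1, 3)
print(np.array(gray))

def luma(r, g, b):
    """ITU-R 601-2 luma weights as documented for Image.convert("L").
    Pillow uses integer arithmetic internally, so its result may differ
    from this float version by rounding."""
    return r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000

# tolist() yields plain Python ints, avoiding uint8 overflow in the arithmetic.
for r, g, b in rgb[0].tolist():
    print((r, g, b), "weighted:", round(luma(r, g, b)), "plain mean:", round((r + g + b) / 3))
```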
Is the PNG file aware that it’s grayscale? My assumption is that it’s a dumb data structure that has dimensions and carries three values (R, G, B) for each of its pixels.
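I think it actually knows whether it’s grayscale or colored. Notice the dimensions of the array: a grayscale image gives a 2D array (height × width), whereas an RGB image would give a 3D one (height × width × 3).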
PNG can store both colour and grayscale images, and the file’s header records which colour type it uses. MNIST is grayscale, so its PNG files contain one value per pixel.
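A sketch of how to check this yourself, assuming Pillow and NumPy. It encodes the same hypothetical 28×28 pixels once as a grayscale PNG and once as an RGB PNG, in memory, then reads them back. The colour-type byte in the PNG header (0 for grayscale, 2 for RGB, per the PNG specification) and the resulting array shapes show that the file itself records whether it is grayscale.

```python
import io
import numpy as np
from PIL import Image

# Hypothetical MNIST-sized 28x28 image with a vertical stroke.
pixels = np.zeros((28, 28), dtype=np.uint8)
pixels[5:23, 13:15] = 255

def to_png_bytes(img):
    """Encode a PIL image as PNG in memory and return the raw bytes."""
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()

# Byte 25 of a PNG file is the colour type in its IHDR header:
# 8-byte signature + 4-byte chunk length + "IHDR" + 4-byte width
# + 4-byte height + 1-byte bit depth = 25 bytes before it.
COLOR_TYPE_OFFSET = 25

gray_bytes = to_png_bytes(Image.fromarray(pixels))                # mode "L"
rgb_bytes = to_png_bytes(Image.fromarray(pixels).convert("RGB"))  # same pixels, 3 channels

for name, data in [("grayscale", gray_bytes), ("rgb", rgb_bytes)]:
    img = Image.open(io.BytesIO(data))
    arr = np.array(img)
    print(name, "colour type:", data[COLOR_TYPE_OFFSET], "mode:", img.mode, "shape:", arr.shape)
# grayscale colour type: 0 mode: L shape: (28, 28)
# rgb colour type: 2 mode: RGB shape: (28, 28, 3)
```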