At around 1:06:30 of Lesson 7, Jeremy talks about providing the mean and standard deviation of each RGB channel for the set of images. He has pre-calculated these and recommends that his students calculate them for themselves.
Calculating the mean I found easy enough:
import os
import numpy as np
from PIL import Image

files = os.listdir(f'{PATH}/train/')
x = np.array([np.array(Image.open(f'{PATH}/train/' + fname)) for fname in files])
mean = np.zeros(3)  # np.zeros rather than a plain list, so += adds element-wise
for image in x:
    mean += np.mean(image, axis=(0, 1))  # per-channel mean of this image
mean / len(x) / 255  # average over images, scaled to [0, 1]
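For what it's worth, since all the images are the same size, I think the same result comes out of a single vectorized call (a sketch, not checked against Jeremy's values):

x.mean(axis=(0, 1, 2)) / 255  # per-channel mean over every pixel at once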
However, when calculating the standard deviation, I’m struggling. I’m currently using this:
sd = []
for image in x:
    sd.append(np.std(image, axis=(0, 1)))  # per-channel std within this image
np.array(sd).mean(axis=0) / 255  # then average those stds across images
However, that gives me values of [0.2022, 0.19932, 0.20086], which are incorrect.
I’m pretty sure the problem is that I don’t properly understand how the standard deviation should be calculated over a whole dataset. I have experimented with ddof=1, though that seems to make little difference.
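One thing I’ve been wondering: averaging per-image stds ignores how the image means vary between images, so it should always come out smaller than the std taken over all pixels pooled together. A sketch of the pooled version (assuming all images are the same size, so x is one big 4-D array):

x.reshape(-1, 3).std(axis=0) / 255  # std over every pixel in the dataset at once, per channel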
Any advice or resources that could lead me in the right direction would be greatly appreciated. Thanks!
I am still not able to figure out the standard deviation calculation.
Here is what I have done:
PATH = 'cifar/'
files = os.listdir(f'{PATH}train/')
x = np.array([np.array(Image.open(f'{PATH}train/' + fname)) for fname in files])

std = np.zeros(3)  # running sum of per-image stds, one entry per channel
for image in x:
    r_std = np.std(image[:, :, 0])  # std of this image's red channel
    g_std = np.std(image[:, :, 1])  # green
    b_std = np.std(image[:, :, 2])  # blue
    std += np.array([r_std, g_std, b_std])
I have calculated the standard deviation of each channel for each image and summed them up. Does this make sense, or am I doing something wrong?
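To test whether summing/averaging per-image stds can match the dataset-wide std, here is a tiny synthetic check (all values made up purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
# 10 fake "images", each with its own mean, so the images genuinely differ
means = rng.uniform(0.0, 1.0, size=(10, 1, 1, 3))
imgs = rng.normal(loc=means, scale=0.1, size=(10, 32, 32, 3))

per_image_std = np.array([im.std(axis=(0, 1)) for im in imgs])
print(per_image_std.mean(axis=0))       # ~0.1: ignores how the image means spread out
print(imgs.reshape(-1, 3).std(axis=0))  # ~0.3: pooled std includes that spread

If the two disagree like this, then averaging per-image stds cannot be the same quantity as the std over all pixels.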