Hi,

I’m working on an image classification problem in **PyTorch**, and I need to normalize the images so that they have a mean of 0.0 and a standard deviation of 1.0 (reference: https://cs231n.github.io/neural-networks-2/#datapre ). I implemented the mean subtraction and std division, but the network’s behaviour is strange and I’m not sure my implementation is correct.

These are my transform settings:

```
transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ToTensor()
])
```

which means the output images are in the range [0, 1], since `ToTensor()` scales PIL images to that range (it doesn’t work without the `ToPILImage()` transform). They have a mean of 0.75 and a std of 0.33.

When I calculated the per-channel mean, I got `[0.76487684, 0.75205952, 0.74630833]` (RGB), and likewise for the std: `[0.27936298, 0.27850413, 0.28387958]`. I normalize using `transforms.Normalize(mean, std)`, which computes `(x - mean) / std` per channel.
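For reference, the math `Normalize` performs can be sketched in plain NumPy (the stats are the measured values above; the batch here is just a random stand-in):

```python
import numpy as np

# The per-channel stats measured above
mean = np.array([0.76487684, 0.75205952, 0.74630833])
std = np.array([0.27936298, 0.27850413, 0.28387958])

# A random stand-in batch in NCHW layout, values in [0, 1]
images = np.random.rand(4, 3, 80, 80)

# Subtract the mean and divide by the std per channel, broadcasting
# the (3,) stats over the batch and spatial dimensions
normalized = (images - mean[None, :, None, None]) / std[None, :, None, None]
```

The `[None, :, None, None]` indexing reshapes the stats to `(1, 3, 1, 1)` so they broadcast against the `(N, 3, H, W)` batch.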

My code looks like this (for the sake of simplicity, assume I only have 32 images):

```
import numpy as np
import torch

dset = CustomDataset('images', transform=transform)  # PyTorch Dataset object
dataloader = torch.utils.data.DataLoader(dset, batch_size=32, shuffle=False, num_workers=4)
images, labels = next(iter(dataloader))
# images.shape = (32, 3, 80, 80)
numpy_images = images.numpy()
per_image_mean = np.mean(numpy_images, axis=(2, 3))   # shape (32, 3)
per_image_std = np.std(numpy_images, axis=(2, 3))     # shape (32, 3)
pop_channel_mean = np.mean(per_image_mean, axis=0)    # shape (3,)
pop_channel_std = np.mean(per_image_std, axis=0)      # shape (3,)
```
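Note that averaging the per-image stds is only an approximation of the population std: by the law of total variance, the pooled variance also includes the spread of the per-image means. A small demonstration on a random stand-in for `numpy_images`:

```python
import numpy as np

# Random stand-in batch for images.numpy()
rng = np.random.default_rng(0)
batch = rng.random((32, 3, 80, 80))

pooled_var = batch.var(axis=(0, 2, 3))         # (3,) true population variance
within = batch.var(axis=(2, 3)).mean(axis=0)   # mean of per-image variances
between = batch.mean(axis=(2, 3)).var(axis=0)  # variance of per-image means

# Exact identity when all images have the same pixel count:
# pooled_var == within + between
```

The mean is unaffected: with equal-sized images, the mean of per-image means equals the pooled mean. The std is not, because averaging per-image stds drops the `between` term.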

Now, is it “wise” to normalize images to have zero mean and a std of 1? Or do you normalize images to [-1, 1]? Lastly, is my implementation correct? I’m not sure about the `std`.

Thanks in advance.

edit:

I figured I should calculate the std this way:

```
pop_channel_std = np.std(numpy_images, axis=(0, 2, 3))  # shape (3,)
```

However, I have too many images to load at once, so I have to accumulate the mean and std over batches before computing the population values.
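One way to do that (a sketch — `loader` here is a fake generator of random batches standing in for the real DataLoader): accumulate a running per-channel sum and sum of squares over the batches, then derive the pooled mean and std at the end.

```python
import numpy as np

# Running per-channel accumulators
channel_sum = np.zeros(3)
channel_sq_sum = np.zeros(3)
n_pixels = 0

# Fake "DataLoader": four random NCHW batches (stand-in for the real loader)
rng = np.random.default_rng(0)
loader = (rng.random((8, 3, 80, 80)) for _ in range(4))

for batch in loader:
    channel_sum += batch.sum(axis=(0, 2, 3))
    channel_sq_sum += (batch ** 2).sum(axis=(0, 2, 3))
    n_pixels += batch.shape[0] * batch.shape[2] * batch.shape[3]

# Pooled stats over the whole dataset: E[x] and sqrt(E[x^2] - E[x]^2)
pop_mean = channel_sum / n_pixels
pop_std = np.sqrt(channel_sq_sum / n_pixels - pop_mean ** 2)
```

The sum-of-squares formula is numerically fine here because the values are bounded in [0, 1]; for data with a large dynamic range, Welford’s online algorithm is the more stable choice.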