Image normalization in PyTorch


(Daniel Havir) #1

Hi,

I’m working on an image classification problem in PyTorch, and I need to normalize the images so that they have a mean of 0.0 and a standard deviation of 1.0 (reference: https://cs231n.github.io/neural-networks-2/#datapre ). I implemented the mean subtraction and std division, but the network’s behaviour is strange and I’m not sure my implementation is correct.

Now, this is my transforms settings:

from torchvision import transforms

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ToTensor()
])

which means that the output tensors are in the range [0, 1], since ToTensor() scales PIL images from [0, 255] down to [0, 1] (it doesn’t work without the ToPILImage() transform in my case). They have a mean of 0.75 and a std of 0.33.

When I calculated the per-channel mean, I got [0.76487684, 0.75205952, 0.74630833] (RGB). Same for the std: [0.27936298, 0.27850413, 0.28387958]. I normalize using transforms.Normalize(mean, std), which is basically (x - mean) / std.

My code looks like this (for the sake of simplicity, assume I only have 32 images):

import numpy as np
import torch

dset = CustomDataset('images', transform=transform)  # a custom PyTorch Dataset object
dataloader = torch.utils.data.DataLoader(dset, batch_size=32, shuffle=False, num_workers=4)
images, labels = next(iter(dataloader))
# images.shape = (32, 3, 80, 80)
numpy_images = images.numpy()

per_image_mean = np.mean(numpy_images, axis=(2, 3))  # shape (32, 3)
per_image_std = np.std(numpy_images, axis=(2, 3))    # shape (32, 3)

pop_channel_mean = np.mean(per_image_mean, axis=0)  # shape (3,)
pop_channel_std = np.mean(per_image_std, axis=0)    # shape (3,)

Now, is it “wise” to normalize images to have zero mean and a std of 1? Or do you normalize images to be in [-1, 1]? Lastly, is my implementation correct? I’m not sure about the std.

Thanks in advance.

edit:
I figured I should calculate the std this way:

pop_channel_std = np.std(numpy_images, axis=(0, 2, 3))  # shape (3,)

However, I have too many images, so I have to calculate the mean and std cumulatively over batches before computing the population values.


(Rodrigo) #2

Have you considered using sklearn.preprocessing.StandardScaler? It has a partial_fit(X[, y]) method that you could call on each batch. You could then read the mean_ and var_ attributes and use them as parameters for your PyTorch transform (mean_ and sqrt(var_)).
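A minimal sketch of that approach, assuming the dataloader from the question (which yields (images, labels) batches of shape (batch_size, 3, H, W)): reshape each batch so that every pixel becomes a sample with one feature per channel, then feed it to partial_fit.

import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
for images, labels in dataloader:
    # (batch_size, 3, H, W) -> (batch_size * H * W, 3): one row per pixel,
    # one column per channel, which is the layout StandardScaler expects.
    pixels = images.numpy().transpose(0, 2, 3, 1).reshape(-1, 3)
    scaler.partial_fit(pixels)

channel_mean = scaler.mean_           # shape (3,)
channel_std = np.sqrt(scaler.var_)    # shape (3,)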


(Daniel Havir) #3

Thanks, I’ll try that, it looks promising!


(Ramesh Sampath) #4

If all you want is to normalize your inputs, you can add Normalize after ToTensor() in your Compose transforms list, like:

transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

See API Docs - http://pytorch.org/docs/0.2.0/torchvision/transforms.html#torchvision.transforms.Normalize

Example: https://discuss.pytorch.org/t/normalization-in-the-mnist-example/457/5
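For context, a minimal sketch of where that line sits in a pipeline (the mean/std values above are the commonly used ImageNet statistics, not values computed from your own data):

from torchvision import transforms

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ToTensor(),  # PIL image -> float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])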


(Daniel Havir) #5

I know that and I did it that way. That’s not the problem. I’m trying to calculate the mean and std in the first place. Thanks anyway.


(Will) #6

Have you solved the problem? I’ve run into a similar issue but don’t know how to deal with it… Could you please show how you solved it in the end?

Thank you!


(Daniel Havir) #7

Hi,

yes. You need to calculate the mean and std in advance. I did it the following way:

import numpy as np
import torch
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ToTensor()
])

dataloader = torch.utils.data.DataLoader(torch_dataset, batch_size=4096, shuffle=False, num_workers=4)  # torch_dataset is your Dataset object

pop_mean = []
pop_std0 = []
pop_std1 = []
for data in dataloader:
    # shape (batch_size, 3, height, width)
    numpy_image = data['image'].numpy()

    # per-channel statistics of this batch, shape (3,)
    batch_mean = np.mean(numpy_image, axis=(0, 2, 3))
    batch_std0 = np.std(numpy_image, axis=(0, 2, 3))
    batch_std1 = np.std(numpy_image, axis=(0, 2, 3), ddof=1)  # sample (Bessel-corrected) std

    pop_mean.append(batch_mean)
    pop_std0.append(batch_std0)
    pop_std1.append(batch_std1)

# shape (num_iterations, 3) -> (mean across 0th axis) -> shape (3,)
pop_mean = np.array(pop_mean).mean(axis=0)
pop_std0 = np.array(pop_std0).mean(axis=0)
pop_std1 = np.array(pop_std1).mean(axis=0)

Note that, in theory, the standard deviation of the whole dataset differs from what you get by calculating the std per minibatch and then averaging the minibatches’ stds (as I did; try to make the batch size as large as possible, I used 4096). The problem is that with a huge dataset like mine (>12 million images), you cannot compute the std over the whole dataset in one shot due to memory constraints. If your dataset is of reasonable size and you can load the whole thing into memory, you can calculate both the mean and the std exactly. But in practice, using the mean of the minibatches’ standard deviations shouldn’t be a problem.
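That said, the exact population std can also be computed in a single streaming pass, without loading everything into memory, by accumulating per-channel sums and sums of squares and using Var(X) = E[X^2] - E[X]^2. A sketch, assuming the same dataloader as above:

import numpy as np

channel_sum = np.zeros(3)
channel_sum_sq = np.zeros(3)
pixel_count = 0

for data in dataloader:
    x = data['image'].numpy()  # shape (batch_size, 3, height, width)
    channel_sum += x.sum(axis=(0, 2, 3))
    channel_sum_sq += (x ** 2).sum(axis=(0, 2, 3))
    pixel_count += x.shape[0] * x.shape[2] * x.shape[3]

exact_mean = channel_sum / pixel_count
exact_std = np.sqrt(channel_sum_sq / pixel_count - exact_mean ** 2)  # population std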

Also note that this is calculated on the CPU, not the GPU, so if you run in the cloud, you can do it on a cheap instance and don’t have to pay for a GPU instance.

Once you have the mean and std, just add the following line to the transforms.Compose list:

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ToTensor(),
    transforms.Normalize(mean=your_calculated_mean, std=your_calculated_std)
])

Hope that helps.


(Will) #8

Thanks for your help, I will try it again.


(Ramesh Sampath) #9

@danielhavir, @Will1994 - Your code may need to be adjusted as follows:

pop_means = [x / 255 for x in pop_means]
pop_stds = [x / 255 for x in pop_stds]

transforms.ToTensor() converts values from the 0-255 range into the 0-1 range, so mean and std values that were computed on raw 0-255 pixels need to be rescaled to the same 0-1 range before being passed to Normalize.

For example, assuming per-channel statistics that were computed on raw 0-255 pixel values (the numbers below are made-up placeholders):
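from torchvision import transforms

# Hypothetical per-channel statistics computed on raw 0-255 pixel values.
pop_means = [121.2, 117.6, 112.4]
pop_stds = [70.1, 69.0, 70.4]

# Rescale to the [0, 1] range produced by transforms.ToTensor().
pop_means = [x / 255 for x in pop_means]
pop_stds = [x / 255 for x in pop_stds]

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=pop_means, std=pop_stds),
])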