Image normalization in PyTorch


I’m working on an image classification problem in PyTorch and I need to normalize the images so that they have a mean of 0.0 and a standard deviation of 1.0. I implemented the mean subtraction and std division, but the network’s behaviour is strange and I’m not sure my implementation is correct.

Now, these are my transform settings:

transform = transforms.Compose([

which means that the output images are in the range <0,1> because they are PIL images (it doesn’t work without the ToPILImage() transform). They have a mean of 0.75 and std of 0.33.

When I calculated the per-color-channel mean, I got [ 0.76487684, 0.75205952, 0.74630833] (RGB). The same for std gave [ 0.27936298, 0.27850413, 0.28387958]. I normalize using transforms.Normalize(mean, std) (basically (x - mean) / std).

My code looks like this (for the sake of simplicity, assume I only have 32 images):

dset = CustomDataset('images', transform=transform) #PyTorch Dataset object
dataloader = torch.utils.data.DataLoader(dset, batch_size=32, shuffle=False, num_workers=4)
images, labels = next(iter(dataloader))
# images.shape = ( 32, 3, 80, 80)
numpy_images = images.numpy()

per_image_mean = np.mean(numpy_images, axis=(2,3)) #Shape (32,3)
per_image_std = np.std(numpy_images, axis=(2,3)) #Shape (32,3)

pop_channel_mean = np.mean(per_image_mean, axis=0) # Shape (3,)
pop_channel_std = np.mean(per_image_std, axis=0) # Shape (3,)

Now, is it “wise” to normalize images to have zero mean and a std of 1? Or do you normalize images to be in <-1, 1>? Lastly, is my implementation correct? I’m not sure about the std.

Thanks in advance.

I figured I should have calculated the std this way:

pop_channel_std = np.std(numpy_images, axis=(0, 2, 3)) #Shape (3,)

However, I have too many images, so I have to calculate the mean and std cumulatively over batches before calculating the population values.
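A minimal sketch of that cumulative calculation, assuming the batches arrive as NumPy arrays shaped (N, C, H, W). The helper name streaming_channel_stats and the random data are illustrative and not from the thread. It accumulates per-channel sums and sums of squares, so the result is the whole-dataset population std rather than an average of batch stds:

```python
import numpy as np

def streaming_channel_stats(batches):
    """Exact per-channel mean and population std over batches shaped (N, C, H, W)."""
    count = 0          # number of pixels seen per channel
    total = None       # running per-channel sum
    total_sq = None    # running per-channel sum of squares
    for batch in batches:
        batch = np.asarray(batch, dtype=np.float64)
        if total is None:
            total = np.zeros(batch.shape[1])
            total_sq = np.zeros(batch.shape[1])
        count += batch.shape[0] * batch.shape[2] * batch.shape[3]
        total += batch.sum(axis=(0, 2, 3))
        total_sq += (batch ** 2).sum(axis=(0, 2, 3))
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clip tiny negative values caused by rounding
    var = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(var)

# Hypothetical data: 32 random "images" split into uneven batches of up to 10
rng = np.random.default_rng(0)
images = rng.random((32, 3, 8, 8))
batches = [images[i:i + 10] for i in range(0, 32, 10)]
mean, std = streaming_channel_stats(batches)
```

Only three small arrays are kept between batches, so memory use does not grow with the dataset. The sum-of-squares formula can lose precision for values with a large mean relative to their spread; for float64 pixel values in [0, 1] that is unlikely to matter.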


Have you considered using sklearn.preprocessing.StandardScaler? It has a partial_fit(X[, y]) method that you could call on each batch. You could then read the mean_ and var_ attributes and use them as parameters for your PyTorch transform (mean_, sqrt(var_)).


Thanks, I’ll try that, it looks promising!

If all you want is to normalize your inputs, you might want to add Normalize after you convert the image to a Tensor in your Compose transforms list, like this:

transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

See API Docs -



I know that and I did it that way. That’s not the problem. I’m trying to calculate the mean and std in the first place. Thanks anyway.


Have you solved the problem? I ran into a similar question, but I don’t know how to deal with it. Could you please show how you finally solved it?

Thank you!


Yes. You need to calculate the mean and std in advance. I did it the following way:

transform = transforms.Compose([

dataloader = torch.utils.data.DataLoader(*torch_dataset*, batch_size=4096, shuffle=False, num_workers=4)

pop_mean = []
pop_std0 = []
pop_std1 = []
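for i, data in enumerate(dataloader, 0):
    # shape (batch_size, 3, height, width)
    numpy_image = data['image'].numpy()
    # shape (3,)
    batch_mean = np.mean(numpy_image, axis=(0,2,3))
    batch_std0 = np.std(numpy_image, axis=(0,2,3))
    batch_std1 = np.std(numpy_image, axis=(0,2,3), ddof=1)
    # collect the per-batch statistics so they can be averaged below
    pop_mean.append(batch_mean)
    pop_std0.append(batch_std0)
    pop_std1.append(batch_std1)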

# shape (num_iterations, 3) -> (mean across 0th axis) -> shape (3,)
pop_mean = np.array(pop_mean).mean(axis=0)
pop_std0 = np.array(pop_std0).mean(axis=0)
pop_std1 = np.array(pop_std1).mean(axis=0)

Note that in theory, the standard deviation of the whole dataset is different from what you get if you calculate the std per minibatch and then take the mean of the minibatches’ stds (as I did; try to make the batch size as large as possible, I used 4096). The problem is that with a huge dataset like mine (>12 million images), you cannot load everything into memory to calculate the standard deviation across the whole dataset in one call. If your dataset is of reasonable size and you can load the whole thing into memory, then you can calculate both the mean and the std of the whole thing directly. But in practice, it shouldn’t be a problem if you use the mean of the standard deviations of all the minibatches.

Also note that it’s calculated on the CPU and not the GPU, so if you run in the cloud, you can do it on some cheap instance and you don’t have to use a GPU instance.
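To illustrate the difference, here is a small sketch on made-up data (the arrays and batch layout are hypothetical, not from my dataset). The batches are deliberately unshuffled so that each one has a different brightness, which is close to a worst case for averaging batch stds:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 4 batches of 8 single-channel 4x4 "images",
# where each batch is brighter than the previous one
data = np.concatenate(
    [rng.random((8, 1, 4, 4)) * 0.2 + offset for offset in (0.0, 0.2, 0.4, 0.6)]
)
batches = np.split(data, 4)

# Population std over the whole dataset, per channel
global_std = data.std(axis=(0, 2, 3))

# Mean of the per-batch stds, as in the loop above
mean_of_batch_stds = np.mean([b.std(axis=(0, 2, 3)) for b in batches], axis=0)

print(global_std, mean_of_batch_stds)
```

With equal-sized batches, the averaged value cannot exceed the global std, because the spread between the batch means is ignored. With shuffled data and large batches the batch means are nearly identical, so the gap should become small.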

Once you have the mean and std, just add the following line to the transforms.Compose list:

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=*your_calculated_mean*, std=*your_calculated_std*)
])
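As a sanity check, the per-channel (x - mean) / std operation can be sketched in plain NumPy. This is only an illustration with random data standing in for a batch of images, not torchvision itself; after normalizing with statistics computed on the same data, each channel should have a mean of about 0 and a std of about 1:

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((16, 3, 8, 8))  # hypothetical batch in [0, 1), layout (N, C, H, W)

mean = images.mean(axis=(0, 2, 3))  # shape (3,)
std = images.std(axis=(0, 2, 3))    # shape (3,)

# Reshape to (1, C, 1, 1) so the statistics broadcast over batch, height and width
normalized = (images - mean.reshape(1, -1, 1, 1)) / std.reshape(1, -1, 1, 1)

print(normalized.mean(axis=(0, 2, 3)), normalized.std(axis=(0, 2, 3)))
```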

Hope that helps.


Thanks for your help, I will try it again.

@danielhavir, @Will1994 - Your code may need to be adjusted as -

pop_means =  [x/255  for x in pop_means]
pop_stds = [x/255 for x in pop_stds]

transforms.ToTensor() converts the values from 0-255 into the range 0-1. So if the mean and std were calculated on the raw 0-255 values, they also need to be adjusted for that before being passed to Normalize.
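A short sketch of why dividing by 255 is enough, assuming 8-bit images and using NumPy arrays as a stand-in for what ToTensor does to the pixel values (the data is random and purely illustrative). Both the mean and the std scale linearly with the pixel values:

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(16, 3, 8, 8)).astype(np.float64)  # values in 0-255
scaled = raw / 255.0                                               # values in 0-1

raw_mean, raw_std = raw.mean(axis=(0, 2, 3)), raw.std(axis=(0, 2, 3))
scaled_mean, scaled_std = scaled.mean(axis=(0, 2, 3)), scaled.std(axis=(0, 2, 3))

print(raw_mean / 255, scaled_mean)
print(raw_std / 255, scaled_std)
```

If the statistics were already computed on the output of ToTensor, as in the loop earlier in the thread, they are already in the 0-1 range and no extra division should be needed.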

For example -

You can calculate the mean/std over all the data even when using batches; you can find the description here:

I’ve implemented two methods of calculating the mean/std for comparison (using a single batch containing all the data, and using batches of size 100). The results are almost equal (they differ only after the 4th or 5th decimal place). I then normalize MNIST and CIFAR-10 using transforms.Normalize and check the mean and std again after normalization.
You can see the code here:


Hey guys, thanks for sharing!
I know it’s been a while, but I just ran into the same problem. After googling a bit and reading this post, I got confused about the axes that need to be specified when calculating the mean/std of a batch of RGB images.

Could you please elaborate a little bit why you used axis=(0,2,3)?

Some people used axis=(0,1,2), as in this post:

I understand we are doing normalization across width and height, but just don’t know how.

Is it because your input is batch_size*3*img_size*img_size, and you are normalizing over batch_size, img_size, img_size?

And what do these two stds mean? Sorry if it’s naive, but I am new to this.
batch_std0 = np.std(numpy_image, axis=(0,2,3))
batch_std1 = np.std(numpy_image, axis=(0,2,3), ddof=1)

That specifies which dimensions to operate over;
just print your numpy_image.shape and you will see why.
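On the two stds asked about above: they differ only in the ddof argument of np.std. A minimal sketch on a toy array (the values are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
n = x.size

std0 = np.std(x)          # ddof=0: divide the summed squared deviations by n
std1 = np.std(x, ddof=1)  # ddof=1: divide by n - 1 (sample estimate)

# The two are related by a constant factor that approaches 1 as n grows
ratio = std1 / std0
print(std0, std1, ratio, np.sqrt(n / (n - 1)))
```

For a batch of thousands of images, n is the number of pixels per channel in the batch, so the two values should be nearly indistinguishable.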

Hello guys, I have a beginner question: Should we apply the mean and std computed from the training set to the validation and test sets or should we compute the mean and std for each of these sets separately and have corresponding transform objects for them?

Thanks in advance.


Tensors have the format (Batch, Channel, Height, Width), while PIL images converted to NumPy arrays have (Batch, Height, Width, Channel).
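A small sketch of how the two axis choices line up, using random NumPy data in place of real images (the array names are illustrative). In both layouts the reduction runs over every axis except the channel axis, so the result has one value per channel:

```python
import numpy as np

rng = np.random.default_rng(0)
nchw = rng.random((4, 3, 8, 8))      # (batch, channel, height, width)
nhwc = nchw.transpose(0, 2, 3, 1)    # same data as (batch, height, width, channel)

mean_nchw = nchw.mean(axis=(0, 2, 3))  # reduce over batch, height, width
mean_nhwc = nhwc.mean(axis=(0, 1, 2))  # reduce over batch, height, width

print(mean_nchw.shape, mean_nhwc.shape)
```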

Cool… I was wondering why my mean and std were giving different results each time I did the calculation.
It is in fact caused by calculating the two values from different samples in different batches.

That should make sense now. Thanks!