Example of data.normalize for fastAI v1

Hi! I want to normalize my dataset before training using data.normalize (https://docs.fast.ai/vision.data.html#normalize-1), but I couldn’t find any examples where the parameters to the function are computed manually (instead of using imagenet_stats and so on). My dataset is not in RGB (it’s 4-channel), so I guess using those stats would give weird results.

Do you know of a code example where that is computed manually?

Thanks!


You can call normalize with no args to have it use a batch of your data to calculate stats. Those stats are stored in Learner.stats.

You’ll need 1.0.14 for this to work, which currently requires using git master branch. (Will be released before Monday’s class).
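For reference, a minimal sketch of what that looks like, assuming path points at a standard image folder (the names here are illustrative, not a tested recipe):

from fastai.vision import *

# hypothetical 4-channel dataset living under `path`
data = ImageDataBunch.from_folder(path)
data.normalize()   # no args: stats are computed from one batch of your data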


Thanks Jeremy, I built fast.ai from source and it worked.


Is it possible to use ImageNet transfer learning on a 3-channel RGB dataset that has different normalization stats? If yes, what stats should we use for training and testing?

And what if the dataset is not in RGB (4-channels or 2 channels)?


Haider,

You can use .normalize(imagenet_stats), but I think as long as you match your training and testing normalization, that will be okay. I haven’t tested that and am still learning the theory.

Jeremy was kind enough to reply to my question in class (it got 16 votes). I am copying the answer from @hiromi ’s lesson 3 notes:

Question: Some satellite images have 4 channels. How can we deal with data that has 4 channels or 2 channels when using pre-trained models? [1:59:09]

I think that’s something that we’re going to try and incorporate into fastai, so hopefully, by the time you watch this video, there’ll be easier ways to do this. But the basic idea is that a pre-trained ImageNet model expects red, green, and blue pixels. So if you’ve only got two channels, there are a few things you can do, but basically you’ll want to create a third channel. You can create the third channel as either being all zeros, or it could be the average of the other two channels. You can just use normal PyTorch arithmetic to create that third channel. You could either do that ahead of time in a little loop and save your three-channel versions, or you could create a custom dataset class that does that on demand.
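A minimal sketch of the “average of the other two channels” idea in plain PyTorch (the random tensor here is just a stand-in for a real 2-channel image):

import torch

img = torch.rand(2, 224, 224)             # hypothetical 2-channel image, C x H x W
third = img.mean(dim=0, keepdim=True)     # average of the two existing channels
# (or torch.zeros_like(third) for an all-zeros channel instead)
img3 = torch.cat([img, third], dim=0)     # now 3 x H x W, as the model expects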

For 4 channel, you probably don’t want to get rid of the 4th channel. So instead, what you’d have to do is to actually modify the model itself. So to know how to do that, we’ll only know how to do in a couple more lessons time. But basically the idea is that the initial weight matrix (weight matrix is really the wrong term, they’re not weight matrices; their weight tensors so they can have more than just two dimensions), so that initial weight tensor in the neural net, one of its axes is going to have three slices in it. So you would just have to change that to add an extra slice, which I would generally just initialize to zero or to some random numbers. So that’s the short version. But really to understand exactly what I meant by that, we’re going to need a couple more lessons to get there.
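A minimal sketch of that zero-initialized extra slice, assuming a torchvision ResNet (the copy-a-channel variant appears in full in the next reply):

import torch
import torch.nn as nn
import torchvision

net = torchvision.models.resnet34(pretrained=True)
w = net.conv1.weight                                # shape (64, 3, 7, 7)
extra = torch.zeros(w.shape[0], 1, *w.shape[2:])    # zero-init 4th input slice
conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
conv1.weight = nn.Parameter(torch.cat([w, extra], dim=1))  # now (64, 4, 7, 7)
net.conv1 = conv1                                   # model now accepts 4-channel input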


Alternatively, you can copy the weights from one of the existing channels to the new 4th channel. I have not exhaustively tested which channel would or should be better; it will depend on your images. So far I have not seen a significant difference between copying weights from any of the three existing channels and setting channel 4 to zeros or random values, but copying usually gives a very small improvement, all else being equal.

Here is a snippet of how to do it in PyTorch, using ResNet as an example:

import torch
import torch.nn as nn
import torchvision

class Resnet4D(nn.Module):
    def __init__(self, num_classes, pretrained=True):
        super().__init__()

        net = torchvision.models.resnet34(pretrained=pretrained)
        w = net.conv1.weight
        # replace the first conv so it accepts 4 input channels instead of 3
        self.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        # copy the pretrained RGB weights, duplicating the first (red)
        # channel's weights into the new 4th channel
        self.conv1.weight = nn.Parameter(torch.cat((w, w[:, :1, :, :]), dim=1))

        self.bn1 = net.bn1
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = net.layer1
        self.layer2 = net.layer2
        self.layer3 = net.layer3
        self.layer4 = net.layer4

        self.avgpool = net.avgpool
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.maxpool(self.relu(self.bn1(self.conv1(x))))
        x = self.layer4(self.layer3(self.layer2(self.layer1(x))))
        x = torch.flatten(self.avgpool(x), 1)
        return self.fc(x)
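Usage would then be something like model = Resnet4D(num_classes=2) for a hypothetical 2-class task; everything except conv1 and the new fc head is reused from the pretrained network as-is.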

Hey Jeremy, as you said, the normalize function (when no args are passed) computes the mean and std on a batch of data. Since those stats come from a small batch, they definitely won’t make the standard deviation exactly 1 and the mean exactly 0 for the whole training set. Is that okay, given that it will still scale down the training data? And if it is okay, then rather than using stats from a batch of our own data, why not just always use imagenet_stats for 3-channel inputs?
Also, looking at imagenet_stats or the stats from Learn.stats, it returns two tensors with 3 elements each. There is nothing in the docs that says what those values mean. Can you please tell me which is the mean and which is the std? Is the first tensor the mean for the 3 channels and the second tensor the std for the 3 channels?

Yup, that’s the correct order (and if it’s lacking in the docs, it would be great to contribute and add it :wink: )

As for the stats computed on one batch, it’s not as accurate as the stats computed on the whole dataset, but we found it was enough to get the same training results (as long as you have a standard batch size of 64).
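For anyone who does want to compute the per-channel stats over the whole dataset manually (the original question), here is a minimal sketch, assuming dl is a DataLoader yielding (B, C, H, W) float batches; the function name is made up:

import torch

def channel_stats(dl):
    # accumulate per-channel sums and squared sums over every batch
    n, s, s2 = 0, 0., 0.
    for x, _ in dl:
        x = x.transpose(0, 1).reshape(x.shape[1], -1)  # C x (B*H*W)
        n += x.shape[1]
        s += x.sum(dim=1)
        s2 += (x ** 2).sum(dim=1)
    mean = s / n
    std = (s2 / n - mean ** 2).sqrt()  # Var[X] = E[X^2] - E[X]^2
    return mean, std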


Thanks for the reply. I’ll do it.


When the image has more than 3 channels, we can just save the image as an RGB image, losing the 4th channel. The code is below, with an example of an image from Lesson 1 that has 4 channels:
import PIL
import numpy as np

img = PIL.Image.open(f'{PATH}/images/Egyptian_Mau_186.jpg')
imgarr = np.array(img)
if len(imgarr.shape) == 2:
    img_channels = 1
else:
    img_channels = imgarr.shape[2]
if img_channels > 3:
    img = img.convert("RGB")
    img.save(f'{PATH}/images/Egyptian_Mau_186.jpg')

When the image has only 2 channels, the same command works:
img.convert('RGB')
Just ensure that it is a PIL image opened with PIL.Image.open, or convert it to a PIL image first.

What is “imagenet_stats”? I found ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]). Where is it derived from, and what does it mean?

It’s the mean and standard deviation for each channel (R, G, and B) over the images of the training set of ImageNet. Pretrained models received their inputs normalized this way, so when you’re using one, you should normalize the same way too.
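Concretely, normalizing just applies (x - mean) / std per channel; a minimal sketch with the values quoted above (the input tensor is a placeholder for a real image):

import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
x = torch.rand(3, 224, 224)      # hypothetical image tensor scaled to [0, 1]
x_norm = (x - mean) / std        # the same normalization pretrained models saw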


Hi! ImageNet is the collection of 14 million images used to pre-train these core architectures. If you think your production photos will resemble the ImageNet population, it may be better to use imagenet_stats than the stats from your (presumably smaller and noisier) training set. The variable imagenet_stats holds (I believe) the per-channel mean and standard deviation for R, G, and B, and .normalize(imagenet_stats) normalizes with them. Hm… possibly also available directly as imagenet_norm.


I use .normalize(imagenet_stats), but the output for the train data only shows one category while the valid data shows two different categories, even though I have 2 different classes. How do I handle this?