ImageDataBunch.normalize does not work

hxyike · April 12, 2019, 7:31am

I created image_data with ImageDataBunch.from_folder(…).normalize(), but when I looked at image_data.train_ds[0][0].data, normalize() appeared to have had no effect. Specifically:
image_data_0 = ImageDataBunch.from_folder(…)
image_data_1 = ImageDataBunch.from_folder(…).normalize(imagenet_stats)
image_data_2 = ImageDataBunch.from_folder(…).normalize(([0.5,0.5,0.5],[0.5,0.5,0.5]))

and they all produced the same data; that is, the train_ds data of image_data_0, image_data_1, and image_data_2 is identical. Why? Does normalize() not work? My fastai version is 1.0.51.

sgugger · April 12, 2019, 1:24pm

The normalization is done at the batch level (it’s quicker to do it on all the images at once), so it’s expected that your dataset stays the same.
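To illustrate the idea, here is a minimal sketch in plain Python. It is not fastai's actual implementation: the `batches` loader, the `normalize` helper, and the toy pixel values are all hypothetical stand-ins. It only shows why a transform applied per batch leaves the underlying dataset untouched.

```python
# Toy stand-in for a dataset: each "image" is a flat list of pixel values in [0, 1].
dataset = [[0.0, 0.5, 1.0], [0.25, 0.5, 0.75]]

def normalize(batch, mean, std):
    """Return a normalized copy of the batch; the input is not modified."""
    return [[(p - mean) / std for p in img] for img in batch]

def batches(ds, bs, tfm=None):
    """Hypothetical loader: yield batches of size bs, applying tfm on the fly."""
    for i in range(0, len(ds), bs):
        batch = ds[i:i + bs]
        yield tfm(batch) if tfm else batch

# Normalization is attached to the loader, not to the dataset.
loader = batches(dataset, bs=2, tfm=lambda b: normalize(b, 0.5, 0.5))
first_batch = next(loader)

print(dataset[0])      # raw values, unchanged
print(first_batch[0])  # normalized values, only visible in the batch
```

Inspecting the dataset directly therefore always shows the raw values, whichever statistics were passed to the normalization step.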

hxyike · April 13, 2019, 2:58am

Thanks so much.

louis · July 9, 2020, 5:15pm

Hi, apologies for digging this up.
I am having a similar problem, even though I use .one_batch() to pull out the tensor.

Basically, my databunch looks like this:

data = (ObjectItemList.from_folder(PATH, include=['train'])
        .split_by_valid_func(get_valid)
        .label_from_func(labelling_func)
        .transform(get_transforms(do_flip=False, max_rotate=None), size=224)
        .databunch(bs=64, collate_fn=bb_pad_collate)
        .normalize())

I pull a batch and check the per-channel mean over the samples like this:

x,y = data.one_batch()
x.mean((0,2,3))

This does not yield a value of (0, 0, 0). Besides, if I remove .normalize() from the databunch construction, I get the same result.
What am I missing?

---------------------------[SOLVED] -------------------------

So several things:

.one_batch() has the keyword argument denorm, which is set to True by default. Setting it to False finally yields the expected result. The default seems a bit misleading (at least in my case), but I guess it serves some purpose.
Initially I thought that the learner used .one_batch() from the databunch. It actually iterates over the databunch's train_dl directly, which also yields the expected (normalized) result. You can reproduce that like this:

x,y = next(iter(data.train_dl)); x.mean((0,2,3))
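As a rough illustration of why the denorm default hides the normalization, here is a small plain-Python sketch. The `normalize` and `denormalize` helpers and the sample values are hypothetical and only mimic the arithmetic; they are not fastai's code. Normalizing and then denormalizing simply gives back the raw values, so the mean looks unchanged.

```python
from statistics import mean

def normalize(xs, m, s):
    """Shift and scale values so that data with mean m and std s is centered on 0."""
    return [(x - m) / s for x in xs]

def denormalize(xs, m, s):
    """Invert normalize(): scale back by s and shift back by m."""
    return [x * s + m for x in xs]

raw = [0.2, 0.4, 0.6, 0.8]   # toy "pixel" values
m, s = 0.5, 0.25             # assumed statistics for this toy example

normed = normalize(raw, m, s)          # what the model would see
restored = denormalize(normed, m, s)   # what a denormalized batch looks like

print(mean(normed))    # close to 0
print(mean(restored))  # close to the raw mean of 0.5
```

In fastai terms, this corresponds to the difference described above between data.one_batch() with its default denorm=True and data.one_batch(denorm=False).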