I have a question about the fast.ai v3 lesson 1 notebook. I wanted to see what normalize() does to images, so I displayed the images with and without the normalize() call. I was expecting to see the same pictures twice, with some image processing applied to the first batch. Instead I got two different sets of images. Is normalize() supposed to change the order of the data, or is there something else I’m doing wrong?
I have added code and results below:
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs, num_workers=0).normalize(imagenet_stats)
data2 = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs, num_workers=0);
Did you set the seed, e.g. np.random.seed(2)?
ImageDataBunch creates a training set and a validation set. The validation fraction (valid_pct) is 0.2 by default.
The training and validation sets are chosen randomly.
So if you did not set the seed, it is expected that data and data2 produced different training batches.
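To illustrate why the seed matters, here is a minimal NumPy sketch of a random 80/20 split. It only mimics the idea behind a random validation split; the function `random_split` and the file names are hypothetical, and fastai's own splitting code may differ in its details. With the same seed, two calls pick the same validation items; without a seed they generally do not.

```python
import numpy as np

def random_split(items, valid_pct=0.2, seed=None):
    """Hypothetical helper: randomly split items into (train, valid) lists."""
    if seed is not None:
        np.random.seed(seed)  # reset NumPy's global RNG so the permutation is repeatable
    idx = np.random.permutation(len(items))
    cut = int(valid_pct * len(items))
    valid = [items[i] for i in idx[:cut]]
    train = [items[i] for i in idx[cut:]]
    return train, valid

fnames = [f"img_{i}.jpg" for i in range(10)]  # hypothetical file names

train_a, valid_a = random_split(fnames, seed=2)
train_b, valid_b = random_split(fnames, seed=2)
print(valid_a == valid_b)  # True: same seed, same split
```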
I am not sure what you mean by “sharing the same counter”.
But I think you need to do:
I ran the lesson 1 script. I do not know whether the fastai library imports numpy implicitly, but np.random.seed() did not work properly for me without importing the library explicitly.
Now the two DataBunches, with and without normalization, should show similar output from show_batch(). They might look a little different because of normalization, which subtracts a mean from each value and divides the result by a standard deviation, channel by channel. With imagenet_stats, those are the ImageNet per-channel statistics rather than statistics computed from your own data.
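As a rough illustration of what per-channel normalization does to a single image, here is a NumPy sketch of the operation and its inverse. The numbers are the ImageNet channel means and standard deviations that imagenet_stats holds; the functions `normalize` and `denormalize` below are hypothetical stand-ins, not fastai's own code. Because the operation is invertible, denormalizing gives back the original pixel values.

```python
import numpy as np

# ImageNet per-channel (R, G, B) mean and standard deviation
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalize(img, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize a (3, H, W) float image: subtract each channel's mean, divide by its std."""
    return (img - mean[:, None, None]) / std[:, None, None]

def denormalize(img, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Invert normalize(): multiply by the std and add the mean back, channel by channel."""
    return img * std[:, None, None] + mean[:, None, None]

rng = np.random.default_rng(0)
img = rng.random((3, 4, 4))  # hypothetical 3-channel 4x4 image with values in [0, 1)
norm = normalize(img)
restored = denormalize(norm)
print(np.allclose(restored, img))  # True
```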
What show_batch() does is basically invoke one_batch(), which is defined in the same file. That appears to take the next batch from an iterator, so it is not the same batch each time. I assume the entire dataset is in random order, but I cannot be sure.
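The "next batch from an iterator" behaviour can be sketched in plain Python. This assumes the training loader reshuffles every time a new iterator is created; `ShuffledLoader` and `one_batch` here are hypothetical stand-ins, not fastai's classes. An unseeded loader usually returns a different first batch on each call, while two loaders seeded identically return the same one.

```python
import random

class ShuffledLoader:
    """Hypothetical loader that reshuffles its items every time iter() is called."""

    def __init__(self, items, batch_size, seed=None):
        self.items = list(items)
        self.batch_size = batch_size
        self.rng = random.Random(seed)  # seed=None draws from system entropy

    def __iter__(self):
        order = self.items[:]
        self.rng.shuffle(order)  # new random order for each fresh iterator
        for i in range(0, len(order), self.batch_size):
            yield order[i:i + self.batch_size]

def one_batch(loader):
    """Hypothetical stand-in for one_batch(): first batch of a fresh iterator."""
    return next(iter(loader))

loader = ShuffledLoader(range(20), batch_size=4)
print(one_batch(loader), one_batch(loader))  # usually two different batches

a = ShuffledLoader(range(20), batch_size=4, seed=2)
b = ShuffledLoader(range(20), batch_size=4, seed=2)
print(one_batch(a) == one_batch(b))  # True: same seed, same first batch
```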
I’m not 100% sure, but from my experience data and data2 are the same thing. For example, if you build two DataBunches from the same LabelList and name them data1 and data2, data2 (the most recently executed line) will share its parameters with data1, and data1 becomes an exact copy of data2.
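The sharing effect described above is what happens in Python whenever two objects hold a reference to the same mutable object instead of a copy. The sketch below is only a hypothetical analogy: `Bunch` and the `settings` dict are invented for illustration and are not how fastai stores its state. It also shows that giving one object its own copy (here with copy.deepcopy) breaks the link.

```python
import copy

class Bunch:
    """Hypothetical wrapper that keeps a reference to a settings dict, not a copy."""

    def __init__(self, settings):
        self.settings = settings  # no copy: every Bunch built from this dict shares it

shared = {"size": 224}
data1 = Bunch(shared)
data2 = Bunch(shared)
data2.settings["size"] = 200   # a change made through data2 ...
print(data1.settings["size"])  # 200: ... is visible through data1 too

data3 = Bunch(copy.deepcopy(shared))  # independent copy of the settings
data3.settings["size"] = 64
print(data1.settings["size"])  # still 200
```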
Here is an example:
Create a DataBunch with images of size 200 and name it data200