How is the VGG16 mean calculated?


(satish) #1

Yesterday I watched the first video in Part 1 and I'm working on Dogs vs. Cats. I'd like to understand why and how this mean is calculated for the images. The link below shows the code block where it is done.


(Pietz) #2
  • How: Sum the intensities over the training set for each color channel separately and divide by the total number of pixels. You can do that manually in Python with x_train.mean(axis=(0,1,2)), assuming x_train is a channels-last array of shape (N, height, width, 3).
  • Why: Once we know the mean, we can subtract it from every pixel value so the intensities are centered at 0. This helps to increase training speed and accuracy.
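A minimal NumPy sketch of both bullets, assuming a channels-last array of shape (N, H, W, 3); the random `x_train` below is made-up stand-in data, not a real dataset:

```python
import numpy as np

# Hypothetical stand-in for a training set: 8 RGB images of 4x4 pixels,
# stored channels-last as (N, H, W, C) with intensities in [0, 255].
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(8, 4, 4, 3)).astype(np.float32)

# "How": average over images, rows and columns, keeping the channel axis.
channel_mean = x_train.mean(axis=(0, 1, 2))      # shape (3,)

# The same thing by hand: per-channel sum divided by the number of pixels.
n_pixels = x_train.shape[0] * x_train.shape[1] * x_train.shape[2]
manual_mean = x_train.sum(axis=(0, 1, 2)) / n_pixels

# "Why": subtracting the mean (broadcast over the last axis) centers each channel at 0.
x_centered = x_train - channel_mean

print(channel_mean.shape)                        # (3,)
print(np.allclose(channel_mean, manual_mean))    # True
```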

(satish) #3

Thanks @pietz. I believe the mean mentioned here is specific to the Dogs vs. Cats dataset. So do we need to change it when working on the State Farm problem?

Second, in the same code block the image is converted from RGB to BGR. Is there any particular reason for that?


(Pietz) #4

Mean-centering each channel separately is a little more precise than just taking one mean across all the channels. That said, I've never come across a problem where centering each channel has led to better results. But when you do transfer learning, you want to match exactly the normalization method the architecture was trained on, i.e. reuse its original mean rather than recomputing one for your own dataset. No questions asked.
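To see the difference between the two options, here is a small NumPy sketch on synthetic data (the per-channel brightness levels are invented for illustration). Per-channel centering brings every channel to zero, while a single global mean only centers the data as a whole and leaves each channel offset:

```python
import numpy as np

# Hypothetical data: channels deliberately given different brightness levels
# (R brighter than B) so the two centering methods differ visibly.
rng = np.random.default_rng(1)
x = rng.normal(loc=[120.0, 115.0, 100.0], scale=20.0, size=(16, 8, 8, 3))

per_channel_mean = x.mean(axis=(0, 1, 2))   # one value per channel, shape (3,)
global_mean = x.mean()                      # a single scalar over everything

x_pc = x - per_channel_mean
x_gl = x - global_mean

# Per-channel centering zeroes every channel; a global mean leaves per-channel offsets.
print(np.round(x_pc.mean(axis=(0, 1, 2)), 6))
print(np.round(x_gl.mean(axis=(0, 1, 2)), 1))
```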

Concerning BGR, can you check whether the code uses any OpenCV functions after the conversion? If I'm not mistaken, OpenCV natively works with BGR instead of RGB. Other than that, I don't know.


(satish) #5

Thanks @pietz

I checked the code and it is not using OpenCV. I also read a blog post which explains the same. Maybe there is an explanation for that in the next videos.
Thanks


(Pietz) #6

The people who trained the original network used Caffe with OpenCV, so we have to convert to BGR as well 🙂

“Apparently it’s because Caffe uses OpenCV which uses BGR. We could swap it in the weights, but I think that could confuse people who looked at the Caffe Model Zoo page.”
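Putting the two steps together, here is a sketch of VGG-style preprocessing for Theano-style channels-first batches of shape (N, 3, H, W), using the RGB mean quoted for vgg16.py later in this thread. `vgg_preprocess` is a hypothetical name here, and this is an illustration rather than the course file itself:

```python
import numpy as np

# RGB mean quoted for vgg16.py in this thread, reshaped to broadcast over
# channels-first batches of shape (N, 3, H, W).
VGG_MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3, 1, 1))

def vgg_preprocess(x):
    """Hypothetical helper: center RGB input, then reorder channels to BGR."""
    x = x - VGG_MEAN_RGB        # per-channel mean subtraction (still in RGB order)
    return x[:, ::-1]           # reverse the channel axis: RGB -> BGR

# One 2x2 "image" whose R, G and B planes are constant 200, 100 and 50.
rgb = np.stack([np.full((2, 2), 200.0), np.full((2, 2), 100.0), np.full((2, 2), 50.0)])[None]
bgr = vgg_preprocess(rgb.astype(np.float32))

print(bgr.shape)             # (1, 3, 2, 2)
print(bgr[0, :, 0, 0])       # B, G, R values after centering
```

Note that the mean is subtracted in RGB order before the flip, so the constants line up with the channels they were computed for.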


(satish) #7

So if I use the pre-built Keras model from 2.6, I believe the pre-processing is not required. Let me know your thoughts.
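For what it’s worth, as far as I know the pre-trained VGG16 in `keras.applications` still expects its inputs to go through `keras.applications.vgg16.preprocess_input`, which in its default "caffe" mode flips RGB to BGR and subtracts the ImageNet per-channel mean without any scaling. The NumPy sketch below mimics that step for channels-last input so it runs without Keras; `caffe_style_preprocess` is a hypothetical helper, not a Keras function:

```python
import numpy as np

# ImageNet channel means in BGR order (the same three numbers as the RGB
# mean in vgg16.py, just reversed).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(x_rgb):
    """Hypothetical helper for channels-last input (..., H, W, 3) in [0, 255]."""
    x_bgr = x_rgb[..., ::-1].astype(np.float32)  # RGB -> BGR along the channel axis
    return x_bgr - IMAGENET_MEAN_BGR             # zero-center each channel, no scaling

batch = np.zeros((2, 224, 224, 3), dtype=np.float32)
batch[..., 0] = 255.0                            # a pure red batch
out = caffe_style_preprocess(batch)
print(out.shape)                                 # (2, 224, 224, 3)
```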


(Eugene Ware) #8

I tried to calculate this myself too. Here is how I did it. I used the Dogs vs. Cats Redux data, which is obviously not the original VGG16 training set, but I’ve read that when building your own model from scratch you should compute the mean over your training set only (excluding your validation set).

Anyway, I couldn’t find any good code examples, so I thought I’d post this here. There is probably a faster way to do this (it should be very parallelisable, since only one core on the AWS box was pegged), but it only takes about 3 minutes to run:

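import numpy as np
from keras.preprocessing import image

gen = image.ImageDataGenerator()   # assumed: plain generator, no augmentation, raw 0-255 values
# assumes `path` is the dataset root defined earlier in the notebook
batches = gen.flow_from_directory(path + 'train', target_size=(224,224),
                                  class_mode='categorical', shuffle=False, batch_size=128)
channel_sum = np.zeros(3)          # float64 accumulator; avoids shadowing the built-in sum
count = 0
n = batches.nb_sample              # Keras 1 attribute; in Keras 2 it is batches.samples

for imgs, labels in batches:
    # imgs is channels-first (batch, 3, 224, 224), so sum over every axis except 1
    channel_sum += np.sum(imgs, axis=(0, 2, 3))
    print('%d/%d - %0.2f%%' % (count, n, 100.0*count/n), end='\r')
    count += imgs.shape[0]
    # the generator loops forever, so break after one full pass
    if count >= n:
        break

avg = channel_sum / (count*224*224)
print(avg)
# [ 124.583   116.073   106.3996]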
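A framework-agnostic sketch of the same one-pass loop, with two assumed generalisations: it counts the pixels actually seen instead of hard-coding 224x224, and it takes a `channel_axis` argument so it works for channels-first (axis 1, as in the code above) and channels-last (axis -1) batches. `running_channel_mean` and `fake_batches` are hypothetical names; the fake generator stands in for an endless Keras-style iterator:

```python
import numpy as np

def running_channel_mean(batches, n_samples, channel_axis=1):
    """Hypothetical helper: per-channel mean over the first n_samples images.

    channel_axis=1 matches channels-first batches (N, 3, H, W);
    use channel_axis=-1 for channels-last batches (N, H, W, 3).
    """
    total = None
    seen = 0
    pixels_per_channel = 0
    for imgs in batches:
        imgs = np.asarray(imgs, dtype=np.float64)          # accumulate in float64
        axis = channel_axis % imgs.ndim
        other_axes = tuple(a for a in range(imgs.ndim) if a != axis)
        batch_sum = imgs.sum(axis=other_axes)               # one sum per channel
        total = batch_sum if total is None else total + batch_sum
        pixels_per_channel += imgs.size // imgs.shape[axis]
        seen += imgs.shape[0]
        if seen >= n_samples:                               # stop after one full pass
            break
    return total / pixels_per_channel

def fake_batches(data, batch_size):
    """Stand-in for an endless Keras-style iterator: cycles over `data` forever."""
    while True:
        for i in range(0, len(data), batch_size):
            yield data[i:i + batch_size]

rng = np.random.default_rng(2)
data = rng.uniform(0, 255, size=(10, 3, 5, 5))             # channels-first
m = running_channel_mean(fake_batches(data, 4), n_samples=len(data))
print(np.allclose(m, data.mean(axis=(0, 2, 3))))             # True
```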

PS: Does anyone know where to get the original VGG16 training dataset?


(Eugene Ware) #9

To answer my own question: it was based on the ImageNet 2014 competition, where you can download the data here

Here’s the data it was trained on, I believe:

And these were the 1000 categories


(Eugene Ware) #10

Actually, going by Lecture 3, I think it was the 2012 dataset, which is here


(Eugene Ware) #11

So I downloaded the ImageNet data that VGG16 was trained on and, as an exercise, tried to calculate the VGG16 mean myself. I ended up with [ 122.6778 116.6522 103.9997]

It took 6.5 hours to process all 1,281,167 images.

I ran it on an AWS p2.xlarge instance.

The official mean used in the vgg16.py file is [123.68, 116.779, 103.939], which is quite close; the difference may be down to floating-point error.

I calculated it on the training set only, so it’s possible that the official number also included the validation and test sets. That said, I’ve heard it’s generally not a good idea to calculate the mean from the validation or test sets.
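A minimal sketch of the train-only convention, using made-up channels-last arrays: the mean is computed once from the training split, and that same constant is subtracted from the validation split, so the held-out data never influences the preprocessing:

```python
import numpy as np

# Hypothetical channels-last arrays standing in for real training/validation splits.
rng = np.random.default_rng(3)
x_train = rng.uniform(0, 255, size=(20, 8, 8, 3)).astype(np.float32)
x_valid = rng.uniform(0, 255, size=(5, 8, 8, 3)).astype(np.float32)

# The statistic comes from the training split only ...
train_mean = x_train.mean(axis=(0, 1, 2))

# ... and the very same constant is subtracted from every split.
x_train_c = x_train - train_mean
x_valid_c = x_valid - train_mean
```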

Preparing the ImageNet data was a bit of a challenge: the training set was 138 GB, so I had to resize the EBS volume on my server a few times while I was extracting it.

If anyone’s interested, I can share the process I used to download, extract and organise the data so that Keras can use it.