I tried to calculate this myself too. Here's how I did it (I used the cats vs. dogs Redux data, which is obviously not the original VGG16 training set, but I've read that when building your own model from scratch you should compute this over your training set, excluding your validation set).
Anyway, I couldn't find any good code examples, so I thought I'd post this here. There's probably a faster way to do this (it should be much more parallelisable, since only one core on the AWS box was pegged), but it only takes about 3 minutes to run:
```python
batches = gen.flow_from_directory(path + 'train', target_size=(224,224),
                                  class_mode='categorical', shuffle=False, batch_size=128)

channel_sum = np.zeros(3)  # don't shadow the builtin `sum`
count = 0
for imgs, labels in batches:
    # imgs are channels-first (N, 3, 224, 224), so sum over batch, height, width
    channel_sum += np.sum(imgs, axis=(0, 2, 3))
    count += imgs.shape[0]
    print '%d/%d - %0.2f%%' % (count, batches.nb_sample, 100.0*count/batches.nb_sample), "\r",
    # the generator loops forever - break out after one full pass
    if count >= batches.nb_sample:
        break

avg = channel_sum / (count * 224 * 224)
# [ 124.583   116.073   106.3996]
```
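For what it's worth, the same running-sum idea can be written as a small framework-independent helper, which avoids hardcoding the image size and works for any batch shape. This is just a sketch of my own (the function name and signature are made up, not from Keras or NumPy):

```python
import numpy as np

def per_channel_mean(batches, channel_axis=1):
    """Per-channel mean over an iterable of image batches.

    Each batch is an ndarray with channels on `channel_axis`
    (axis 1 for channels-first data, as in the snippet above).
    """
    total = None
    pixel_count = 0
    for batch in batches:
        # sum over every axis except the channel axis
        axes = tuple(i for i in range(batch.ndim) if i != channel_axis)
        s = batch.sum(axis=axes)
        total = s if total is None else total + s
        # pixels contributing to each channel in this batch
        pixel_count += batch.size // batch.shape[channel_axis]
    return total / pixel_count

# sanity check against a direct mean on random data
data = np.random.rand(10, 3, 8, 8)
means = per_channel_mean(np.array_split(data, 4))
assert np.allclose(means, data.mean(axis=(0, 2, 3)))
```

Feeding it a finite iterable of batches (rather than an endless Keras generator) also sidesteps the infinite-loop problem entirely.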
PS: Does anyone know where to get the original VGG16 dataset from?