There are three common techniques for value normalization:
- (x - x.min()) / (x.max() - x.min()) # values from 0 to 1
- 2*(x - x.min()) / (x.max() - x.min()) - 1 # values from -1 to 1
- (x - x.mean()) / x.std() # no fixed range, but mean at 0 and standard deviation of 1
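To make the three formulas above concrete, here is a minimal NumPy sketch. The function names (`min_max`, `min_max_symmetric`, `standardize`) and the sample array are made up for illustration, and the code assumes `x` is not constant (otherwise you'd divide by zero).

```python
import numpy as np

def min_max(x):
    """Technique 1: rescale so the values span 0 to 1."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.min()) / (x.max() - x.min())

def min_max_symmetric(x):
    """Technique 2: rescale so the values span -1 to 1."""
    x = np.asarray(x, dtype=np.float64)
    return 2 * (x - x.min()) / (x.max() - x.min()) - 1

def standardize(x):
    """Technique 3: shift to mean 0 and scale to standard deviation 1."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.mean()) / x.std()

# Hypothetical sample data that does not start at 0.
x = np.array([50.0, 100.0, 150.0, 250.0])

print(min_max(x))
print(min_max_symmetric(x))
print(standardize(x))
```

Note that only the third one guarantees a mean of 0. The second one centers the range at 0, but the mean can still be off if the data is skewed.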
You're doing pretty much the first one, except that you don't account for values that don't necessarily start at 0. That's why subtracting the min is always a good idea. The second approach is very similar, only its range is centered at 0.
If VGG really does it this way, they are essentially doing the first part of the third technique: subtracting the mean, so that the data ends up centered at zero. Dividing by the standard deviation afterwards is always a good idea to put your values on the same scale.
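As a rough sketch of that two-step idea (subtract the mean first, then divide by the standard deviation), here is how it could look for a batch of images with per-channel statistics. The random batch, its shape, and the variable names are assumptions for illustration only. This is not taken from the VGG code, and computing the statistics per channel is just one possible choice.

```python
import numpy as np

# Hypothetical batch: 8 RGB images, 32x32 pixels, values from 0 to 255.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(8, 32, 32, 3)).astype(np.float64)

# Statistics per color channel, computed over batch, height and width.
channel_mean = images.mean(axis=(0, 1, 2))  # shape (3,)
channel_std = images.std(axis=(0, 1, 2))    # shape (3,)

# Step 1: mean subtraction only (the "first part" of the third technique).
centered = images - channel_mean

# Step 2: also divide by the standard deviation (the full third technique).
standardized = centered / channel_std

print(centered.mean(axis=(0, 1, 2)))    # close to 0 for each channel
print(standardized.std(axis=(0, 1, 2)))  # close to 1 for each channel
```

In practice you'd compute the mean and standard deviation once on the training set and reuse them for validation and test data.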
As far as I know, the cleanest normalization is the third, because it's the only one that centers the mean at zero, which helps a lot with exploding or vanishing gradients. That being said, I've never found myself in a situation where using the third technique instead of the first has given me better performance.