Error in vgg_preprocess?

pdoerschuk · May 10, 2017, 9:18pm

Doesn’t Theano store images in (channel, row, col) order? vgg_preprocess(x) returns x[:,::-1] to reverse rgb -> bgr. Shouldn’t it instead return x[::-1,:,:]? Am I missing something here?
Thanks for your help.
P
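
A quick NumPy check of the two slices may help here. It is only a sketch: it assumes the function is handed a whole batch shaped (batch, channel, row, col) rather than a single image, which the thread does not confirm, and the array names are made up for illustration.

```python
import numpy as np

# Hypothetical batch of 2 images, channels first: (batch, channel, row, col).
batch = np.arange(2 * 3 * 2 * 2, dtype=np.float32).reshape(2, 3, 2, 2)

# x[:, ::-1] keeps the batch axis and reverses axis 1, the channel axis.
channels_reversed = batch[:, ::-1]
assert np.array_equal(channels_reversed[0, 0], batch[0, 2])  # old channel 2 is now first
assert np.array_equal(channels_reversed[0, 2], batch[0, 0])  # old channel 0 is now last

# x[::-1, :, :] reverses axis 0. On a batch that reorders the images,
# not the channels.
images_reversed = batch[::-1, :, :]
assert np.array_equal(images_reversed[0], batch[1])

# On a single (channel, row, col) image, x[::-1, :, :] does flip the channels.
single = batch[0]
single_bgr = single[::-1, :, :]
assert np.array_equal(single_bgr, channels_reversed[0])
```

Under that assumption both slices reverse the channels; which one is right depends on whether the input carries a leading batch axis.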

jeremy · May 11, 2017, 3:42am

I seem to remember seeing an error like that in one of the notebooks at some point - so yes, possibly you found a mistake. Where did you see this?

utk · May 11, 2017, 5:27am

This part is present in vgg16.py file. Inside there is a vgg_preprocess(x) function which does the conversion.

pdoerschuk · May 11, 2017, 3:01pm

Hi, Jeremy.
Thanks very much for your quick response. As Utkarsh remarked, this is in vgg16.py in the vgg_preprocess(x) function.
Thanks very much to both you and Rachel for these wonderful resources! They are very helpful.
With best regards,
Peggy

pdoerschuk · May 11, 2017, 4:28pm

Hi, Jeremy.
It’s also in vgg_preprocess(x) in vgg16bn.py.
Thank you.
P

Christina · May 18, 2017, 1:41am

If anyone’s interested, here is the Tensorflow counterpart (channels last):

import numpy as np

vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((1,1,3)) # This one for Tensorflow

def vgg_preprocess(x):
    x = x - vgg_mean # subtract the per-channel mean
    return x[:,:,::-1] # reverse axis rgb->bgr # This one for Tensorflow (channels last)

olix20 · May 18, 2017, 2:58pm

Hijacking the topic, why do we always use this mean and not the mean of the dataset we’re actually working on?

Christina · May 18, 2017, 9:40pm

It’s the mean of the dataset that you’re working with, if you’re training a network from scratch. Since we use pre-trained ImageNet weights for many of our image classification problems (VGG, ResNet, etc.), we need to use the same mean those networks were trained with.
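
As a rough illustration of the two cases, here is a sketch in NumPy. The helper `dataset_channel_mean` and the random `my_images` array are hypothetical stand-ins, and the batch is assumed to be channels last.

```python
import numpy as np

# Per-channel mean quoted in this thread, used with the pre-trained weights.
vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)

def dataset_channel_mean(images):
    """Per-channel mean of a channels-last batch shaped (n, rows, cols, 3)."""
    return images.mean(axis=(0, 1, 2))

# Hypothetical stand-in for a training set: 8 random 4x4 RGB images.
rng = np.random.default_rng(0)
my_images = rng.uniform(0, 255, size=(8, 4, 4, 3)).astype(np.float32)

# Training from scratch: center on the mean of your own data.
own_mean = dataset_channel_mean(my_images)
centered_scratch = my_images - own_mean

# Fine-tuning pre-trained weights: center on the mean used at training time.
centered_pretrained = my_images - vgg_mean
```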

passinger · July 4, 2017, 4:34pm

So, for these 3 values, 123.68, 116.779, 103.939, my understanding is that they are in R, G, B order rather than B, G, R, right?

passinger · July 4, 2017, 4:41pm

In your TensorFlow vgg_preprocess(x) function, the return line should be
return x[:,:,:,::-1].

You are missing one dimension.
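
One way to sidestep the question of three versus four indices, sketched in plain NumPy, is to slice with an ellipsis so that only the last axis is named. The function name `vgg_preprocess_channels_last` is hypothetical, and this has only been reasoned through for NumPy arrays, not for tensors inside a Keras layer.

```python
import numpy as np

vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((1, 1, 3))

def vgg_preprocess_channels_last(x):
    """Subtract the RGB mean, then reverse the last (channel) axis to get BGR.

    Hypothetical variant: `...` stands for all leading axes, so the same
    slice works for (rows, cols, 3) and (batch, rows, cols, 3) inputs.
    """
    x = x - vgg_mean          # broadcasts over the trailing channel axis
    return x[..., ::-1]

single = np.zeros((4, 4, 3), dtype=np.float32)
batch = np.zeros((2, 4, 4, 3), dtype=np.float32)

out_single = vgg_preprocess_channels_last(single)
out_batch = vgg_preprocess_channels_last(batch)
```

With `x[:,:,::-1]` the batch case would reverse the column axis instead, which is the problem pointed out above.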

rforgione · July 6, 2017, 4:43am

Yeah, this was tripping me up too.

Interestingly, when I made this change, the performance of my network didn’t change much. Not sure why this is, but my guess is that it’s because the green value remains the same in both orderings, and the other two only change intensities by ~20 (on a scale of 0-255, this is a change of ~8%). Basically, since the means are all on roughly the same scale to begin with, flipping their order doesn’t seem to shake things up that much.

This is just a hypothesis based on intuition, but I’d love to get the opinion of someone with a bit more knowledge!
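
The arithmetic behind that guess can be checked directly. The sketch below only measures how far the per-channel means move when their order is flipped; it says nothing about the effect on accuracy.

```python
import numpy as np

mean_rgb = np.array([123.68, 116.779, 103.939])
mean_bgr = mean_rgb[::-1]

# Per-channel offset introduced by subtracting the means in the opposite order.
shift = np.abs(mean_rgb - mean_bgr)

# The same offset as a fraction of the 0-255 intensity range.
shift_fraction = shift / 255.0

print(shift)           # outer channels move by about 19.74, the middle one by 0
print(shift_fraction)  # outer channels: a little under 0.08
```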

Christina · July 7, 2017, 9:14pm

Interesting… when I add the extra dimension, I get a “too many indices for array” error. But I was doing it outside of the function vgg_preprocess()…
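
One possible explanation, offered as a guess since the calling code isn't shown: outside the function the array may have been a single image with no batch axis, and four indices on a 3-D NumPy array raise an IndexError. A minimal sketch, with made-up array names:

```python
import numpy as np

single = np.zeros((224, 224, 3), dtype=np.float32)   # one image, no batch axis
batch = single[np.newaxis]                           # shape (1, 224, 224, 3)

try:
    single[:, :, :, ::-1]          # four indices on a 3-D array
    raised = False
except IndexError:
    raised = True                  # NumPy reports too many indices

flipped = batch[:, :, :, ::-1]     # fine once the batch axis exists
```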

Patrick · August 17, 2017, 7:46pm

I believe so. We subtract the means before reordering the indices, so 123.68 --> R, 116.779 --> G, 103.939 --> B
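
To make the ordering argument concrete, here is a small NumPy sketch (the random image is a hypothetical stand-in). Subtracting the RGB-ordered mean and then flipping gives the same result as flipping first and subtracting a BGR-ordered mean.

```python
import numpy as np

vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((1, 1, 3))

# Hypothetical RGB image, channels last: (rows, cols, 3).
rng = np.random.default_rng(0)
img_rgb = rng.uniform(0, 255, size=(4, 4, 3)).astype(np.float32)

# Order used in the thread: subtract the RGB-ordered mean, then flip to BGR.
subtract_then_flip = (img_rgb - vgg_mean)[:, :, ::-1]

# Equivalent alternative: flip to BGR first, then subtract a BGR-ordered mean.
flip_then_subtract = img_rgb[:, :, ::-1] - vgg_mean[:, :, ::-1]

assert np.allclose(subtract_then_flip, flip_then_subtract)
```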

charles · August 26, 2017, 6:52am

Hi, I also think vgg_preprocess(x) should return [::-1,::], but after I changed the function, the performance of the VGG net actually got worse. I don’t understand what’s going on there. Did you run into the same problem?

charles · August 26, 2017, 7:07am

Why are there four dimensions here: x[:,:,:,::-1]? In the vgg_preprocess function, the vgg_mean that we use has only three dimensions. Is it possible to subtract it if they have different numbers of dimensions?
vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3,1,1))
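
On the subtraction question, NumPy broadcasting does allow arrays with different numbers of dimensions. The sketch below uses the channels-first (3,1,1) mean quoted above and a hypothetical batch; it assumes the input has a leading batch axis.

```python
import numpy as np

vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3, 1, 1))

# Hypothetical channels-first batch: (batch, channel, row, col).
batch = np.full((2, 3, 4, 4), 128.0, dtype=np.float32)

# Shapes are aligned from the right: (3, 1, 1) is treated as (1, 3, 1, 1),
# and every size-1 axis is stretched to match the batch.
centered = batch - vgg_mean

# With a 4-D input, axis 1 is the channel axis, so this flips rgb -> bgr.
bgr = centered[:, ::-1]
```

The same rule is why a (1,1,3) mean lines up with the last axis of a channels-last batch.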

kab · July 19, 2018, 7:49pm

Hi, can you share your whole modified vgg16.py to work with Tensorflow instead of Theano? I made the change you suggested, but am running into dimension errors. Did you change all the dimensions to (224,224,3) in the rest of the code also?