Error in vgg_preprocess?


(Peggy) #1

Doesn’t Theano store images in (channel, row, col) order? vgg_preprocess(x) returns x[:,::-1] to reverse rgb -> bgr. Shouldn’t it instead return x[::-1,:,:]? Am I missing something here?
Thanks for your help.
P


(Jeremy Howard) #2

I seem to remember seeing an error like that in one of the notebooks at some point - so yes, possibly you found a mistake. Where did you see this?


(Utkarsh) #3

This is in the vgg16.py file, which contains a vgg_preprocess(x) function that does the conversion.


(Peggy) #4

Hi, Jeremy.
Thanks very much for your quick response. As Utkarsh remarked, this is in vgg16.py in the vgg_preprocess(x) function.
Thanks very much to both you and Rachel for these wonderful resources! They are very helpful.
With best regards,
Peggy


(Peggy) #5

Hi, Jeremy.
It’s also in vgg_preprocess(x) in vgg16bn.py.
Thank you.
P


(Christina Young) #6

If anyone’s interested, here is the TensorFlow counterpart (channels last):

import numpy as np

vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((1,1,3)) # This one for Tensorflow

def vgg_preprocess(x):
    x = x - vgg_mean # subtract the per-channel RGB mean
    return x[:,:,::-1] # reverse axis rgb->bgr # This one for Tensorflow (channels last)

(Armin) #7

Hijacking the topic, why do we always use this mean and not the mean of the dataset we’re actually working on?


(Christina Young) #8

It’s the mean of the dataset that you’re working with – if you’re training a network from scratch. Since we use pre-trained ImageNet weights for many of our image classification problems (VGG, ResNet, etc.), we need to use the same mean those weights were trained with.
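
For the from-scratch case, a per-channel mean can be computed over the training images. This is only a minimal sketch: it assumes a channels-last uint8 array, the names `channel_mean` and `images` are hypothetical, and the data is random.

```python
import numpy as np

def channel_mean(images):
    """Per-channel mean of a batch shaped (n, rows, cols, channels)."""
    return images.astype(np.float32).mean(axis=(0, 1, 2))

# Random stand-in for a training set: 8 RGB images of 32x32 pixels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)

dataset_mean = channel_mean(images)                   # shape (3,), one value per channel
centered = images.astype(np.float32) - dataset_mean   # broadcasts over the last axis
```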


#9

So these three values, 123.68, 116.779, 103.939, are in R, G, B order rather than B, G, R, right?


#10

In your TensorFlow version of vgg_preprocess(x), the return line should be
return x[:,:,:,::-1].

You are missing one dimension.


(Rob Forgione) #11

Yeah, this was tripping me up too.

Interestingly, when I made this change, the performance of my network didn’t change much. Not sure why this is, but my guess is that it’s because the green value remains the same in both orderings, and the other two only change intensities by ~20 (on a scale of 0-255, this is a change of ~8%). Basically, since the means are all on roughly the same scale to begin with, flipping their order doesn’t seem to shake things up that much.

This is just a hypothesis based on intuition, but I’d love to get the opinion of someone with a bit more knowledge!
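
The figures in this hypothesis can be checked with a few lines of NumPy. This only reproduces the arithmetic on the three mean values. It says nothing about how a trained network reacts to the shift.

```python
import numpy as np

vgg_mean_rgb = np.array([123.68, 116.779, 103.939])
vgg_mean_bgr = vgg_mean_rgb[::-1]

# Per-channel offset introduced if the means are applied in the opposite order.
offset = vgg_mean_rgb - vgg_mean_bgr   # about [19.74, 0, -19.74]
relative = np.abs(offset) / 255.0      # about 0.077 of the 0-255 range for R and B
```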


(Christina Young) #12

Interesting… when I add the extra dimension, I get a “too many indices for array” error. But I was doing it outside of the function vgg_preprocess()…
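
NumPy raises that error when an array is indexed with more indices than it has dimensions. That is what happens if the four-index slice is applied to a single image rather than a batch. A small sketch with made-up shapes:

```python
import numpy as np

single = np.zeros((224, 224, 3), dtype=np.float32)     # one image: rows, cols, channels
batch = np.zeros((2, 224, 224, 3), dtype=np.float32)   # batch of images

flipped_single = single[:, :, ::-1]    # fine: 3 indices for a 3-D array
flipped_batch = batch[:, :, :, ::-1]   # fine: 4 indices for a 4-D array

try:
    single[:, :, :, ::-1]              # 4 indices for a 3-D array
except IndexError as err:
    error_message = str(err)           # the "too many indices for array" error

# An ellipsis targets the last axis regardless of how many axes come before it.
any_rank_single = single[..., ::-1]
any_rank_batch = batch[..., ::-1]
```

Writing the slice as x[..., ::-1] works for both a single image and a batch, as long as the channels are on the last axis.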


#13

I believe so. We subtract the means before reordering the indices, so 123.68 -> R, 116.779 -> G, 103.939 -> B.
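
A one-pixel example of that ordering, as a sketch only (the pixel values are made up):

```python
import numpy as np

vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)  # R, G, B

# One pixel whose R, G, B values are the means plus 1, 2 and 3.
pixel_rgb = vgg_mean + np.array([1.0, 2.0, 3.0], dtype=np.float32)

centered_rgb = pixel_rgb - vgg_mean   # about [1, 2, 3]: each mean meets its own channel
centered_bgr = centered_rgb[::-1]     # about [3, 2, 1]: same values, B first
```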


#15

Hi, I also think vgg_preprocess(x) should return [::-1,::], but after I changed the function, the performance of the vgg_net actually got worse. I don’t understand what’s going on there. Did you run into the same problem?
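
One possible explanation, assuming vgg_preprocess receives a whole batch shaped (batch, channel, row, col) rather than a single image: on such an array x[:, ::-1] already reverses the channel axis, while x[::-1,:,:] reverses the order of the images in the batch and leaves the channels untouched. A small NumPy sketch with made-up data:

```python
import numpy as np

# Two tiny "images", three channels, 1x1 pixels: value = 10 * image + channel.
x = np.array([[[[0.]], [[1.]], [[2.]]],
              [[[10.]], [[11.]], [[12.]]]])   # shape (2, 3, 1, 1)

channels_flipped = x[:, ::-1]    # per image: channels in order 2, 1, 0
batch_flipped = x[::-1, :, :]    # images swapped, channel order unchanged

first_image_channels = channels_flipped[0, :, 0, 0]   # values 2, 1, 0
first_image_after_batch_flip = batch_flipped[0, :, 0, 0]   # values 10, 11, 12
```

If the batch order changes while the labels stay in place, predictions and labels no longer line up, which could look like a drop in performance.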


#16

Why are there four dimensions here, x[:,:,:,::-1]? In the vgg_preprocess function, the vgg_mean that we use only has three dimensions. Is it possible to subtract it if they have a different number of dimensions?
vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3,1,1))
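
NumPy broadcasting aligns shapes from the trailing axis, so a (3,1,1) mean can be subtracted from a 4-D channels-first batch. A sketch with made-up shapes and values:

```python
import numpy as np

vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3, 1, 1))

batch = np.full((2, 3, 4, 4), 128.0, dtype=np.float32)  # (batch, channel, row, col)

# Shapes are aligned from the right: (2, 3, 4, 4) against (3, 1, 1).
# The mean is treated as (1, 3, 1, 1) and stretched over batch, rows and cols.
centered = batch - vgg_mean
```

The result keeps the batch shape (2, 3, 4, 4), and every pixel in channel c has vgg_mean[c] subtracted from it.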


(kab) #17

Hi, can you share your whole modified vgg16.py that works with TensorFlow instead of Theano? I made the change you suggested, but I am running into dimension errors. Did you also change all the dimensions to (224,224,3) in the rest of the code?