Lesson 8 Discussion

@thunderingtyphoons @xinxin.li.seattle

I started with the same parameters as in the notebook. Then I tried different styles and different numbers of iterations, but always fewer than 50.

I think the input picture has a lot to do with the output and the quality of the styling. For the one on Medium, when I used pictures like The Starry Night or other similar scenes, it didn't do well. Since my output was a portrait, I thought I would try a portrait as the input as well.

That worked, and then to adjust the quality I tried different numbers of iterations; the one I posted was from the 20th iteration.

I did not try any changes in the loss function.

Great start! Looking forward to reading the next one :slight_smile:

I am trying to understand why wrapping the prediction (as shown below) in K.variable is necessary.

targ = K.variable(layer_model.predict(img_arr))

@jeremy says it is to make targ reside on the GPU. But I thought model.predict uses the GPU for evaluation by default. If the idea is to make it live on the GPU so that it can be reused later in the computation graph (in K.function), then why is the loss not wrapped in a K.variable?

2 Likes

Conversion of weight files between TensorFlow and Theano, back and forth:

I took Part 1 as a MOOC and had set up my machine on top of TensorFlow, not Theano, at the time. So when I tried to run the notebooks I couldn't run VGG16_BN, as it came with weights trained on Theano.
In the process I found a useful script on GitHub by titu1994. This script allows you to convert Keras models and their weight files both ways. Very useful if you find some work on one backend and prefer to work with the other!
You'll have to edit the file and provide the Keras code for building your models, as well as the weight files to convert. Instructions are given in the script.

Hope some people will find it useful.

3 Likes

predict() does indeed run on the GPU, but it returns a standard numpy array - i.e. it copies the result back to main RAM. So we need to put it back on the GPU again! :slight_smile:
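A tiny sketch of what that looks like in practice (reusing the layer_model and img_arr names from the question above):

from keras import backend as K

# predict() evaluates on the GPU but hands the result back to host RAM
# as a plain numpy array...
activations = layer_model.predict(img_arr)
print(type(activations))   # <class 'numpy.ndarray'>

# ...so wrapping it in K.variable moves it back onto the GPU as a
# backend tensor that can be reused inside the computation graph.
targ = K.variable(activations)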

1 Like

Question about this line:
rn_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)

I think I understand the reason for doing this, and some googling helped.

So my question is: where did the values come from? If I understand this correctly, these are the {R, G, B} mean values across the entire ImageNet dataset - but where did they come from?

1 Like

@paulm It's given by the VGG authors: https://gist.github.com/ksimonyan/3785162f95cd2d5fee77#description
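For reference, here is roughly how those per-channel means get applied for VGG-style preprocessing (just a sketch; the notebook's own preproc/deproc may differ slightly):

import numpy as np

rn_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)

# Subtract the per-channel ImageNet means and flip RGB -> BGR,
# the ordering the original Caffe-trained VGG weights expect.
preproc = lambda x: (x - rn_mean)[:, :, :, ::-1]

# Undo the preprocessing to get a viewable image back.
deproc = lambda x, s: np.clip(x.reshape(s)[:, :, :, ::-1] + rn_mean, 0, 255)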

1 Like

Thanks @jeremy. Sorry for persisting on this – but would like to gain a better understanding if possible.

So, in the statement below, metrics.mse returns a Tensor. Does it mean that mse is evaluated on the GPU and the results are also stored on the GPU? In general, how do we figure out when to wrap something in K.variable?

loss = metrics.mse(layer, targ)

1 Like

Ah OK, I understand you @karthik_k314.
This is because the model reuses the style size from the previous computation in our case. So to avoid a size mismatch, you could use an image of the same size range (one that covers the original image size) when you create the style.
Below is what I got with a denominator of 0.2.
Original image:

Output image:

In case it helps someone.

I resize the style and content images by doing something like this:

# Resize the image and style so that their sizes are similar
print(style.size, img.size)

# Resize the content image to roughly match the style image
img_test = img.resize((570, 300), PIL.Image.ANTIALIAS)
img_test  # display the resized image in the notebook

img_arr_test = preproc(np.expand_dims(img_test, 0)[:, :, :, :3])

# Check that the image array, the style array and shp are all the same size
print(img_arr_test.shape, style_arr.shape, shp)
3 Likes

I'm not sure there's a simple rule of thumb, other than checking the type of the output. Basically, anything designed to be used inside a Keras network (including metrics, as you noticed) has to return a tensor, whereas anything designed to give you information back from a network has to return a numpy array.
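In practice that check is just a couple of type() calls (reusing the layer, targ, layer_model and img_arr names from earlier in the thread):

from keras import metrics

# Something meant to live inside the network graph returns a tensor...
loss = metrics.mse(layer, targ)
print(type(loss))     # a symbolic backend tensor

# ...while something meant to hand results back returns a numpy array.
preds = layer_model.predict(img_arr)
print(type(preds))    # <class 'numpy.ndarray'>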

1 Like

I was looking for the ImageNet labels, and I found this page. These labels can be easily parsed with pandas.

The labels are simply the directory names, just like the standard Keras generator approach.
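For instance, a minimal sketch (path/to/imagenet/train is just a placeholder; each class sits in its own sub-directory):

from keras.preprocessing.image import ImageDataGenerator

# flow_from_directory treats each sub-directory as one class,
# e.g. the WordNet IDs such as 'n01440764' used by ImageNet.
batches = ImageDataGenerator().flow_from_directory(
    'path/to/imagenet/train',   # placeholder path
    target_size=(224, 224), batch_size=64)

print(batches.class_indices)    # {'n01440764': 0, ...}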

@jeremy The link for the slides and resources, www.platform.ai/part2/lesson1/, does not seem to work. What is the issue, please?

Works for me - what problem are you having?

It was just not loading, but I used wget to download them; that's the workaround.

@Matthew and I did a quick sync-up on our Keras implementation of neural style transfer, and here are some of the questions we are still unsure about:

Computation graph and mse (metrics.mse(y_labels, y_pred)): In our implementation, the first input is the precomputed activations (targ), whereas the second input is essentially a yet-to-be-computed tensor (content_layer).
Is this always how metrics.mse is used? And does it basically create a graph without computing anything yet?

Preproc: Shouldn't we clip after subtracting the mean to make sure the values are in the 0-255 range?

Gram_matrix: Why do we divide the result by x.get_shape().num_elements()? It seems like it is not necessary.

Some questions on Evaluator:

  1. When is using an Evaluator required? It seems the gradient (fprime) is optional in fmin_l_bfgs_b. Our guess is we need it in our case because our gradient is unusual in that it is with respect to the input image rather than the network weights. Is that a fair guess?
  2. Flatten/reshape: Why do we call x.flatten() when calling fmin_l_bfgs_b, but then reshape again within the Evaluator? Also, why do we flatten the gradients in the Evaluator?
  3. float64: Why do we convert the loss and gradients to float64?

Thanks,

1 Like

You can pass any symbolic tensors to mse - it's what Keras uses behind the scenes any time you use 'mse' as your loss function. Like nearly all Keras functions, it works on symbolic tensors, so it doesn't compute anything right away. (predict is an example of an exception to this rule.)
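A rough sketch of that distinction, loosely following the notebook's pattern (model, layer, targ and img_arr are the names used earlier in the thread):

from keras import backend as K
from keras import metrics

# Purely symbolic: nothing is evaluated at this point.
loss = metrics.mse(layer, targ)
grads = K.gradients(loss, model.input)

# Evaluation only happens when we build a K.function and call it
# with concrete data.
fn = K.function([model.input], [loss] + grads)
loss_val, grad_val = fn([img_arr])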

Why would you expect that? The input to VGG should have a mean of zero, so can’t be in that range.

Not necessary, but it's nice to have the content loss and style loss be the same order of magnitude, so we take the mean. In the paper they divide by 4N^2M^2 later, which has the same effect.
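For reference, a sketch of the kind of gram_matrix being discussed (a TensorFlow-backend version; the exact notebook code may differ):

from keras import backend as K

def gram_matrix(x):
    # One row per channel, columns are the flattened spatial locations
    features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
    # Channel-to-channel correlations, scaled by the number of elements
    # so the style loss stays on a scale comparable to the content loss
    return K.dot(features, K.transpose(features)) / x.get_shape().num_elements()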

See the docs for that function - if you don’t pass the gradient function, it’ll need to calculate it with finite differencing, which is terribly slow.

Since this is a generic optimizer, it doesn’t know how to deal with anything other than vector inputs, so we have to flatten what we provide the function.

The function expects float64 arrays to be passed to it.
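Putting those three answers together, here is a rough sketch of what an Evaluator-style wrapper looks like (names are illustrative rather than the exact notebook code; f is a K.function returning [loss, grads] and shp is the image batch shape):

import numpy as np

class Evaluator:
    """Adapt a Keras loss/gradient function to scipy's fmin_l_bfgs_b."""
    def __init__(self, f, shp):
        self.f, self.shp = f, shp

    def loss(self, x):
        # scipy hands us a flat float64 vector: reshape it into an image
        # batch, run the network, and cache the gradients for grads().
        loss_, self.grad_values = self.f([x.reshape(self.shp)])
        return loss_.astype(np.float64)

    def grads(self, x):
        # Flatten the gradient image back into a float64 vector,
        # which is the only shape/dtype fmin_l_bfgs_b understands.
        return self.grad_values.flatten().astype(np.float64)

It then gets called roughly as fmin_l_bfgs_b(evaluator.loss, x.flatten(), fprime=evaluator.grads, maxfun=20).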

Try playing around with standard scipy optimizers on non-deep-learning problems to get a feel for how they work. E.g. http://www.scipy-lectures.org/advanced/mathematical_optimization/
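For instance, a tiny self-contained example on scipy's built-in Rosenbrock function shows the cost of not supplying a gradient:

import numpy as np
from scipy.optimize import fmin_l_bfgs_b, rosen, rosen_der

x0 = np.zeros(5)

# No gradient supplied: the optimizer falls back to finite differencing,
# which costs many extra function evaluations.
x1, f1, info1 = fmin_l_bfgs_b(rosen, x0, approx_grad=True)

# Analytic gradient supplied: far fewer calls are needed.
x2, f2, info2 = fmin_l_bfgs_b(rosen, x0, fprime=rosen_der)

print(info1['funcalls'], info2['funcalls'])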

3 Likes

Thanks for your inputs as always @jeremy!

Why would you expect that? The input to VGG should have a mean of zero, so can’t be in that range.

I just reread the VGG description of the preprocessing.
“The input images should be zero-centered by mean pixel subtraction”, so the preprocessed pixel values, in fact, are expected to be both negative and positive. Makes sense.

I understand it's kind of late, but I'm trying to run neural-style.ipynb and am getting the following error while loading the VGG model at the step model = VGG16_Avg(include_top=False):

Error: TypeError: _obtain_input_shape() got an unexpected keyword argument ‘dim_ordering’

Any pointers would be greatly appreciated. I have spent a lot of time trying to debug this issue.

Thanks