The inputs of loss and grads are different: grads can take model.input directly, whereas loss is created from layer, which is not the same variable as model.input. How does the Keras function know the difference?

I am confused by the Lesson 8 code and am not sure what it is doing.
Here are my questions and guesses:

model = vgg16_avg.VGG16_Avg(include_top=False)
layer = model.get_layer('block4_conv1').output
# layer_model outputs the activations of block4_conv1 for a given input
layer_model = Model(model.input, layer)
# Precompute the output of "block4_conv1" and wrap it in a K.variable,
# because predict returns a numpy array, not a tensor
targ = K.variable(layer_model.predict(img_arr))
# mse computes the loss between the output of the layer and targ.
# The value of targ is precomputed, but what is the purpose of layer?
# Does it take the input image and output the activations (same as layer_model)?
loss = metrics.mse(layer, targ)
grads = K.gradients(loss, model.input)
# We cannot call loss or grads directly; instead we feed them into K.function.
# As gtseng asked, what happens when we call fn(input_img)?
fn = K.function([model.input], [loss]+grads)

I spent several hours looking for answers on DuckDuckGo, read all of the replies in the Lesson 8 discussion, and tweaked and ran the code many times, but I still cannot clear the miasma in my head. Please lend me a hand if you can, thanks.

@tham Your guesses are basically correct. The purpose of “layer” is to retrieve the output of the intermediate layer. You must distinguish between layer_model (a Keras model defined to do the precomputation) and layer itself (just an intermediate symbolic tensor holding that layer's output, which is in turn fed to Model to build layer_model). Because layer is defined in the graph in terms of model.input, a loss built from layer still depends on model.input. Therefore layer itself is not something you call with an input; layer_model is.

You do not call fn directly; you pass it to a solver, which uses it to evaluate the loss and the gradient of the loss with respect to the input. The solver then iteratively modifies the input so as to minimize the loss, using the gradients to decide how to change it.
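To make the "iteratively modified" part concrete, here is a minimal, framework-free sketch. Everything in it is an assumption for illustration: toy_fn is a hand-written stand-in for the compiled K.function (same calling convention, a list in and [loss, grads] out), and solve is a hypothetical plain gradient-descent loop, not the BFGS solver used in the lesson.

```python
def toy_fn(inputs):
    """Toy stand-in for K.function([model.input], [loss] + grads)."""
    x = inputs[0]
    target = [1.0, -2.0, 0.5]                       # plays the role of targ
    diffs = [xi - ti for xi, ti in zip(x, target)]
    loss = sum(d * d for d in diffs) / len(diffs)   # mean squared error
    grads = [2.0 * d / len(diffs) for d in diffs]   # d(loss)/d(x), written by hand
    return [loss, grads]


def solve(fn, x, iterations=200, step=0.5):
    """Hypothetical solver: repeatedly nudge the input against the gradient."""
    for _ in range(iterations):
        loss, grads = fn([x])
        x = [xi - step * gi for xi, gi in zip(x, grads)]
    return x, fn([x])[0]


x, final_loss = solve(toy_fn, [0.0, 0.0, 0.0])
print(x, final_loss)
```

Note that only the input x changes between iterations; the target (and, in the real network, the weights) stay fixed.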

Thanks, I think I get it now. This kind of programming paradigm is quite confusing the first time you meet it; now I am gradually getting it (I hope). Please correct me if anything is wrong.

#This line defines layer_model, which does the precomputation;
#the input can be fed into the Model later on
layer_model = Model(model.input, layer)
#This line feeds the intermediate tensor, layer, into the mse
loss = metrics.mse(layer, targ)
#This line feeds the loss (I guess TensorFlow needs it to figure out how to find the gradient)
#and model.input into the gradient
grads = K.gradients(loss, model.input)
#This line feeds model.input, loss and grads into the function. Because
#we created layer_model before, TensorFlow knows how to create a Model
#from model.input and layer. We concatenate loss and grads because we want
#the function to compute both; I guess this function will evaluate loss first,
#then grads
fn = K.function([model.input], [loss]+grads)

I think the updated code on GitHub is easier to read and less confusing (at least for newcomers).
Anyway, Lesson 8 and the notebook are awesome. For me the most important part is that they show us
how to implement an algorithm written up in a paper and teach us how to read papers.

By the way, I wrote a post to record my understanding of the implementation details of Lesson 8. I hope I got it right.

The function at the end only declares its outputs: the concatenated list of loss and grads. When you feed it to the bfgs solver, the solver tries to minimize the loss, using the gradients to choose each step, and it stops when the gradients are close to zero (a minimum, hopefully not just a local one).

Keras in the current example uses TensorFlow graphs. When you ask Keras to output a particular variable, it uses the information in the graph to calculate the output. Say you pass a bunch of dog and cat images through a network architecture and calculate the following.

Predicted labels.

Loss = loss_fn(Predicted labels,Actual_labels)

Gradients = Keras automatically calculates the gradients by which the weights have to change to reduce the loss.

In our example, we have created the variables for loss and gradients, which contain the graph describing how to calculate them. We use K.function() to create a new function that takes inputs, applies the network, and outputs the loss and grads of the network.
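A toy illustration of that idea, with no Keras or TensorFlow involved. The Node class and the function helper below are hypothetical names invented for this sketch; they only mimic the principle that each symbolic value remembers which nodes it was built from, so a compiled function can trace an output such as loss back to the placeholder it depends on.

```python
class Node:
    """A symbolic value: remembers how to compute itself from its parents."""

    def __init__(self, op=None, parents=()):
        self.op, self.parents = op, parents

    def evaluate(self, feed):
        if self in feed:                       # placeholder supplied by the caller
            return feed[self]
        args = [p.evaluate(feed) for p in self.parents]
        return self.op(*args)


def function(inputs, outputs):
    """Toy analogue of K.function: bind values to inputs, then walk the graph."""
    def run(values):
        feed = dict(zip(inputs, values))
        return [out.evaluate(feed) for out in outputs]
    return run


model_input = Node()                                            # placeholder
layer = Node(lambda x: [2.0 * v for v in x], (model_input,))    # stand-in for the conv layers
targ = Node(lambda: [2.0, 4.0])                                 # precomputed constant
loss = Node(lambda a, t: sum((ai - ti) ** 2 for ai, ti in zip(a, t)) / len(a),
            (layer, targ))

fn = function([model_input], [loss])
print(fn([[1.0, 2.0]]))   # layer output equals targ, so the loss is zero
print(fn([[2.0, 2.0]]))
```

Here loss never mentions model_input directly, yet evaluating it reaches model_input through layer. The sketch leaves out gradients, which the real backend derives from the same graph.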

For Neural Style Transfer, I am able to get the content extraction working in Keras 2 but not the style extraction. I end up with a random image, even though the loss is progressively reduced.

Can anyone point to any Keras 2 code that I can try? I cannot see what mistake I am making… Please advise.

shp=(1,224,224,3)
# I am using VGG16 with max pooling, not average. Should work, according to forums..
vgg16_style=keras.applications.vgg16.VGG16(include_top=False, weights='imagenet', input_tensor=None, input_shape=(224,224,3))
style_layers=[vgg16_style.get_layer(name='block{}_conv1'.format(o)).output for o in range(1,3)]
style_layer_model = Model(vgg16_style.input, style_layers)
style_targets=[K.variable(o) for o in style_layer_model.predict(starry.reshape(shp))]
# Same as Jeremy's code
def gram_matrix(x):
    # We want each row to be a channel, and the columns to be flattened x,y locations
    features = K.batch_flatten(K.permute_dimensions(x, (2,0,1)))
    # The dot product of this with its transpose shows the correlation
    # between each pair of channels
    return K.dot(features, K.transpose(features)) / x.get_shape().num_elements()
# Using K.mean to get a single number loss
def style_loss(x, targ):
    return K.mean(metrics.mse(gram_matrix(x), gram_matrix(targ)))
sloss = sum([style_loss(style_layer[0], style_target[0]) for style_layer, style_target in zip(style_layers, style_targets)])
sgrads = K.gradients(sloss, style_layer_model.input)
style_fn = K.function([style_layer_model.input], [sloss]+sgrads)
evaluator = Evaluator(style_fn, shp)
rand_img = lambda shape: np.random.uniform(-2.5, 2.5, shape)/1
iterations=10
x = rand_img(shp)
x = solve_image(evaluator, iterations, x)

For example, the output of VGG16 block5_conv1 has shape (1, 14, 14, 512), whereas the loss function needs a (14, 14, 512) shape.

If you use l1 directly, the shapes won’t match and you may get an error. Instead, use l1[0], which drops the batch dimension so the shapes align properly for the loss calculation.

Also, you may have to use K.mean because you have two layers of different dimensions and you need to add the losses from each layer. K.mean converts the loss from each of your layers to a single number, and you can then add them up (in that code loop).
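The shape handling described above can be checked on tiny nested lists without any framework. This is only a sketch: gram_matrix and style_loss here are pure-Python re-creations written for this note (dividing by the number of elements, like the Keras version quoted earlier), and the activations are made-up numbers.

```python
def gram_matrix(act):
    """act: nested list of shape (H, W, C). Returns a C x C list of lists."""
    h, w, c = len(act), len(act[0]), len(act[0][0])
    # One row per channel; the columns are the flattened (y, x) positions.
    features = [[act[y][x][ch] for y in range(h) for x in range(w)]
                for ch in range(c)]
    n = h * w * c
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def style_loss(act, targ):
    """Mean squared difference of the two Gram matrices: always one number."""
    g, t = gram_matrix(act), gram_matrix(targ)
    sq = [(gv - tv) ** 2 for grow, trow in zip(g, t) for gv, tv in zip(grow, trow)]
    return sum(sq) / len(sq)


# Made-up activations with a leading batch dimension of 1.
layer_a = [[[[1.0, 2.0], [3.0, 4.0]]]]    # shape (1, 1, 2, 2): 2 channels
layer_b = [[[[1.0, 0.0, 2.0]]]]           # shape (1, 1, 1, 3): 3 channels
targ_b = [[[[0.0, 0.0, 0.0]]]]

# Index [0] drops the batch dimension, as with l1[0] above.
print(gram_matrix(layer_a[0]))

# Each layer contributes a single number, so layers of different sizes add up.
total = sum(style_loss(act[0], targ[0])
            for act, targ in [(layer_a, layer_a), (layer_b, targ_b)])
print(total)
```

The two layers produce Gram matrices of different sizes (2 x 2 and 3 x 3), but because each loss is reduced to a scalar, the sum is well defined.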

I made a timeline of the Lesson 8 video, as I found the timelines in the Part 1 wiki very practical.
There are many links; there was probably a lot of “happy noise” in the first class.
I expect future timelines to be shorter.

Hi all! Just wanted to point out a few issues that I had with the first part of Lesson 8.

Task: Get the bird out of noise (that is, content transfer), using only TensorFlow 1.2 and its contrib.keras, without standalone Keras. (AFAIK from the TF Dev videos, standalone Keras will be deprecated some time in the future.)

I had to replace the imports in vgg_16_average.py and hand-code the loss as keras.backend.mean(keras.backend.square(content_model.output - content_target)). Other than that, most of the “porting” is straightforward and/or easily solved with a bit of doc surfing.

However, I fought a lot with a nasty bug (feature) in scikit-image. I used skimage.transform.rescale() to get my image size down. By default, this function also scales your image values to [0,1]. VERY nasty feature: nothing worked. I got my MSE down, but the output image was pure noise. It took me a while to track down this issue. So, use the preserve_range=True parameter.
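A back-of-the-envelope sketch of why that rescaling hurts, in plain Python. The per-channel means are the ImageNet values commonly used for VGG preprocessing (treat them as an assumption here), the two pixels are invented, and the channel reordering done by the course preprocessing is left out for brevity.

```python
VGG_MEAN = (123.68, 116.779, 103.939)   # assumed per-channel ImageNet means (RGB)


def preprocess(pixel):
    """Subtract the per-channel mean, as VGG expects."""
    return tuple(v - m for v, m in zip(pixel, VGG_MEAN))


dark, bright = (10.0, 10.0, 10.0), (250.0, 250.0, 250.0)

# Values kept in 0..255: dark and bright pixels stay far apart after centering.
spread_ok = preprocess(bright)[0] - preprocess(dark)[0]

# Values silently squashed into 0..1 first: both pixels land near -123.
squashed = [tuple(v / 255.0 for v in p) for p in (dark, bright)]
spread_bad = preprocess(squashed[1])[0] - preprocess(squashed[0])[0]

print(spread_ok, spread_bad)
```

After the unintended [0,1] scaling, the whole image occupies less than one unit of a range the network expects to span a couple of hundred, so almost all contrast is lost before the first layer.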

Lessons learned:

LL1: VGG is VERY sensitive both to the centering of the input values (subtracting the mean) and to the “standard deviation” of the input. It expects a range of roughly -128 to 128.

LL2: The Evaluator class is needed so you don’t have to run the network twice (once for the loss and a second time for the gradients). If somebody has a more elegant (Pythonic) way of doing this, please post!

LL3: I noticed by accident that the range of the initial random image does not matter much. Smaller values give smoother images, and larger values (close to the input dynamic range) yield interesting images.
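The caching idea behind LL2 can be sketched as follows. This is a simplified re-creation, not the notebook's class: it skips the reshaping and dtype handling, toy_fn is an invented stand-in for the compiled function, and the call counter exists only to show that the function runs once per loss/gradient pair.

```python
class Evaluator:
    """Sketch of a loss/gradient cache around fn, which returns [loss, grads]."""

    def __init__(self, fn):
        self.fn = fn
        self.grad_values = None

    def loss(self, x):
        # One pass computes both values; keep the gradients for later.
        loss_value, self.grad_values = self.fn([x])
        return loss_value

    def grads(self, x):
        # Assumes loss(x) was just called with the same x, as the solver does.
        return self.grad_values


counter = {"calls": 0}


def toy_fn(inputs):
    """Invented stand-in: loss = sum(x^2), grads = 2x."""
    counter["calls"] += 1
    x = inputs[0]
    return [sum(v * v for v in x), [2.0 * v for v in x]]


ev = Evaluator(toy_fn)
x = [3.0, -1.0]
loss_value = ev.loss(x)
grad_value = ev.grads(x)
print(loss_value, grad_value, counter["calls"])
```

On the "more elegant way" question: as far as I know, scipy.optimize.fmin_l_bfgs_b also accepts a single callable that returns both the loss and the gradient when fprime is left as None, which would avoid the cache altogether; check the SciPy documentation before relying on that.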