Lesson 9 assignments

A quick reminder about this week's assignments:

  • Try some of the extensions listed in https://github.com/titu1994/Neural-Style-Transfer if you want to play more with style transfer
  • Implement Perceptual Losses for Real-Time Style Transfer
  • Read the perceptual losses paper and supplementary materials, particularly ensuring you understand the methods and notation. Don’t be shy about asking if you have questions!
  • Remember the golden rules about How to use the Provided Notebooks, particularly: the test of whether you understand something is whether you can build it yourself. And remember to try to solve problems yourself for 30 mins before asking for help - but then please do ask for help! :)
  • If you have space, try to download the ImageNet competition training and validation sets

Could it be that training the second part of the neural_style notebook (the super-resolution part) takes 45 minutes per epoch? Or should I check what's wrong with my configuration?

Yes it does take longer to train – that’s the trade-off. The paper has a graph that compares training time to output quality.

On a Titan X I’m getting 15 mins per iteration once I remove the axis= parameter to BatchNormalization (my dumb mistake that was luckily noticed during class). Generally AWS P2 is at least twice as slow.

For the “Perceptual Losses for Real-Time Style Transfer” assignment, the paper states that the residual block should use ‘valid’ padding. As a result, the 2 conv layers reduce the width and height by 4 (2 for each layer). The subsequent merge layer then has to sum two tensors of different sizes. The paper suggests performing a “center crop on the input feature map”.
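
To make the size arithmetic above concrete, here is a minimal sketch in plain Python. It assumes 3x3 kernels with stride 1 (which is what "2 for each layer" implies); the function names and the 84-pixel input are hypothetical, chosen only for illustration.

```python
def valid_conv_size(size, kernel=3, stride=1):
    """Output length along one spatial axis of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def res_block_output_size(size, kernel=3, n_convs=2):
    """Spatial size after the conv layers of one residual block (assumed 3x3, stride 1)."""
    for _ in range(n_convs):
        size = valid_conv_size(size, kernel)
    return size


in_size = 84                      # hypothetical input width/height
out_size = res_block_output_size(in_size)
crop = (in_size - out_size) // 2  # pixels to trim from each border of the input
print(in_size, out_size, crop)
```

The `crop` value is how many pixels would have to come off each side of the block's input before it can be summed with the block's output.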

I am thinking of writing a new layer called Cropping2D to handle the cropping before the input tensor of the residual block is passed to the merge layer. I would like to confirm that I am heading in the right direction before I implement this. @jeremy could you confirm this approach?

Is it required to train the VGG16 network on the MS COCO dataset for fast-style-transfer? I’m using a pre-trained ImageNet model, and that should be more than fine, right?

EDIT: No. That was a stupid question.

I just wrote a simple little residual block that used regular array slicing to grab the center crop before merging. No need to have a special Cropping2D layer, although if you do it that way it might be a good exercise and also a handy thing to have around for later…
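
For anyone who wants to see the slicing idea in isolation, here is a rough sketch using NumPy arrays rather than Keras tensors. It assumes channels-last data of shape (batch, height, width, channels) and a crop of 2 pixels per side; `center_crop`, `ip` and `branch` are illustrative names, not code from the course notebook.

```python
import numpy as np


def center_crop(x, crop=2):
    """Drop `crop` pixels from every spatial border of a (batch, h, w, channels) array.

    `crop` must be positive, since a slice like 0:-0 would be empty.
    """
    return x[:, crop:-crop, crop:-crop, :]


# Hypothetical block input: 8x8 feature map with 3 channels.
ip = np.arange(1 * 8 * 8 * 3, dtype=np.float32).reshape(1, 8, 8, 3)

# Stand-in for the output of two unpadded 3x3 convs, which would be 4x4.
branch = np.ones((1, 4, 4, 3), dtype=np.float32)

# The shapes only match once the input has been center-cropped.
out = center_crop(ip) + branch
print(out.shape)
```

Inside a Keras model the same slice would most likely need to be wrapped in a Lambda layer so that it becomes part of the graph, but treat that as a suggestion rather than the one right way to do it.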

Does anyone know what the purpose of these lines is?
vgg_l = Lambda(preproc)
outp_l = vgg_l(outp)

outp_l doesn’t seem to be used anywhere else. It looks like it preprocesses the image for the VGG network, but the result isn’t used again, and the same preprocessing isn’t applied to the other image.

@bckenstler Jeremy talks about it in the second video, around minute 23. It is used 2 lines later, in
vgg2 = vgg_content(outp_l)

Oh good point - I didn’t use the Lambda correctly in my previous version, although as @shgidi mentioned I fixed it in class. I just updated the version on platform.ai with that correction.

I’m trying to convert the super-res code to fast neural style and I’m struggling a bit…
Here is what I should do, if I understand correctly:

  1. The up-sampling network architecture should be changed as described in the supplement to the article.
  2. In the loss function, the high-resolution pass in vgg_1 should be replaced with some kind of style function. However, I do not know how to take the style_targs function and put it in the correct format… I’d be glad for some tips :)

You’ll also need to add a downsampling section to the start of the network.

The loss function should be the same as used in the original artistic style approach. So look at how we went from content loss in the original approach to style+content loss, and make the same changes to the fast style approach.
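
As a reminder of what that change looks like, here is a rough NumPy sketch of going from a content-only loss to style+content. It is a simplification under stated assumptions: a single feature map stands in for the VGG activations (the real approach uses several layers), the Gram matrix is normalised by h*w*c, and all function names and weights are hypothetical.

```python
import numpy as np


def gram_matrix(feats):
    """Channel-by-channel correlations of a (h, w, channels) feature map."""
    h, w, c = feats.shape
    flat = feats.reshape(h * w, c)
    return flat.T @ flat / (h * w * c)


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def style_content_loss(gen, content, style, content_weight=1.0, style_weight=1.0):
    """Content term compares activations; style term compares Gram matrices."""
    content_term = mse(gen, content)
    style_term = mse(gram_matrix(gen), gram_matrix(style))
    return content_weight * content_term + style_weight * style_term


# Random arrays standing in for VGG activations (illustration only).
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 4, 3))
print(style_content_loss(feats, feats + 1.0, feats))
```

The only new ingredient compared with the content loss is the Gram-matrix term; in the fast style setup the style Gram matrices can be computed once from the style image and reused as fixed targets.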

One question about preproc: in lessons 8 and 9 we use the preproc function to convert images before they are fed into the VGG16 model. However, I do not recall doing that in Part 1. Why do we need preproc now but not earlier? Thanks!

We defined our own version of vgg in part 1, which included that Lambda layer (see vgg16.py in part 1’s repo).
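
For reference, the preprocessing in question usually looks something like the sketch below. This assumes the common VGG convention (subtract per-channel ImageNet means, then reverse RGB to BGR) on channels-last image batches; the mean values and the `deproc` helper are my assumptions, so check them against vgg16.py before relying on them.

```python
import numpy as np

# Per-channel ImageNet means in RGB order (assumed values, commonly used with VGG16).
rn_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)


def preproc(x):
    """Subtract the channel means, then flip RGB -> BGR on the last axis."""
    return (x - rn_mean)[:, :, :, ::-1]


def deproc(x):
    """Undo preproc: flip BGR -> RGB, add the means back, clip to the pixel range."""
    return np.clip(x[:, :, :, ::-1] + rn_mean, 0, 255)


# Hypothetical batch holding one 2x2 mid-grey RGB image.
img = np.full((1, 2, 2, 3), 128, dtype=np.float32)
print(preproc(img)[0, 0, 0])
```

Wrapping `preproc` in a Lambda layer is what makes it part of the model, so the same step no longer has to be applied by hand before calling it.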

Oh I see. Thanks!

Thanks Jeremy. By down-sampling, do you mean the decrease in res-block sizes in the CNN? How is it done?

From the supplement I see that they discuss removing padding from the res blocks. How do I implement that in Keras?

I’ve changed the last line of the res_block into:
return merge([x, ip], mode='sum')[:,2:-2,2:-2,:]
Hope that will do the trick…

No, I mean add a few stride 2 conv layers. See the paper’s supplemental notes for details.
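
To see how a couple of stride 2 convs interact with the unpadded res blocks, here is a small size trace in plain Python. The specific numbers are assumptions for illustration only (two stride 2 'same'-padded convs, five residual blocks that each lose 4 pixels, two 2x up-sampling steps, and a 336-pixel padded input), so check the supplement for the actual layer table.

```python
import math


def same_conv_size(size, stride):
    """Output length along one axis of a 'same'-padded convolution."""
    return math.ceil(size / stride)


def trace_sizes(size, down_strides=(2, 2), n_res_blocks=5,
                shrink_per_block=4, up_factors=(2, 2)):
    """List the spatial size after each down-sampling, residual and up-sampling stage."""
    sizes = [size]
    for stride in down_strides:      # stride 2 convs halve the size
        size = same_conv_size(size, stride)
        sizes.append(size)
    for _ in range(n_res_blocks):    # unpadded res blocks trim the borders
        size -= shrink_per_block
        sizes.append(size)
    for factor in up_factors:        # up-sampling doubles the size again
        size *= factor
        sizes.append(size)
    return sizes


print(trace_sizes(336))
```

Under these assumptions the output ends up smaller than the input by exactly what the res blocks trimmed (scaled back up), which is why the input has to be padded beforehand if the output should match the original image size.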

In the supplementary materials, the authors write the following:

For style transfer, we found that standard zero-padded convolutions resulted in severe artifacts around the borders of the generated image. We therefore remove padding from the convolutions in residual blocks.

I’m interpreting this as modifying the res_block code Jeremy provided in the super-resolution demo to use border_mode = ‘valid’ for each of the 2 Conv2D’s. Is this a good interpretation or am I missing something?

Update - Confirmed this is true from the lecture video.

As a general handy tip for people who may not have been around for session 1, I’m finding the graph visualization of models to be a really effective way of feeling like I understand what Keras is doing under the hood; Jeremy explains it in this post from last November, so I won’t re-invent the wheel there, except to ‘bump’ it here:


Weird problem: is anyone else having issues with Python 3’s f-strings not being able to close properly? In my Python 3 notebook, whenever I switch from a string to an f-string (i.e. when I put an ‘f’ in front), my notebook then treats the remainder of the cell as a single string. It seems to be able to evaluate it properly (insofar as it doesn’t throw a syntax error when I execute the cell), but it’s bothering me, and I wonder if anyone else has run into a similar problem.