Lesson 9 wiki

This post is editable by all of you! Please edit it to add any useful information for this week's class, including links brought up during class, other helpful readings, useful code/shell snippets, etc. Also, please help organize this wiki post by putting things in sections, adding/editing prose, and so on.

Class links

Papers

Links

A note on real-time style transfer

Sometimes we calculate the error of a network not by comparing its output to labels immediately, but by first putting its output through a function, and comparing that new output to something we consider to be ideal. That function could be another neural network. For example, in real-time style transfer (Johnson et al.), the network we train takes an image and transforms it into another image; we then take that generated image and analyze it with another neural network, comparing the new output with something we consider to be ideal. The point of the second neural network is to assess the error in the generated image in a deeper way than just calculating errors pixel by pixel with respect to an image we consider to be ideal. The authors of the real-time style transfer paper call this higher-level error “perceptual loss”, as opposed to “per-pixel loss”.
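For concreteness, here is a minimal Keras sketch of this idea (my addition, not the notebook's code). It assumes `ideal` and `generated` are Keras image tensors, and `block2_conv2` is an arbitrary layer choice:

import keras.backend as K
from keras.applications.vgg16 import VGG16
from keras.models import Model

# a fixed, pretrained network used only to measure error; it is never trained here
vgg = VGG16(include_top=False, weights='imagenet')
for layer in vgg.layers: layer.trainable = False

# read activations from one intermediate layer (arbitrary choice for illustration)
feat = Model(vgg.input, vgg.get_layer('block2_conv2').output)

def per_pixel_loss(ideal, generated):
    return K.mean(K.square(ideal - generated))

def perceptual_loss(ideal, generated):
    # compare what the pretrained network "sees" in each image,
    # rather than comparing raw pixel values
    return K.mean(K.square(feat(ideal) - feat(generated)))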


I just added the videos to the wiki post FYI.

Earlier this week, I listened to and transcribed the lectures. They (like all the other lecture transcripts) are accessible at: https://drive.google.com/open?id=0BxXRvbqKucuNcXdyZ0pYSW5UTzA

The transcripts for this week's videos are at:
https://drive.google.com/open?id=0BxXRvbqKucuNYjRpM3A0Y2pMTHc
https://drive.google.com/open?id=0BxXRvbqKucuNaEN1V1lPNUw4ZU0

If you run into an error in them, please let me know.


@jeremy When you say ‘ImageNet’ in class, do you generally mean the ILSVRC 2012 training set (as linked to above)?

I ask, since I'm also interested in playing with the full ImageNet set (or specific subsets of it). There are categories (like fungus) in the full set that I wanted to play around with.

Yup! Either from the official site, or from academictorrents.

Here is my notebook for real-time style transfer, which I rewrote myself, though it looks pretty similar to the one we saw in class:

  1. It has a bunch of notes from the paper covering what I felt were the key takeaways.
  2. It has comments for most non-trivial tasks, so if you ever wonder "what/why are we doing in this line of code", it might help answer that.

https://github.com/sravya8/DL/blob/master/styleTransfer/RealTimeStyleTransfer.ipynb

Similarly for Super Resolution using perceptual loss:

I considered moving all the insights and comments onto the wiki here, but it is too much work, and I'm not sure it would be better than a "code first" approach :slight_smile:


Thanks so much! Do you mind if we borrow from these to improve the notebooks we provided?

Absolutely, please go ahead! Would be my pleasure.

@Jeremy, @Rachel - the link to the PPT is broken. I couldn’t find it on files.fast.ai either.


I tried to edit the wiki to add a link about checkerboard patterns, but I am getting an error saying the title should be more than 15 characters.

Here is the link: http://distill.pub/2016/deconv-checkerboard/


Hi sravya,
I have some questions about your code; correct me if I'm wrong.
1. Your "vgg_content" computes the outputs of 4 layers and then feeds them into the tot_loss computation, and I noticed that you compute the content loss for all 4 of those layers. Why did you do that? Shouldn't we use only 1 layer to compute the content loss?

def get_outp(m, ln): return m.get_layer(f'block{ln}_conv2').output
# activations from four VGG layers, used for both style and content losses
vgg_content = Model(vgg_inp, [get_outp(vgg, o) for o in [2,3,4,5]])

vgg1 = vgg_content(vgg_inp)  # activations of the input (content) image
vgg2 = vgg_content(outp)     # activations of the generated image
def tot_loss(x):
    loss = 0; n = len(style_targs)
    for i in range(n):
        # style loss: distance between Gram matrices at layer i
        loss += mean_sqr_b(gram_matrix_b(x[i+n]) - gram_matrix_b(style_targs[i])) / 2.
        # content loss, computed at every one of the four layers
        loss += mean_sqr_b(x[i]-x[i+n]) * w[i]
    return loss

2.Are you didn’t follow the paper’s suggested style layers choice? Since the paper is recommanding [‘block1_conv2’, ‘block2_conv2’, ‘block3_conv3’, ‘block4_conv3’], you chose [‘block2_conv2’, ‘block3_conv2’, ‘block4_conv3’, ‘block5_conv2’], are you choose these layer for reasons? Did these layers perform better?

I’m unable to edit the original post for some reason, but wanted to add a link to the DeViSE paper:

DeViSE: A Deep Visual-Semantic Embedding Model

What happens when you try to edit the original post?

I kept getting a weird Discourse error complaining about “Resource not found”. I tried logging in/out. It might be a China-firewall thing (I’m accessing the forums via a proxy).

Sorry to be a nuisance! Just wanted to contribute.

@jeremy Thanks for the wonderful lecture :slight_smile: I wonder if the ppt file is not allowed to be downloaded :slight_smile: The link does not work; the error message says “You don’t have access”.

Thanks

Thanks for the wonderful lecture.

I have some questions about the super-resolution part of the paper “Perceptual Losses for Real-Time Style Transfer and Super-Resolution”.

1 : Where can I find “trn_resized_72_r.bc” and “trn_resized_288_r.bc”? Are they part of the ImageNet data set provided in the torrent link of lesson 9?

2 : I find the axis argument of K.mean difficult to understand

def mean_sqr_b(diff): 
    dims = list(range(1,K.ndim(diff)))
    return K.expand_dims(K.sqrt(K.mean(diff**2, dims)), 0)

# the value of the line below is the same as K.mean(diff), but K.mean(diff, dims) returns an array
K.mean(diff, dims)

I did some experiments in numpy, and the results confuse me (I think K.mean is similar to numpy.mean):

import numpy as np

shape = (1,2,2,2)
cimg = np.arange(8)
cimg = cimg.reshape(shape)

print(cimg)

contents of cimg:

[[[[0 1]
   [2 3]]

  [[4 5]
   [6 7]]]]

print(np.mean(cimg))
print(np.mean(cimg, (1,2)))
print(np.mean(cimg, (1,2,3)))

results are

3.5
[[ 3.  4.]]
[ 3.5]

What is going on?

3 : In the lesson 9 video, why is the target

targ = np.zeros((arr_hr.shape[0], 128))

but not

targ = np.zeros((arr_hr.shape[0], 144, 144, 128))

Answer : I found out why: the loss function outputs 128 loss values. But why 128 and not 1? Maybe 128 is more discriminative. When using 3 output layers, we make the output a single value, because that is easier to calculate.
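To make that pattern concrete, here is a minimal sketch of the zero-target trick (my own summary, with `inp` and `loss_tensor` standing in for the notebook's actual tensors, and `arr_lr`/`arr_hr` for the low/high-res arrays):

import numpy as np
from keras.models import Model

# build a model whose output *is* the loss, then fit it against a dummy
# target of zeros, so minimising 'mse' against 0 minimises the loss itself
m = Model(inp, loss_tensor)
m.compile(optimizer='adam', loss='mse')

# the dummy target must match the loss output's shape: here one row of
# 128 zeros per example; a scalar loss would use np.zeros((n, 1)) instead
targ = np.zeros((arr_hr.shape[0], 128))
m.fit(arr_lr, targ, 16, 2)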

Edit : By the way, I saw some interesting videos on YouTube:

Enhance! Super Resolution From Google | Two Minute Papers #124
What is Google RAISR? Google RAISR Software | Smart Upsampling of Photos

I do not know how good they are compared with super resolution using perceptual loss. I am still trying to understand lesson 9, so if anyone knows how good/bad they are, please share some comments. Thanks!

You can find the necessary files for the lesson at http://files.fast.ai/data/

For your 2nd question, on how np.mean or K.mean works:

Both take the dimensions over which the mean should be calculated. In the lesson we do not include dimension 0, as it contains the batch information.

An explanation for your example:

np.mean(cimg) --> calculates the mean of all the values in the tensor.
np.mean(cimg, (1,2)) --> calculates the mean over axes 1 and 2 only, i.e. (0+4+2+6)/4 = 3 and (1+5+3+7)/4 = 4.
np.mean(cimg, (1,2,3)) --> this is closest to what we use in class, except that here the batch has only one example. To understand clearly what is happening, I would recommend you try the below.

shape = (2,2,2,2)
cimg = np.arange(16)
cimg = cimg.reshape(shape)

np.mean(cimg, (1,2,3)) --> this should give you an output of [ 3.5 11.5], i.e. the mean of each of the two examples in the batch.
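Tying this back to the lesson's mean_sqr_b, a rough numpy analogue would be (my sketch, just to show the shape it produces):

import numpy as np

# average over every axis except axis 0 (the batch), giving one value per
# example, then add back a leading axis like K.expand_dims(..., 0) does
def mean_sqr_b_np(diff):
    dims = tuple(range(1, diff.ndim))
    return np.expand_dims(np.sqrt(np.mean(diff**2, axis=dims)), 0)

cimg = np.arange(16).reshape(2, 2, 2, 2)
print(mean_sqr_b_np(cimg).shape)  # (1, 2): one value per batch example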

If it's not clear, try writing it out on paper or in Excel, which generally helps me understand better.

Thanks,
Vishnu


Hi, I have a question about the architecture used in the Neural Style notebook: https://github.com/fastai/courses/blob/master/deeplearning2/neural-style.ipynb

Specifically, in cell In [40], the first convolutional block uses a filter size of 9x9. This is larger than the filter sizes I’ve come across in other CNNs; is there a particular reason or benefit to this?

Hi Jeremy,
I was trying to run the super-resolution code step by step, but I encountered an error: “cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:88”.
I am running this in Google Colab.
Please help!