Lesson 9 wiki

This post is editable by all of you! Please edit it to add any useful information for this week's class, including links brought up during class, other helpful readings, useful code/shell snippets, etc. Also, please help organize this wiki post by putting things in sections, adding/editing prose, and so on.

Class links

Papers

Links

A note on real-time style transfer

Sometimes we calculate the error of a network not by comparing its output to labels immediately, but by first putting its output through a function, and comparing that new output to something we consider to be ideal. That function could be another neural network. For example, in real-time style transfer (Johnson et al.), the network we train takes an image and transforms it into another image; we then take that generated image and analyze it with another neural network, comparing the new output with something we consider to be ideal. The point of the second neural network is to assess the error in the generated image in a deeper way than just calculating errors pixel by pixel with respect to an image we consider to be ideal. The authors of the real-time style transfer paper call this higher-level error “perceptual loss”, as opposed to “per-pixel loss”.
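For concreteness, here is a minimal Keras sketch of this idea (my addition, not the notebook's code). It assumes `ideal` and `generated` are Keras image tensors, and `block2_conv2` is an arbitrary layer choice:

import keras.backend as K
from keras.applications.vgg16 import VGG16
from keras.models import Model

# a fixed, pretrained network used only to measure error; it is never trained here
vgg = VGG16(include_top=False, weights='imagenet')
for layer in vgg.layers: layer.trainable = False

# read activations from one intermediate layer (arbitrary choice for illustration)
feat = Model(vgg.input, vgg.get_layer('block2_conv2').output)

def per_pixel_loss(ideal, generated):
    return K.mean(K.square(ideal - generated))

def perceptual_loss(ideal, generated):
    # compare what the pretrained network "sees" in each image,
    # rather than comparing raw pixel values
    return K.mean(K.square(feat(ideal) - feat(generated)))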


I just added the videos to the wiki post FYI.

Earlier this week, I listened to and transcribed the lectures. They (like all the other lecture transcripts) are accessible at: https://drive.google.com/open?id=0BxXRvbqKucuNcXdyZ0pYSW5UTzA

The transcripts for this week's videos are at:
https://drive.google.com/open?id=0BxXRvbqKucuNYjRpM3A0Y2pMTHc
https://drive.google.com/open?id=0BxXRvbqKucuNaEN1V1lPNUw4ZU0

If you run into an error in them, please let me know.


@jeremy When you say ‘ImageNet’ in class, do you generally mean the ILSVRC 2012 training set (as linked to above)?

I ask, since I'm also interested in playing with the full ImageNet set (or specific subsets of it). There are categories (like fungus) in the full set that I wanted to play around with.

Yup! Either from the official site, or from academictorrents.

Here is my notebook for real-time style transfer, which I rewrote myself, though it looks pretty similar to the one we saw in class:

  1. It has a bunch of notes from the paper covering what I felt were the key takeaways.
  2. It has comments for most non-trivial tasks, so if you ever wonder "what/why are we doing in this line of code", it might help answer that.

https://github.com/sravya8/DL/blob/master/styleTransfer/RealTimeStyleTransfer.ipynb

Similarly for Super Resolution using perceptual loss:

I considered moving all the insights and comments onto the wiki here, but it is too much work, and I'm not sure it would be better than a "code first" approach :slight_smile:


Thanks so much! Do you mind if we borrow from these to improve the notebooks we provided?

Absolutely, please go ahead! Would be my pleasure.

@Jeremy, @Rachel - the link to the PPT is broken. I couldn’t find it on files.fast.ai either.


I tried to edit the wiki to add a link about checkerboard patterns, but I am getting an error saying the title should be more than 15 characters.

Here is the link: http://distill.pub/2016/deconv-checkerboard/


Hi sravya,
I have some questions about your code; correct me if I'm wrong.
1. Your "vgg_content" computes the outputs of 4 layers and then feeds them into the tot_loss computation, and I noticed that you compute the content loss for all 4 of those layers. Why did you do that? Shouldn't we use only 1 layer to compute the content loss?

def get_outp(m, ln): return m.get_layer(f'block{ln}_conv2').output
# activations from four VGG layers, used for both style and content losses
vgg_content = Model(vgg_inp, [get_outp(vgg, o) for o in [2,3,4,5]])

vgg1 = vgg_content(vgg_inp)  # activations of the input (content) image
vgg2 = vgg_content(outp)     # activations of the generated image
def tot_loss(x):
    loss = 0; n = len(style_targs)
    for i in range(n):
        # style loss: distance between Gram matrices at layer i
        loss += mean_sqr_b(gram_matrix_b(x[i+n]) - gram_matrix_b(style_targs[i])) / 2.
        # content loss, computed at every one of the four layers
        loss += mean_sqr_b(x[i]-x[i+n]) * w[i]
    return loss

2.Are you didn’t follow the paper’s suggested style layers choice? Since the paper is recommanding [‘block1_conv2’, ‘block2_conv2’, ‘block3_conv3’, ‘block4_conv3’], you chose [‘block2_conv2’, ‘block3_conv2’, ‘block4_conv3’, ‘block5_conv2’], are you choose these layer for reasons? Did these layers perform better?

I’m unable to edit the original post for some reason, but wanted to add a link to the DeViSE paper:

DeViSE: A Deep Visual-Semantic Embedding Model

What happens when you try to edit the original post?

I kept getting a weird Discourse error complaining about “Resource not found”. I tried logging in/out. It might be a China-firewall thing (I’m accessing the forums via a proxy).

Sorry to be a nuisance! Just wanted to contribute.

@jeremy Thanks for the wonderful lecture :slight_smile: I wonder if the ppt file is not allowed to be downloaded :slight_smile: The link does not work; the error message says “You don’t have access”.

Thanks

Thanks for the wonderful lecture.

I have some questions about the super-resolution part of the paper “Perceptual Losses for Real-Time Style Transfer and Super-Resolution”.

1 : Where can I find “trn_resized_72_r.bc” and “trn_resized_288_r.bc”? Are they part of the ImageNet data set provided in the torrent link of lesson 9?

2 : I find the axis argument of K.mean difficult to understand

def mean_sqr_b(diff): 
    dims = list(range(1,K.ndim(diff)))
    return K.expand_dims(K.sqrt(K.mean(diff**2, dims)), 0)

# the value of the line below is the same as K.mean(diff), but K.mean(diff, dims) returns an array
K.mean(diff, dims)

I did some experiments in numpy, and the results confuse me (I think K.mean is similar to numpy.mean):

import numpy as np

shape = (1,2,2,2)
cimg = np.arange(8)
cimg = cimg.reshape(shape)

print(cimg)

contents of cimg:

[[[[0 1]
   [2 3]]

  [[4 5]
   [6 7]]]]

print(np.mean(cimg))
print(np.mean(cimg, (1,2)))
print(np.mean(cimg, (1,2,3)))

results are

3.5
[[ 3.  4.]]
[ 3.5]

What is going on?

3 : In the lesson 9 video, why is the target

targ = np.zeros((arr_hr.shape[0], 128))

but not

targ = np.zeros((arr_hr.shape[0], 144, 144, 128))

Answer : I found out why: the loss function outputs 128 loss values. But why 128 and not 1? Maybe 128 is more discriminative. When using 3 output layers, we make the output a single value, because that is easier to calculate.
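To make that pattern concrete, here is a minimal sketch of the zero-target trick (my own summary, with `inp` and `loss_tensor` standing in for the notebook's actual tensors, and `arr_lr`/`arr_hr` for the low/high-res arrays):

import numpy as np
from keras.models import Model

# build a model whose output *is* the loss, then fit it against a dummy
# target of zeros, so minimising 'mse' against 0 minimises the loss itself
m = Model(inp, loss_tensor)
m.compile(optimizer='adam', loss='mse')

# the dummy target must match the loss output's shape: here one row of
# 128 zeros per example; a scalar loss would use np.zeros((n, 1)) instead
targ = np.zeros((arr_hr.shape[0], 128))
m.fit(arr_lr, targ, 16, 2)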

Edit : By the way, I saw some interesting videos on YouTube:

Enhance! Super Resolution From Google | Two Minute Papers #124
What is Google RAISR? Google RAISR Software | Smart Upsampling of Photos

I do not know how good they are compared with super resolution using perceptual loss. I am still trying to understand lesson 9, so if anyone knows how good/bad they are, please share some comments. Thanks!

You can find the necessary files for the lesson at http://files.fast.ai/data/

For your 2nd question, on how np.mean or K.mean works:

Both take the dimensions over which the mean should be calculated. In the lesson we do not include dimension 0, as it contains the batch information.

An explanation for your example:

np.mean(cimg) --> calculates the mean of all the values in the tensor.
np.mean(cimg, (1,2)) --> calculates the mean over axes 1 and 2 only, i.e. (0+4+2+6)/4 = 3 and (1+5+3+7)/4 = 4.
np.mean(cimg, (1,2,3)) --> this is closest to what we use in class, except that here the batch has only one example. To understand clearly what is happening, I would recommend you try the below.

shape = (2,2,2,2)
cimg = np.arange(16)
cimg = cimg.reshape(shape)

np.mean(cimg, (1,2,3)) --> this should give you an output of [ 3.5 11.5], i.e. the mean of each of the two examples in the batch.
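Tying this back to the lesson's mean_sqr_b, a rough numpy analogue would be (my sketch, just to show the shape it produces):

import numpy as np

# average over every axis except axis 0 (the batch), giving one value per
# example, then add back a leading axis like K.expand_dims(..., 0) does
def mean_sqr_b_np(diff):
    dims = tuple(range(1, diff.ndim))
    return np.expand_dims(np.sqrt(np.mean(diff**2, axis=dims)), 0)

cimg = np.arange(16).reshape(2, 2, 2, 2)
print(mean_sqr_b_np(cimg).shape)  # (1, 2): one value per batch example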

If it's not clear, try writing it out on paper or in Excel, which generally helps me understand better.

Thanks,
Vishnu


Hi, I have a question about the architecture used in the Neural Style notebook: https://github.com/fastai/courses/blob/master/deeplearning2/neural-style.ipynb

Specifically, in cell In [40], the first convolutional block uses a filter size of 9x9. This is larger than the filter sizes I’ve come across in other CNNs; is there a particular reason or benefit to this?

Hi Jeremy,
I was trying to run the super-resolution code step by step, but I encountered an error: “cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:88”.
I am running this in Google Colab.
Please help!