Part 2 Lesson 13 Wiki


@jeremy Thank you for the pointer to the NIN paper.

As a quick summary: the paper proposes a new structure called MLPConvBlock to replace the vanilla conv block. An MLPConvBlock can be implemented efficiently with 1x1 convolutions, along these lines:

import torch.nn as nn

class MlpConvBlock(nn.Module):

    def __init__(self, in_chs, out_chs, **kw):
        super().__init__()  # required before registering submodules
        h2_chs = (in_chs + out_chs) // 3
        h1_chs = 2 * h2_chs
        # one ordinary 3x3 conv, then two 1x1 convs acting as a per-pixel MLP
        self.conv1 = nn.Conv2d(in_chs, h1_chs, kernel_size=3, stride=1, padding=1, **kw)
        self.conv2 = nn.Conv2d(h1_chs, h2_chs, kernel_size=1, stride=1, padding=0)
        self.conv3 = nn.Conv2d(h2_chs, out_chs, kernel_size=1, stride=1, padding=0)
        self.bn1 = nn.BatchNorm2d(h1_chs)
        self.bn2 = nn.BatchNorm2d(h2_chs)
        self.bn3 = nn.BatchNorm2d(out_chs)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.relu(self.bn1(self.conv1(x)))
        y = self.relu(self.bn2(self.conv2(y)))
        y = self.relu(self.bn3(self.conv3(y)))
        return y

Since an MLPConvBlock itself looks like a mini-network, a network composed of many MLPConvBlocks can be seen as a “network in network”.
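To see how the blocks compose into a full “network in network”, here is a rough, self-contained sketch. The channel sizes, class count, and input shape are all made up for illustration, and the class is a condensed restatement of the block above:

```python
import torch
import torch.nn as nn

class MlpConvBlock(nn.Module):
    """Condensed MLPConv block: one 3x3 conv followed by two 1x1 convs
    (the 'micro MLP'), each with BatchNorm + ReLU."""
    def __init__(self, in_chs, out_chs):
        super().__init__()
        h2 = (in_chs + out_chs) // 3
        h1 = 2 * h2
        self.net = nn.Sequential(
            nn.Conv2d(in_chs, h1, 3, padding=1), nn.BatchNorm2d(h1), nn.ReLU(inplace=True),
            nn.Conv2d(h1, h2, 1), nn.BatchNorm2d(h2), nn.ReLU(inplace=True),
            nn.Conv2d(h2, out_chs, 1), nn.BatchNorm2d(out_chs), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

# NIN-style classifier: stacked MLPConv blocks with pooling between them,
# finishing with global average pooling instead of fully connected layers.
nin = nn.Sequential(
    MlpConvBlock(3, 96),   nn.MaxPool2d(2),
    MlpConvBlock(96, 192), nn.MaxPool2d(2),
    MlpConvBlock(192, 10),            # 10 output maps = 10 classes
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

x = torch.randn(2, 3, 32, 32)         # e.g. a CIFAR-10 sized batch
print(nin(x).shape)                   # torch.Size([2, 10])
```

The global-average-pooling ending is the other idea from the NIN paper: each of the final feature maps is averaged into one number per class, so no fully connected classifier head is needed.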

(Jeremy Howard) #151

I’ve added the lesson video to the top post. It’s encoding now and will be live in about 15 mins.

(Chris Palmer) #152

Hi Jeremy

I see this was posted 3 hours ago with an estimate of 15 minutes for video availability - but I have just tried it and there is no video available…

(Sarada Lee) #153

I managed to implement a GAN and CycleGAN in Excel. It is a good way to learn the maths behind them, with visualisation. :blush:

In the GAN, the L2 loss (bottom right corner) works better than the L1 loss (top right corner).

In CycleGAN, the largest single value in the “Deconv filter” (in blue) could go beyond -53 million, even though the values of the “Conv filter” ranged from -1 to 1.

(Ganesh Krishnan) #154

This is awesome! I’m curious how you implemented it, though. For the GAN, how did you do the min-max optimization? In particular, I’m having a hard time wrapping my head around how you implemented the discriminator. Did you just choose an arbitrary convolutional filter for it?

(Ibrahim El-Fayoumi) #155

I cannot see the video. I was away this week and am trying to view Lesson 13, but the video is not available.
Would you please help?

(Ken) #156

Hi @Elfayoumi, you can view the unedited version at this link:

@jeremy, it looks like the edited video posted in the wiki section doesn’t play back:

(Sarada Lee) #157

Set up a deconvolutional filter as usual. Then use Solver to minimize the sum of the loss function (L1 or L2) by “changing variable cells” (i.e. the cells holding the deconvolutional filter).
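For anyone wanting to reproduce the min-max step outside Excel: Solver’s “minimize by changing cells” corresponds to alternating gradient steps on the discriminator and the generator. A minimal PyTorch sketch, where all the shapes and the synthetic “real” data are invented for illustration:

```python
import torch
import torch.nn as nn

# Tiny GAN: G maps 4-D noise to 2-D "samples", D scores samples as real/fake.
G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, 2) + 3.0           # stand-in "real" data

for step in range(100):
    # --- D step: push D(real) toward 1 and D(fake) toward 0 ---
    fake = G(torch.randn(32, 4)).detach()  # freeze G during the D update
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- G step: update G so that D scores its samples as real ---
    fake = G(torch.randn(32, 4))
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The `.detach()` in the D step is the key trick: it plays the role of “only changing the discriminator cells” in one Solver run and “only changing the generator cells” in the other.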

(Jeremy Howard) #158

Apologies - my computer went to sleep just before the upload completed! Fixed now; the full video is in the top post.

(Igor Kasianenko) #159

I have a question about the loss function. Jeremy said that the numbers were too small and it didn’t learn. But when I see block 84 in the Jupyter notebook, with *1e6 at the end, I recall school lessons about precision, where the teacher told us to use big numbers at the beginning of an equation. Does that rule apply here? Should it work with better precision if we rewrite it as 1e6*, x.t())/input.numel() ?

(Jeremy Howard) #160

That may be better, although in this case the problem didn’t occur until later in the optimization, when it calculated the gradient and step size, so it doesn’t really matter.


Has anyone run into this issue when trying to rerun the notebook shown in class? I am stuck at the last part of the style transfer, when trying to run the iterations on the comb_loss function. Thanks in advance!

(Alena Harley) #162

Remove the line:

for sf in sfs: sf.close()

before the style transfer section.
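For context on why this helps: the sfs objects in the notebook are forward-hook feature savers, and once close() has been called they stop capturing activations, so the style/content losses see stale or missing features. Roughly, a SaveFeatures-style hook looks like this (a sketch, not the notebook’s exact code):

```python
import torch
import torch.nn as nn

class SaveFeatures:
    """Stores a module's output on every forward pass via a hook.
    After close() the hook is removed and nothing is captured any more,
    which is why closing the sfs objects too early breaks the loss."""
    def __init__(self, module):
        self.hook = module.register_forward_hook(self._store)
    def _store(self, module, inp, outp):
        self.features = outp
    def close(self):
        self.hook.remove()

net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
sf = SaveFeatures(net[0])
net(torch.randn(1, 3, 16, 16))
print(sf.features.shape)   # torch.Size([1, 8, 16, 16])
sf.close()                 # hook removed; later forwards won't update sf.features
```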


I blended my cat + flowers - pretty fun course!

(Alena Harley) #165

Experimenting with portraits:

(Shivam Goel) #166

Hi everyone,

I have a question regarding the architecture that you guys are using to generate these new pictures. Did anyone try using ResNet, as Jeremy suggested in class?

I am struggling to get any good output there. I have yet to try any other.

Meanwhile I read this:

Style transfer typically requires neural networks with a well-developed hierarchy of features for calculating the loss. Thus for this purpose, the good old vgg-16 and vgg-19 architectures work very well. But inception architectures (unlike resnet architectures) also have the same property.

and I guess it makes sense. Since ResNet uses skip connections to combine different layers, it does not maintain an interpretable coarse-to-fine feature hierarchy from the last layer back to the early layers, as VGGNet does. Each layer of ResNet contains both semantic abstraction and spatial details from earlier layers.

Let me know if anyone tries any other network.

(Bart Fish) #167

Haven’t run into that issue, but the “standard” solution is to do a

git pull

(Jeremy Howard) #168

It’s a really interesting issue. I’m not sure I agree with this post’s reasoning, however. Because resnet has occasional downsampling layers which don’t have skip connections, it does still have a coarse-to-fine feature hierarchy.

I think the issue may be due to the fully connected layers.

(Vibhutha Kumarage) #169

I created a small blog post about neural style transfer using what I learned from Lesson 13. Please check it out and give me some feedback to improve the content.
Thanks…!! :smiley:

(rkj) #170

Congratulations on the paper published :slight_smile: