Part 2 Lesson 12 wiki

(Even Oldridge) #213

I’m hoping there’ll be a chance to cover this on Monday since we didn’t get to it last week. This is one of the key missing pieces for me in terms of scaling training.

(Jeremy Howard) #214

Yes we will. Sorry about the delay!

(Even Oldridge) #215

No worries! I’m always amazed by how much we cover, and that there’s the chance to ask questions and interact along the way. I’m also very grateful for all of the help you’ve provided. I’m sad the course is ending, although I’ve got a page worth of possible projects that should keep me busy until next year.

(Jason McGhee) #216

Did you make any headway? I’d be very interested in an update :slight_smile:

(Kaitlin Duck Sherwood) #217

There was a paper at ICLR which talked about GAN convergence, and said that if you normalize the weights by dividing by the spectral norm, it converged a lot better.

(Chris Palmer) #218

I’ve tried this with 0.4 - installed using conda install, and it still fails. Jeremy must have been compiling from source when he was experimenting with 0.4, so is it possible that the in-place worked for him because of that - that the code was specifically compiled for his GPU?

(Ken) #219

Interesting. I still haven’t gotten to updating to 0.4, so thanks for the heads up.

(Pranjal Yadav) #220

I’ll definitely share the update as soon as I’m done. I’m currently stretching a lot at work and not finding much time. Sorry for the delay.

In the meantime I can share a new thing I came across

It can help us learn and create at the same time. You can play around and share something new you come up with.

(Jason McGhee) #221

Yeah… I’ve seen that.

It looks really nice, visually.

Unfortunately it’s only for images and raw vector data, so you’d need python to do preprocessing on data to begin with and I’m not convinced losing pytorch, in addition to not being able to use your own hardware would be worth it putting everything else aside.

Looks great for beginners wanting to do deep learning for images though!

(Michael Moret) #222

Yep, which also means we have to be careful to denormalize our generated data accordingly.
I also tried to find the reason why using tanh instead of sigmoid, but no success yet (the DCGAN paper you refer to just mention that bounded activations are better, but it’s also the case if we use the sigmoid which is bounded between 0 and 1).
The only reason I can think of is that it works better experimentally on some dataset, maybe because the activations are close to zero, which means that the sigmoid will get saturated, which won’t be the case for tanh as it is define between -1 and 1.

(Sebastien Derhy) #223

Is there any particular reason why dropout is not used in the Darknet notebook? The training loss is significantly lower than the validation loss at the end, so I would expect dropout to help…

More generally, are there cases where dropout shouldn’t be used? I thought that if we take a network without dropout, make each layer x% wider, then add 1/(1+x%) dropout probability, we should get better results. Is this true?



My intuition would be, because batchnorm is used, and it requires some additional work to make it work together.
But people had some success using dropout with batchnorm on Cifar10

This is a paper discusing it:
Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift

As @jeremy said, there is a room for experimentation in deep learning :wink:

(tester) #225

I am trying to understand cycle gan, and reading the source code. I dont understand why their generator model has so many of these layers (see bottom) compare to the one we got in wgan. I also couldn’t find things about this on their paper. Does anyone know??

Input size is one factor from the loop, but the layers are more than I would expect.

  (conv_block): Sequential(
    (0): ReflectionPad2d((1, 1, 1, 1))
    (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
    (2): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
    (3): ReLU(inplace)
    (4): ReflectionPad2d((1, 1, 1, 1))
    (5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
    (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)

Is this generator model code more similar to super resolution model we see in lesson 13 as input and output of the model is also an image.