Part 2 Lesson 12 wiki

yggg · April 17, 2018, 2:03am

With int operands, / returns a float while // returns an int (floor division):
5 // 2 = 2
5 / 2 = 2.5
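A quick check of the Python 3 semantics, including two edge cases that often surprise people: // floors toward negative infinity, and a float operand makes // return a float:

```python
true_div = 5 / 2     # true division: always a float
floor_div = 5 // 2   # floor division: an int when both operands are ints
print(true_div, type(true_div).__name__)    # 2.5 float
print(floor_div, type(floor_div).__name__)  # 2 int

print(4 / 2)     # 2.0: still a float even when the result is whole
print(-5 // 2)   # -3: rounds toward negative infinity, not toward zero
print(5.0 // 2)  # 2.0: a float operand gives a float result
```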

Deb · April 17, 2018, 2:09am

Could you please explain adaptive average pooling? How does setting the output size to 1 work?
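A minimal sketch, assuming PyTorch: nn.AdaptiveAvgPool2d takes the desired output size instead of a kernel size, and works out the pooling windows from whatever input it receives. With an output size of 1, each channel is simply averaged over its whole height and width, so the output shape no longer depends on the input resolution. The tensor sizes below are arbitrary:

```python
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d(1)  # target output size 1x1, whatever the input H x W

for h, w in [(7, 7), (13, 20)]:
    x = torch.randn(2, 64, h, w)          # (batch, channels, height, width)
    out = pool(x)
    print(tuple(out.shape))               # (2, 64, 1, 1) for both input sizes
    # Same thing done by hand: the mean over the two spatial dimensions
    manual = x.mean(dim=(2, 3), keepdim=True)
    print(torch.allclose(out, manual, atol=1e-5))
```

This is why a network that ends in AdaptiveAvgPool2d(1) followed by a linear layer can accept images of different sizes: the linear layer always sees one number per channel.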

Interogativ · April 17, 2018, 2:10am

nn.DataParallel?
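A minimal sketch of typical nn.DataParallel usage, assuming PyTorch. The wrapper splits each input batch along dimension 0 across the visible GPUs, runs a replica of the module on each chunk and gathers the outputs on the first GPU. The tiny model is hypothetical; on a machine without GPUs the wrapper just calls the module directly, so the sketch still runs:

```python
import torch
import torch.nn as nn

# Hypothetical small model, purely for illustration
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# On a multi-GPU machine each forward pass scatters the batch along dim 0,
# runs one replica per GPU, then gathers the outputs on the first GPU.
model = nn.DataParallel(model).to(device)

x = torch.randn(64, 10, device=device)
out = model(x)
print(out.shape)                      # torch.Size([64, 2])
print(type(model.module).__name__)    # Sequential: the original model lives in .module
```

One practical consequence: because the original model sits under .module, the keys in the wrapped model's state_dict are prefixed with "module.".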

ravijain · April 17, 2018, 2:11am

make_group_layer is called with stride=2-(i==1).
So this means the stride is 1 for layer 1 and 2 for everything else.
What's the logic behind it? (Usually the strides I have seen used are odd.)
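The expression relies on Python treating True as 1 and False as 0. A small sketch, assuming the group index i starts at 0 (as with enumerate) and using 5 groups purely as an example, so it is the second group that ends up with stride 1:

```python
# True == 1 and False == 0, so 2 - (i == 1) is 1 only when i == 1.
strides = [2 - (i == 1) for i in range(5)]   # 5 groups is an arbitrary example
print(strides)  # [2, 1, 2, 2, 2]

# The same thing written as an explicit conditional expression
print(strides == [1 if i == 1 else 2 for i in range(5)])  # True
```

A stride-2 convolution roughly halves the height and width, so every group except i == 1 downsamples its input.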

erinjerri · April 17, 2018, 2:20am

Wait, what was Jeremy referring to with regard to momentum?

Sorry, my brain isn't working; I put out a fire in my house this morning and am still catching up.

sgugger · April 17, 2018, 2:22am

He was referring to Leslie’s paper here.

erinjerri · April 17, 2018, 2:23am

Ah yes, I do remember that from last week; I was thinking, wait, the concept of momentum? I'm just tired, I guess.

nok · April 17, 2018, 2:28am

So the huge speedup is a combination of the 1cycle learning rate and momentum annealing, 8-GPU parallel training, and half precision?
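A rough sketch of the 1cycle idea from Leslie Smith's work, assuming a purely linear schedule: the learning rate rises from a low value to a peak and back down over one cycle, while momentum moves the opposite way (high when the learning rate is low, low when it is high). The function name and the values of max_lr, div and moms are illustrative assumptions, not the fastai implementation:

```python
def one_cycle(step, total_steps, max_lr=1.0, div=10.0, moms=(0.95, 0.85)):
    """Hypothetical linear 1cycle schedule: returns (lr, momentum) for a step."""
    half = total_steps // 2
    low_lr = max_lr / div
    if step < half:
        frac = step / half                                # 0 -> 1 in the first half
    else:
        frac = 1 - (step - half) / (total_steps - half)   # 1 -> 0 in the second half
    lr = low_lr + frac * (max_lr - low_lr)                # lr goes up, then down
    mom = moms[0] - frac * (moms[0] - moms[1])            # momentum goes down, then up
    return lr, mom

for step in [0, 25, 50, 75, 100]:
    lr, mom = one_cycle(step, total_steps=100)
    print(step, round(lr, 3), round(mom, 3))
# 0 0.1 0.95 / 25 0.55 0.9 / 50 1.0 0.85 / 75 0.55 0.9 / 100 0.1 0.95
```

Smith's description of 1cycle also ends with a short phase that takes the learning rate below its starting value; that tail is left out here to keep the shape easy to see.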

Is it only possible to do the half-precision calculation with a consumer GPU?

Another question: why is the calculation 8 times faster going from single to half precision, while going from double to single is only 2 times faster? Thanks.
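A small illustration using NumPy's float64/float32/float16 types: each step down halves the storage per number, and float16 has far less precision (it cannot even distinguish 1 + 0.0001 from 1). This only covers memory and precision; it does not explain the 8x figure, which likely comes from GPU hardware with dedicated half-precision units (such as the tensor cores on NVIDIA Volta cards) rather than from the narrower format alone:

```python
import numpy as np

# Bytes per element for each floating-point format
for dt in (np.float64, np.float32, np.float16):
    print(np.dtype(dt).name, np.dtype(dt).itemsize, "bytes")
# float64 8 bytes / float32 4 bytes / float16 2 bytes

# Machine epsilon: the gap between 1.0 and the next representable number
print(np.finfo(np.float16).eps)   # 2**-10, about 0.000977
print(np.finfo(np.float32).eps)   # 2**-23, about 1.19e-07

# A small update that float32 keeps but float16 rounds away
print(np.float32(1) + np.float32(1e-4))  # 1.0001
print(np.float16(1) + np.float16(1e-4))  # 1.0
```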

danielhunter · April 17, 2018, 2:37am

Jeremy – does a GAN need a lot more data than, say, dogs vs. cats or NLP? Or is it comparable? Thanks!

erinjerri · April 17, 2018, 2:42am

Siraj Raval made a whole video on GANs using the original Pokemon, by the way: https://www.youtube.com/watch?v=yz6dNf7X7SA

Bodhi94 · April 17, 2018, 2:43am

He made a lot of GAN-related videos… Highly recommend watching them…

yggg · April 17, 2018, 2:44am

In ConvBlock's forward, is there a reason why bn comes after relu?
i.e. self.bn(self.relu(self.conv(x)))

In ResNet, the order would be self.relu(self.bn(self.conv(x))).
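To make the two orderings concrete, here is a minimal PyTorch sketch of a conv block that applies batchnorm either after the ReLU (the order in the ConvBlock being asked about) or before it (the ResNet order). The class name, its arguments and the bn_after_relu flag are hypothetical, for illustration only:

```python
import torch
import torch.nn as nn

class ConvBlockSketch(nn.Module):
    """Hypothetical block: conv -> relu -> bn, or conv -> bn -> relu."""
    def __init__(self, ni, no, ks=3, stride=1, bn_after_relu=True):
        super().__init__()
        # padding = ks // 2 keeps H x W unchanged for odd ks and stride 1
        self.conv = nn.Conv2d(ni, no, ks, stride, padding=ks // 2, bias=False)
        self.bn = nn.BatchNorm2d(no)
        self.relu = nn.ReLU(inplace=True)
        self.bn_after_relu = bn_after_relu

    def forward(self, x):
        x = self.conv(x)
        if self.bn_after_relu:
            return self.bn(self.relu(x))   # order used in the ConvBlock in question
        return self.relu(self.bn(x))       # order used in ResNet

x = torch.randn(4, 3, 16, 16)
a = ConvBlockSketch(3, 8, bn_after_relu=True)(x)
b = ConvBlockSketch(3, 8, bn_after_relu=False)(x)
print(a.shape, b.shape)        # both torch.Size([4, 8, 16, 16])
print((a < 0).any().item())    # True: bn last re-centres the outputs around 0
print((b < 0).any().item())    # False: relu last makes every output >= 0
```

The visible difference is in what the next layer receives: with bn last, its inputs are normalized and can be negative; with relu last, they are non-negative.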

chunduri · April 17, 2018, 2:44am

Why do you need a separate initial ConvBlock, rather than including it in the Sequential of ConvBlocks?

KevinB · April 17, 2018, 2:46am

Would there be an issue if one of the models was much better than the other one? For example, if one trained very quickly and the other was much slower to reach the same point.

I guess my question here is: is it best to make the fastest possible model for both of these, or is it better to keep the two models training at similar speeds? Basically, if one model is always able to trick the other, do you need to dial it back?

yggg · April 17, 2018, 2:48am

The same happens in DeconvBlock.

A_TF57 · April 17, 2018, 2:49am

Jeremy mentioned before that they do the same thing, just that he realized only later that nn.Sequential would have been more concise.
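A minimal sketch of that point, assuming PyTorch: calling sub-modules one after another in forward gives exactly the same result as chaining them in nn.Sequential, provided the layers and weights are the same. The Explicit class and the layer sizes are hypothetical:

```python
import torch
import torch.nn as nn

class Explicit(nn.Module):
    """Hypothetical model with a separate 'initial' layer, called by hand."""
    def __init__(self):
        super().__init__()
        self.initial = nn.Conv2d(3, 8, 3, padding=1)
        self.rest = nn.Conv2d(8, 16, 3, padding=1)

    def forward(self, x):
        return self.rest(torch.relu(self.initial(x)))

explicit = Explicit()
# The same layer objects (so identical weights) chained with nn.Sequential
seq = nn.Sequential(explicit.initial, nn.ReLU(), explicit.rest)

x = torch.randn(2, 3, 8, 8)
print(torch.equal(explicit(x), seq(x)))  # True: same layers, same order, same output
```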

Paras · April 17, 2018, 2:49am

How easy or difficult is it to create a discriminator that tells fake news from real news?

tensoralex · April 17, 2018, 2:50am

Yes, it will cause either “mode collapse” (generating the same image over and over) or vanishing gradients, depending on which one is winning.

KevinB · April 17, 2018, 2:51am

So you could potentially need to dumb a model down?

erinjerri · April 17, 2018, 2:53am

For reference on deconvolution (transposed convolution): it is covered in the latter half of the slides on segmentation and attention from Stanford CS231N (computer vision), with Justin Johnson lecturing:

http://cs231n.stanford.edu/slides/2016/winter1516_lecture13.pdf
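A small shape check, assuming PyTorch, of what a transposed convolution does in a generator: with kernel size 4, stride 2 and padding 1, nn.ConvTranspose2d doubles the height and width, the reverse of a stride-2 nn.Conv2d with the same settings. These hyperparameters are a common choice assumed here for illustration:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 8, 8)
up = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
down = nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1)

y = up(x)
print(tuple(y.shape))        # (1, 32, 16, 16): spatial size doubled
print(tuple(down(y).shape))  # (1, 64, 8, 8): the strided conv maps it back

# ConvTranspose2d output size (dilation=1, output_padding=0):
# (H_in - 1) * stride - 2 * padding + kernel_size = (8 - 1) * 2 - 2 + 4 = 16
```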