Part 2 Lesson 12 wiki


#34

`/` returns a float, while `//` (floor division) returns an int:

5 // 2 = 2
5 / 2 = 2.5
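
A quick Python 3 illustration of the two operators:

```python
# Python 3 division semantics:
# `/` is true division and always returns a float;
# `//` is floor division and returns an int for int operands.
print(5 / 2)     # 2.5
print(5 // 2)    # 2
print(-5 // 2)   # -3 (floors toward negative infinity, not toward zero)
```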


(Debashish Panigrahi) #37

Could you please explain adaptive average pooling? How does setting the output size to 1 work?
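
In case it helps, a minimal PyTorch sketch (the tensor shape is just for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(2, 512, 7, 7)    # batch of 2, 512 channels, 7x7 spatial grid

# Adaptive pooling takes the desired *output* size and works out the
# kernel size / stride itself, so it handles any input resolution.
pool = nn.AdaptiveAvgPool2d(1)   # output size 1x1
out = pool(x)
print(out.shape)                 # torch.Size([2, 512, 1, 1])

# Setting the output size to 1 averages over the whole spatial grid:
# each channel is reduced to a single number (global average pooling).
```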


(Bart Fish) #38

nn.DataParallel?
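
In case it's useful, a minimal sketch of how nn.DataParallel is typically used (the model and sizes are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2)).cuda()

# nn.DataParallel replicates the module on each visible GPU, splits the
# input batch across them, and gathers the outputs back on device 0.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

x = torch.randn(32, 10).cuda()
out = model(x)       # the batch of 32 is scattered across the GPUs
print(out.shape)     # torch.Size([32, 2])
```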


(Ravi Jain) #39

make_group_layer contains stride=2-(i==1), so the stride is 1 for layer 1 and 2 for everything else.
What's the logic behind it? (Usually the strides I have seen used are odd.)
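
In case it helps, the expression works because Python booleans act as ints in arithmetic (True == 1, False == 0). A quick check:

```python
# 2 - (i == 1): True is 1 and False is 0 in arithmetic, so the layer
# with i == 1 gets stride 1 and every other layer gets stride 2.
for i in range(3):
    print(i, 2 - (i == 1))
# 0 2
# 1 1
# 2 2
```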


(Erin Pangilinan) #40

Wait, what was Jeremy referring to with regard to momentum?

Sorry, brain not working; I put out a fire in my house this morning and am still catching up.


#42

He was referring to Leslie Smith's paper here.


(Erin Pangilinan) #43

Ah yeah, I do remember that from last week; I was like, wait, the concept of momentum? I'm just tired, I guess.


(nok) #44

So the huge speedup is a combination of the 1cycle learning rate and momentum annealing, 8-GPU parallel training, and half precision?

Is it even possible to do the half-precision calculation with a consumer GPU?

Another question: why is the calculation 8 times faster going from single to half precision, while going from double to single is only 2 times faster? Thanks.
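
As I understand it, the big fp16 speedups quoted for V100s come from tensor cores; on consumer GPUs of that era, fp16 mostly saves memory rather than giving an 8x throughput win. A minimal PyTorch sketch of casting to half precision (inference only; real mixed-precision training also keeps fp32 master weights for stability):

```python
import torch

# Cast the model's weights to float16, and match the input dtype.
model = torch.nn.Linear(1024, 1024).cuda().half()   # weights become float16
x = torch.randn(64, 1024, device='cuda').half()     # inputs must match dtype
out = model(x)
print(out.dtype)    # torch.float16
```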


(Daniel Hunter) #45

Jeremy: does a GAN need a lot more data than, say, dogs vs. cats or NLP? Or is it comparable? Thanks!


(Erin Pangilinan) #46

Siraj Raval made a whole video on GANs trained on just the original set of Pokémon, btw: https://www.youtube.com/watch?v=yz6dNf7X7SA


(Vibhutha Kumarage) #47

He has made a lot of GAN-related videos… highly recommend watching them.


#48

In ConvBlock -> forward, is there a reason why bn comes after relu, i.e. self.bn(self.relu(self.conv(x)))?

In ResNet, the order would be self.relu(self.bn(self.conv(x))).
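
For comparison, a sketch of the two orderings (the class names here are illustrative, not the exact lesson code):

```python
import torch.nn as nn

class ConvBnRelu(nn.Module):
    """ResNet-style ordering: conv -> bn -> relu."""
    def __init__(self, ni, nf):
        super().__init__()
        self.conv = nn.Conv2d(ni, nf, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(nf)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

class ConvReluBn(nn.Module):
    """The ordering asked about: conv -> relu -> bn."""
    def __init__(self, ni, nf):
        super().__init__()
        self.conv = nn.Conv2d(ni, nf, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(nf)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.bn(self.relu(self.conv(x)))
```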


(chunduri) #49

Why do you need a separate initial ConvBlock, rather than folding it into the sequential ConvBlocks?


(Kevin Bird) #50

Would there be an issue if one of the models were much better than the other, e.g. if one had a very quick training time and the other was much slower to get to the same point?

I guess my question here is: is it best to make the fastest possible model for both of these, or is it better to keep the two models training at similar speeds? Basically, if one model is always able to trick the other, do you need to dial it back?


#51

The same happens in DeconvBlock.


(Ankit Goila) #52

Jeremy mentioned earlier that they do the same thing; he just realized later that nn.Sequential would have been more concise.
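
For anyone curious, a minimal sketch of what the nn.Sequential version could look like (layer choices here are illustrative, not the exact lesson code):

```python
import torch.nn as nn

def conv_block(ni, nf, stride=1):
    # Same conv/bn/relu pipeline, but nn.Sequential supplies the
    # forward pass, so no custom Module is needed.
    return nn.Sequential(
        nn.Conv2d(ni, nf, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(nf),
        nn.ReLU(inplace=True),
    )
```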


(Paras Kaushik) #53

How easy or difficult is it to create a discriminator that identifies fake news vs. real news?


(Alex) #54

Yes, it will cause either “mode collapse” (generating the same image over and over) or vanishing gradients, depending on which one is winning.


(Kevin Bird) #55

So you could potentially need to dumb a model down?


(Erin Pangilinan) #56

As a reference on deconvolution (transposed convolution): it's covered in the latter half of the slides on segmentation and attention from Stanford CS231n (computer vision), with Justin Johnson lecturing:

http://cs231n.stanford.edu/slides/2016/winter1516_lecture13.pdf