Part 2 Lesson 12 wiki

Ask your questions here. This is a wiki.

<<< Wiki: Lesson 11 | Wiki: Lesson 13 >>>

Resources:

Papers:

Timeline

  • (0:00:00) GANs
  • (0:01:15) Medicine language model
  • (0:03:00) Ethical issues
  • (0:04:35) Should I share my interesting ideas?
  • (0:05:28) Talking about CIFAR 10
  • (0:07:22) About notebook
  • (0:08:05) About CIFAR10 dataset
  • (0:09:07) Exercise: understanding the CIFAR-10 mean and standard deviation
  • (0:09:50) Batch size, transforms, padding (reflection padding instead of black padding)
  • (0:11:07) Architecture
  • (0:11:52) Architecture - stacked, hierarchy of layers
  • (0:12:42) Leaky ReLU
  • (0:14:06) nn.Sequential instead of a custom PyTorch module
  • (0:14:44) Resnet Hierarchy
  • (0:16:55) Number of channels
  • (0:17:55) inplace=True in Leaky ReLU and conv layers; dropout, activations, arithmetic
  • (0:19:53) Bias set to False in conv layers
  • (0:21:12) conv layer padding
  • (0:22:13) bottleneck in conv layer
  • (0:26:10) Wide resnet
  • (0:28:25) papers that talk about architectures
  • (0:29:45) SELU
  • (0:31:40) Darknet module
  • (0:35:00) Adaptive average pooling
  • (0:37:34) DAWNBench test
  • (0:38:00) Parameters of the Python script on an AWS p3
  • (0:40:37) Adaptive average pooling explanation
  • (0:58:40) Discriminative GAN code
  • (0:59:50) data required for GAN - no answer
  • (1:01:10) huge speed up reason - Nvidia GPU
  • (1:03:40) Neural Net - inputs and outputs
  • (1:05:30) Discriminator if a generator were present
  • (1:06:05) Generator - prior (random numbers)
  • (1:07:55) BatchNorm before ReLU ordering
  • (1:09:40) Back to generator - DeConvolution
  • (1:10:27) Deconvolution in Excel
  • (1:13:45) Discriminator for fake news
  • (1:16:02) Conv Transpose 2D
  • (1:16:37) Theano website example animation
  • (1:18:20) Upsample vs ConvTranspose2d
  • (1:22:30) Series of deconvolutions to progressively increase size
  • (1:23:41) Training Data
  • (1:25:50) Noise Vector
  • (1:26:40) Optimizers
  • (1:27:18) Training Loop
  • (1:29:29) GAN Loss and Process
  • (1:31:00) Discriminator Loss Mathematics
  • (1:34:42) WGAN Parameter Clipping
  • (1:36:20) GAN Execution and Generator Loss
  • (1:38:47) Training the Discriminator more often
  • (1:40:20) Generating data
  • (1:42:06) Overfitting + Evaluation of GANs
  • (1:44:48) CycleGAN concept
  • (1:45:34) Synthetic Data as a Training Set
  • (1:47:50) CycleGAN Explanation and Demonstration
  • (1:48:40) Artistic GANs
  • (1:50:14) CycleGAN Intuition
  • (1:54:05) CycleGAN loss Math
  • (1:55:55) GAN Translation Question
  • (1:58:56) Full Loss Function
  • (2:01:00) CycleGAN Code
  • (2:03:51) CycleGAN Dataloader
  • (2:06:30) CycleGAN Model
  • (2:09:35) CycleGAN Training Loop
  • (2:11:18) CycleGAN Optimizers
  • (2:15:30) Examples
8 Likes

Why is bias usually (like in resnet) set to False in conv_layer?

4 Likes

Why LeakyReLU instead of SELU?

2 Likes

Do you have a link to SELU?

Maybe because of batchnorm? I vaguely remember from Andrew Ng’s class that the beta parameter in batchnorm practically replaces bias.

5 Likes
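That matches my understanding: BatchNorm's learnable shift (beta) makes a separate conv bias redundant. Here is a minimal sketch of the pattern, assuming the usual conv -> BatchNorm -> activation ordering (illustrative only, not the exact fastai conv_layer):

```python
import torch
import torch.nn as nn

# Conv block in the style discussed above: the conv's bias is dropped because the
# following BatchNorm has its own learnable shift (beta), which would make a
# separate bias term redundant.
def conv_block(ni, nf, ks=3, stride=1):
    return nn.Sequential(
        nn.Conv2d(ni, nf, kernel_size=ks, stride=stride, padding=ks // 2, bias=False),
        nn.BatchNorm2d(nf),              # affine=True by default: learnable scale (gamma) and shift (beta)
        nn.LeakyReLU(0.1, inplace=True),
    )

x = torch.randn(2, 3, 32, 32)
print(conv_block(3, 16)(x).shape)        # torch.Size([2, 16, 32, 32])
```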

Why is inplace=True in the LeakyReLU?

5 Likes
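As I understand it, inplace=True just tells the activation to overwrite its input tensor instead of allocating a new one, which saves a bit of memory; it's safe when nothing downstream needs the pre-activation values. A quick illustrative check that both forms compute the same thing:

```python
import torch
import torch.nn as nn

x = torch.randn(4, 8)

out_of_place = nn.LeakyReLU(0.1)(x)                         # allocates a new output tensor
in_place     = nn.LeakyReLU(0.1, inplace=True)(x.clone())   # overwrites its input buffer instead

# Same values either way; inplace=True just skips the extra allocation.
print(torch.allclose(out_of_place, in_place))               # True
```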

@rachel Is there any benefit to drawing the architecture? Like as in a line-drawing?

Are there good packages that can illustrate a pre-defined architecture?

2 Likes

Here’s a blog on SELU: https://towardsdatascience.com/selu-make-fnns-great-again-snn-8d61526802a9

I guess another way of asking the question is, “SELU looked really hot in the paper which came out, but I notice that you don’t use it. What’s your opinion on SELU?”

4 Likes

Ah, thank you, makes sense! It'd be great if we could get a confirmation from @jeremy.

1 Like

Good call!

1 Like

Sorry – where’s the link?

Isn’t a small network that just narrows and re-expands called an “encapsulation network”? @rachel?

Is the reason for squishing it down and expanding the same idea as U-Net?

Would be curious about Jeremy’s answer, but here’s my take: the SELU paper focuses on standard dense neural nets as opposed to CNNs or RNNs. I am not 100% sure why but I did try SELU on CNNs once and got poor results, so it’s possible that it simply doesn’t work that well on CNNs. Why? No idea…
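For anyone who wants to experiment with SELU on plain fully connected layers, here is a minimal sketch of the kind of dense net the SELU paper targets (layer sizes here are arbitrary; the paper pairs SELU with lecun-normal-style initialization and AlphaDropout rather than ordinary dropout):

```python
import torch
import torch.nn as nn

# Minimal self-normalizing-style dense net in the spirit of the SELU paper
# (sizes are made up, purely for illustration).
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.SELU(),
    nn.AlphaDropout(0.1),   # dropout variant meant to preserve SELU's self-normalizing property
    nn.Linear(256, 10),
)

# The paper pairs SELU with lecun-normal initialization (std = 1/sqrt(fan_in));
# kaiming_normal_ with gain 1 ('linear') gives the same distribution.
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity='linear')
        nn.init.zeros_(m.bias)

print(model(torch.randn(32, 784)).shape)   # torch.Size([32, 10])
```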

So this should be applicable to simple fully connected linear layers as well, right?

Why does ks//2 have 2 forward slashes?

// is floor division: it rounds down, so with two ints you get an int back

2 Likes

Python has two division operators: a single slash for division and a double slash for "floor" division (rounds down to the nearest whole number). In Python 3, / always performs true division (3 / 2 is 1.5 even with integer operands), while // always performs floor division; the old "classic division", where / on two integers also rounded down, applies only to Python 2.

http://www.informit.com/articles/article.aspx?p=674692&seqNum=4

2 Likes
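Concretely (assuming ks here is the kernel size and ks // 2 is being used as the conv padding): floor division gives the integer padding that keeps the output the same size, e.g. padding of 1 for a 3x3 kernel at stride 1.

```python
ks = 3
print(ks / 2)    # 1.5 -> true division always returns a float in Python 3
print(ks // 2)   # 1   -> floor division: the integer padding that keeps a 3x3 conv output the same size

# // floors for floats too; it is not just int-to-int division.
print(7 // 2, 7.0 // 2)   # 3 3.0
```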

integer division

1 Like

Integer division!