Part 2 Lesson 12 wiki

Ask your questions here. This is a wiki.

<<< Wiki: Lesson 11 | Wiki: Lesson 13 >>>

Resources:

Papers:

Timeline

  • (0:00:00) GANs
  • (0:01:15) Medicine language model
  • (0:03:00) Ethical issues
  • (0:04:35) Should I share my interesting ideas?
  • (0:05:28) Talking about CIFAR 10
  • (0:07:22) About notebook
  • (0:08:05) About CIFAR10 dataset
  • (0:09:07) Exercise: understanding the CIFAR-10 mean and standard deviation
  • (0:09:50) Batch size, transforms, padding (reflection padding instead of black padding)
  • (0:11:07) Architecture
  • (0:11:52) Architecture - stacked, hierarchy of layers
  • (0:12:42) Leaky ReLU
  • (0:14:06) nn.Sequential instead of a custom PyTorch module
  • (0:14:44) Resnet Hierarchy
  • (0:16:55) Number of channels
  • (0:17:55) inplace=True in Leaky ReLU and conv layers; dropout, activations, arithmetic
  • (0:19:53) Bias set to False in conv layers
  • (0:21:12) conv layer padding
  • (0:22:13) bottleneck in conv layer
  • (0:26:10) Wide resnet
  • (0:28:25) papers that talk about architectures
  • (0:29:45) SELU
  • (0:31:40) Darknet module
  • (0:35:00) Adaptive average pooling
  • (0:37:34) DAWNBench test
  • (0:38:00) Parameters of the Python script on an AWS p3
  • (0:40:37) Adaptive average pooling explanation
  • (0:58:40) Discriminative GAN code
  • (0:59:50) data required for GAN - no answer
  • (1:01:10) huge speed up reason - Nvidia GPU
  • (1:03:40) Neural Net - inputs and outputs
  • (1:05:30) Discriminator if a generator were present
  • (1:06:05) Generator - prior (random numbers)
  • (1:07:55) BatchNorm before ReLU ordering
  • (1:09:40) Back to generator - DeConvolution
  • (1:10:27) Deconvolution in Excel
  • (1:13:45) Discriminator for fake news
  • (1:16:02) Conv Transpose 2D
  • (1:16:37) Theano website example animation
  • (1:18:20) Upsample vs ConvTranspose2d
  • (1:22:30) Series of deconvolutions to progressively increase size
  • (1:23:41) Training Data
  • (1:25:50) Noise Vector
  • (1:26:40) Optimizers
  • (1:27:18) Training Loop
  • (1:29:29) GAN Loss and Process
  • (1:31:00) Discriminator Loss Mathematics
  • (1:34:42) WGAN Parameter Clipping
  • (1:36:20) GAN Execution and Generator Loss
  • (1:38:47) Training the Discriminator more often
  • (1:40:20) Generating data
  • (1:42:06) Overfitting + Evaluation of GANs
  • (1:44:48) CycleGAN concept
  • (1:45:34) Synthetic Data as a Training Set
  • (1:47:50) CycleGAN Explanation and Demonstration
  • (1:48:40) Artistic GANs
  • (1:50:14) CycleGAN Intuition
  • (1:54:05) CycleGAN loss Math
  • (1:55:55) GAN Translation Question
  • (1:58:56) Full Loss Function
  • (2:01:00) CycleGAN Code
  • (2:03:51) CycleGAN Dataloader
  • (2:06:30) CycleGAN Model
  • (2:09:35) CycleGAN Training Loop
  • (2:11:18) CycleGAN Optimizers
  • (2:15:30) Examples
8 Likes

Why is bias usually (like in resnet) set to False in conv_layer?

4 Likes

Why LeakyReLU instead of SELU?

2 Likes

Do you have a link to SELU?

Maybe because of batchnorm? I vaguely remember from Andrew Ng’s class that the beta parameter in batchnorm practically replaces bias.

5 Likes
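That matches my understanding: BatchNorm's learnable shift (beta) makes a separate conv bias redundant. Here is a minimal sketch of the pattern, assuming the usual conv -> BatchNorm -> activation ordering (illustrative only, not the exact fastai conv_layer):

```python
import torch
import torch.nn as nn

# Conv block in the style discussed above: the conv's bias is dropped because the
# following BatchNorm has its own learnable shift (beta), which would make a
# separate bias term redundant.
def conv_block(ni, nf, ks=3, stride=1):
    return nn.Sequential(
        nn.Conv2d(ni, nf, kernel_size=ks, stride=stride, padding=ks // 2, bias=False),
        nn.BatchNorm2d(nf),              # affine=True by default: learnable scale (gamma) and shift (beta)
        nn.LeakyReLU(0.1, inplace=True),
    )

x = torch.randn(2, 3, 32, 32)
print(conv_block(3, 16)(x).shape)        # torch.Size([2, 16, 32, 32])
```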

Why is inplace=True in the LeakyReLU?

5 Likes
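As I understand it, inplace=True just tells the activation to overwrite its input tensor instead of allocating a new one, which saves a bit of memory; it's safe when nothing downstream needs the pre-activation values. A quick illustrative check that both forms compute the same thing:

```python
import torch
import torch.nn as nn

x = torch.randn(4, 8)

out_of_place = nn.LeakyReLU(0.1)(x)                         # allocates a new output tensor
in_place     = nn.LeakyReLU(0.1, inplace=True)(x.clone())   # overwrites its input buffer instead

# Same values either way; inplace=True just skips the extra allocation.
print(torch.allclose(out_of_place, in_place))               # True
```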

@rachel Is there any benefit to drawing the architecture? Like as in a line-drawing?

Are there good packages that can illustrate a pre-defined architecture?

2 Likes

Here’s a blog on SELU: https://towardsdatascience.com/selu-make-fnns-great-again-snn-8d61526802a9

I guess another way of asking the question is, “SELU looked really hot in the paper which came out, but I notice that you don’t use it. What’s your opinion on SELU?”

4 Likes

Ah, thank you, makes sense! It'd be great if we could get a confirmation from @jeremy.

1 Like

Good call!

1 Like

Sorry – where’s the link?

Isn’t a small network that just narrows and re-expands called an “encapsulation network”? @rachel?

Is the reason for squishing it down and expanding the same idea as U-Net?

Would be curious about Jeremy’s answer, but here’s my take: the SELU paper focuses on standard dense neural nets as opposed to CNNs or RNNs. I am not 100% sure why but I did try SELU on CNNs once and got poor results, so it’s possible that it simply doesn’t work that well on CNNs. Why? No idea…
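For anyone who wants to experiment with SELU on plain fully connected layers, here is a minimal sketch of the kind of dense net the SELU paper targets (layer sizes here are arbitrary; the paper pairs SELU with lecun-normal-style initialization and AlphaDropout rather than ordinary dropout):

```python
import torch
import torch.nn as nn

# Minimal self-normalizing-style dense net in the spirit of the SELU paper
# (sizes are made up, purely for illustration).
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.SELU(),
    nn.AlphaDropout(0.1),   # dropout variant meant to preserve SELU's self-normalizing property
    nn.Linear(256, 10),
)

# The paper pairs SELU with lecun-normal initialization (std = 1/sqrt(fan_in));
# kaiming_normal_ with gain 1 ('linear') gives the same distribution.
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity='linear')
        nn.init.zeros_(m.bias)

print(model(torch.randn(32, 784)).shape)   # torch.Size([32, 10])
```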

So this should be applicable to simple fully connected linear layers as well, right?

Why does ks//2 have 2 forward slashes?

// is floor division: it rounds down, so with two ints you get an int back

2 Likes

Python has two division operators: a single slash for division and a double slash for "floor" division (rounds down to the nearest whole number). In Python 3, / always performs true division (3 / 2 is 1.5 even with integer operands), while // always performs floor division; the old "classic division", where / on two integers also rounded down, applies only to Python 2.

http://www.informit.com/articles/article.aspx?p=674692&seqNum=4

2 Likes
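Concretely (assuming ks here is the kernel size and ks // 2 is being used as the conv padding): floor division gives the integer padding that keeps the output the same size, e.g. padding of 1 for a 3x3 kernel at stride 1.

```python
ks = 3
print(ks / 2)    # 1.5 -> true division always returns a float in Python 3
print(ks // 2)   # 1   -> floor division: the integer padding that keeps a 3x3 conv output the same size

# // floors for floats too; it is not just int-to-int division.
print(7 // 2, 7.0 // 2)   # 3 3.0
```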

integer division

1 Like

Integer division!