Part 2 Lesson 13 Wiki


(Jeremy Howard (Admin)) #1

This thread is a wiki - please add any links etc that you think may be useful.

<<< Wiki: Lesson 12 | Wiki: Lesson 14 >>>

Miscellaneous:

GANs

AI and Ethics

Timeline

  • (0:00:01) Image enhancement
  • (0:00:40) Deep painterly harmonization paper - Style transfer
  • (0:01:10) Stochastic weight averaging (William Horton)
  • (0:02:05) Train Phase API
  • (0:03:35) Training phase API explanation
  • (0:03:41) Picture of iterations - step learning rate decay
  • (0:04:30) Training Phases explanation
  • (0:05:50) LR decay examples
  • (0:07:52) Adding your own schedulers - example SGDR
  • (0:08:22) Example of doing 1cycle
  • (0:08:58) Discriminative learning rates
  • (0:09:23) LARS paper - form of discriminative learning rates
  • (0:10:05) Customized LR finders
  • (0:11:10) Change the optimizer
  • (0:11:50) Change the data during training
  • (0:12:50) DAWNBench competition for ImageNet
  • (0:15:16) CIFAR result on DAWNBench
  • (0:17:05) Conv architecture gap - Inception-ResNet
  • (0:19:35) Concat in Inception
  • (0:22:43) Basic idea of Inception networks
  • (0:23:20) Instead of an AxA conv, use Ax1 followed by 1xA - lower-rank approximation
  • (0:27:00) factored convolutions
  • (0:27:30) Stem in backbone
  • (0:30:00) Image enhancement paper - Progressive GANs
  • (0:30:40) Inner network - irrelevant
  • (0:31:10) Progressive GAN - increase image size
  • (0:34:02) 1024x1024 images
  • (0:34:30) Obama fake video
  • (0:35:30) Questions and Ethics in AI
  • (0:36:55) Face recognition from various companies
  • (0:38:40) Women vs. men bias
  • (0:40:08) Google Translate men vs. women
  • (0:40:40) Machine learning can amplify bias
  • (0:42:15) Facebook examples
  • (0:45:15) Face detection
  • (0:46:15) meetup.com example - more men attending
  • (0:47:50) Bias black vs white
  • (0:52:46) Responsibilities in hiring
  • (0:54:07) IBM’s impact on Nazi Germany
  • (0:56:50) Dropout patent
  • (0:57:19) Artistic style transfer - Patent
  • (1:02:08) Code style transfer
  • (1:07:35) content loss and style loss
  • (1:11:20) Compare activations - perceptual loss
  • (1:13:15) Code style transfer
  • (1:15:25) random image
  • (1:17:22) Using mid layer activations
  • (1:19:05) optimizer
  • (1:20:25) LBFGS optimizer
  • (1:21:15) LBFGS algorithm works well
  • (1:21:40) Limited memory optimizer
  • (1:22:30) Diagram - optimizer explanation how it works
  • (1:25:05) Keeping track of every step takes a lot of memory, so keep only the last few gradients
  • (1:26:52) Code for optimizer
  • (1:28:16) content loss
  • (1:29:32) PyTorch hooks - forward hooks
  • (1:31:41) VGG activations
  • (1:36:42) Single-precision floating point, half precision
  • (1:38:22) Pictures from paper
  • (1:38:50) Grab activations of some layer
  • (1:39:35) Create style loss
  • (1:40:35) Look at painting from wikipedia
  • (1:41:15) Comparing activations - throw away spatial information
  • (1:43:00) Dot product of channels - intuition
  • (1:52:00) save features for all blocks
  • (1:57:16) Style transfer combined
  • (2:00:05) Google magenta - music project
  • (2:01:25) Putting shield in
  • (2:02:35) probabilistic programming
  • (2:05:00) Pre-training for generic style transfer
  • (2:05:40) Pictures from paper
  • (2:06:45) Maths in the paper

Other tips and resources

For cyclegan notebook

  • Data source: !wget https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/horse2zebra.zip
  • Modify the following code to get started:
opt = TrainOptions().parse(['--dataroot', '/data0/datasets/cyclegan/horse2zebra', '--nThreads', '8', '--no_dropout', '--niter', '100', '--niter_decay', '100', '--name', 'nodrop', '--gpu_ids', '2'])
  • '--dataroot', '/data0/datasets/cyclegan/horse2zebra': path to the unzipped horse2zebra data
  • '--nThreads', '8': lower the number of threads if kernels die
  • '--gpu_ids', '2': set '0' if you only have one GPU
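For example, on a single-GPU machine with the data unzipped to a hypothetical ./data/horse2zebra directory, the same call could look like this (the path and thread count below are placeholders, not from the lesson):

opt = TrainOptions().parse([
    '--dataroot', './data/horse2zebra',  # placeholder - point at your unzipped data
    '--nThreads', '4',                   # lower if kernels die
    '--no_dropout',
    '--niter', '100',
    '--niter_decay', '100',
    '--name', 'nodrop',
    '--gpu_ids', '0',                    # single GPU
])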

Data for style-transfer notebook, ImageNet Object Detection Challenge


(blake west) #13

Can you please discuss intuitions for when you’d use one kind of learning schedule vs. another?


#16

The customized learning rate finder is in a pull request right now, so you’ll have to wait a bit to use that specific feature.


(YangLu) #17

Can you please explain the best practice for finding the best clr_div,cut_div in clr?


(Kevin Bird) #18

Great work on those features; it will be awesome to be able to explore them.


#19

You’re welcome! ^^


#20

Does anyone have a reference for the “concat pooling” trick from the 2nd place CIFAR-10 competition entry?


(Ananda Seelan) #21

If I’m right, details of “concat pooling” can be found in Jeremy’s paper.


#22

which paper? thanks in advance.


(Ananda Seelan) #23

Fine-tuned Language Models for Text Classification

Basically, instead of using only the final hidden state from an RNN, you concatenate the max pool and average pool of all the hidden states along with the final hidden state, and pass that on to the next layer.
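A minimal PyTorch sketch of that idea (the function name and tensor shapes are illustrative, not taken from the fastai source):

import torch

def concat_pool(hidden_states):
    # hidden_states: (seq_len, batch, hidden_dim) outputs from an RNN
    last = hidden_states[-1]                  # final hidden state
    avg = hidden_states.mean(dim=0)           # average pool over time
    mx = hidden_states.max(dim=0)[0]          # max pool over time
    return torch.cat([last, mx, avg], dim=1)  # (batch, 3 * hidden_dim)

h = torch.randn(10, 4, 8)                     # 10 timesteps, batch 4, hidden size 8
print(concat_pool(h).shape)                   # torch.Size([4, 24])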


#24

A 1x1 conv is usually called a “network within network” in the literature; what is the intuition behind that name?


(Erin Pangilinan) #25

Throwback post by Chris Olah I was just re-reading this weekend that talks about dimensionality reduction: http://colah.github.io/posts/2014-10-Visualizing-MNIST/

I appreciate Jeremy’s attention to visualizing the convnets so that we can understand dimensionality reduction with his drawings/visual aids. So helpful. =)
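To make the “network within network” intuition from the question above concrete: a 1x1 conv is just a tiny fully connected layer across channels, applied independently at every spatial position, and it is also how Inception-style blocks reduce channel dimensionality. A hedged PyTorch illustration (the shapes are made up, not from the lesson notebook):

import torch
import torch.nn as nn

x = torch.randn(1, 256, 32, 32)             # (batch, channels, height, width)
reduce = nn.Conv2d(256, 64, kernel_size=1)  # mixes the 256 channels at each pixel
print(reduce(x).shape)                       # torch.Size([1, 64, 32, 32])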


(Phani Srikanth) #45

Here you go.


(Kevin Bird) #46

@sgugger, do you have a link to the notebook Jeremy went through?


(Phani Srikanth) #48

@sgugger’s work here: https://github.com/sgugger/Deep-Learning/blob/master/Understanding%20the%20new%20fastai%20API%20for%20scheduling%20training.ipynb


#49

Is it safe to say nn.Embedding is the same as a “low-rank approximation”?


(Even Oldridge) #50

Interesting. The 5x1 and 1x5 transform seems like the opposite operation from mixture of softmaxes, where we’re trying to increase the rank of the matrices.

I’m surprised that the rank 1 representation is rich enough to represent the problem without degrading performance significantly.
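A rough sketch of the factored convolution being discussed, replacing a full 5x5 conv with a 5x1 followed by a 1x5 (channel sizes are made up; this is not the Inception code):

import torch
import torch.nn as nn

full = nn.Conv2d(64, 64, kernel_size=5, padding=2)
factored = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=(5, 1), padding=(2, 0)),
    nn.Conv2d(64, 64, kernel_size=(1, 5), padding=(0, 2)),
)

x = torch.randn(1, 64, 32, 32)
print(full(x).shape, factored(x).shape)       # both torch.Size([1, 64, 32, 32])

n_full = sum(p.numel() for p in full.parameters())      # 5*5*64*64 + 64
n_fact = sum(p.numel() for p in factored.parameters())  # 2*(5*64*64 + 64)
print(n_full, n_fact)                         # roughly a 2.5x parameter reduction

The stacked pair covers the same 5x5 receptive field with far fewer parameters, which is presumably why the lower-rank form does not degrade performance much.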


(Pavel Surmenok) #53

Progressive GANs paper:

Progressive Growing of GANs for Improved Quality, Stability, and Variation


(Britt Selvitelle) #54

Some interesting Progressive GANs links:

http://research.nvidia.com/publication/2017-10_Progressive-Growing-of


(YangLu) #55

Why is there such a big variation in face type/structure (and hair!!!) when the faces are being GANzed?