Is the reason an activation of 0 is bad that the gradients will then be 0, so the weights aren't adjusted and the training is useless?
Can the mean and std of each layer's output reflect the quality of training?
Is it possible to make the GeneralReLU parameters learnable by the network? Something like an AdaptiveGeneralReLU?
It's just that you are wasting information. You would be better off not even computing that activation since it's useless.
How do you decide which flavor of ReLU to use?
Try it and tell us what you find
I think you can.
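A minimal sketch of how that could work (the class name AdaptiveGeneralReLU and the default values are made up here, not from the lesson notebook): wrap the leak and shift in nn.Parameter so the optimizer updates them along with the weights.

```python
import torch
from torch import nn

class AdaptiveGeneralReLU(nn.Module):
    "GeneralReLU whose leak and shift are nn.Parameters, so the network learns them."
    def __init__(self, leak=0.1, sub=0.4, maxv=None):
        super().__init__()
        self.leak = nn.Parameter(torch.tensor(float(leak)))  # learnable negative slope
        self.sub  = nn.Parameter(torch.tensor(float(sub)))   # learnable downward shift
        self.maxv = maxv                                      # optional fixed clamp

    def forward(self, x):
        x = torch.where(x > 0, x, x * self.leak)  # leaky part; gradient flows to leak
        x = x - self.sub                          # shift so the mean stays closer to 0
        if self.maxv is not None: x = x.clamp_max(self.maxv)
        return x
```

torch.where is used instead of F.leaky_relu because the latter expects a plain float for the negative slope, not a learnable tensor.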
When should we use a context manager? How is it useful in general?
In reference to this line in the notebook …
Having given an __enter__ and __exit__ method to our Hooks class, we can use it as a context manager.
When you need to run some code to clean up after the code inside is finished. Like closing the file properly after reading lines.
You should use them whenever you have something "temporary" AND you don't want to forget to do something at the end, e.g. opening a file (and then not forgetting to close it), registering a hook and then not forgetting to deregister it…
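A stripped-down sketch of that pattern (not the actual notebook code; record_stats and the toy model are made up for illustration): the hooks are registered when the object is created and removed in __exit__, so a with block can never leave stale hooks behind.

```python
import torch
from torch import nn

class Hooks:
    "Register a forward hook on each module; remove them all when the with-block exits."
    def __init__(self, modules, fn):
        self.handles = [m.register_forward_hook(fn) for m in modules]
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        for h in self.handles: h.remove()   # the cleanup we must not forget

stats = []
def record_stats(module, inp, outp):
    stats.append((outp.mean().item(), outp.std().item()))  # per-layer mean and std

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))
with Hooks(model.children(), record_stats):   # hooks live only inside this block
    model(torch.randn(64, 10))
# hooks are removed here, even if the forward pass raised an exception
```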
Any ImageNet model that uses LeakyReLU?
Wasting information as in the result of the calculation doesn't change any of the weights? Or does the resulting calculation just give us an activation full of 0s?
Two situations:
- you want to make sure something is closed when you're finished (the file you're reading, your hooks are removed)
- you are using a temporary substitute for a parameter: for instance you want to change your loss function for a bit, so you want it to change when you enter the context manager and be put back to its original value when you exit (a sketch of this pattern is below).
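A sketch of the second situation, assuming a hypothetical learner object with a loss_func attribute (the names use_loss and learn are made up for illustration):

```python
from contextlib import contextmanager
import torch.nn.functional as F

@contextmanager
def use_loss(learner, loss_func):
    "Temporarily swap the learner's loss function, restoring the original on exit."
    old = learner.loss_func
    learner.loss_func = loss_func
    try:
        yield learner
    finally:
        learner.loss_func = old   # put back even if an exception was raised

# with use_loss(learn, F.l1_loss):
#     learn.fit(1)   # trains with L1 loss inside the block, then reverts
```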
Do you have a comparison of how the model performed with standard ReLU vs shifted ReLU?
The first one. A neural net is cramming information into a tiny state, and having too many of those activations be 0 is a real waste.
Off topic, but when Jeremy said "a thousand layers deep" I just can't help myself: https://www.youtube.com/watch?v=46cSksKVzzs
Do people use Kaiming initialization to initialize the "mults" for batch normalization?
No, that's not what Kaiming init is for: Kaiming init is about using the right scale for linear or conv layers.
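To illustrate the distinction (a small sketch; the BatchNorm defaults shown are PyTorch's own, not something Kaiming init provides):

```python
import torch
from torch import nn

conv = nn.Conv2d(3, 16, 3)
nn.init.kaiming_normal_(conv.weight)   # Kaiming scaling belongs on conv/linear weights

bn = nn.BatchNorm2d(16)
# PyTorch initializes the batchnorm "mults" (bn.weight) to 1 and the adds (bn.bias) to 0
print(bn.weight.data.unique(), bn.bias.data.unique())   # tensor([1.]) tensor([0.])
```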
What do mom and eps stand for?