Lesson 10 Discussion & Wiki (2019)

Is the reason an activation of 0 is bad that the gradients will then be 0, so the weights aren’t adjusted and we end up with useless training?

Can the mean and std of each layer’s output reflect the training quality?

Is it possible to make the GeneralReLU parameters learnable by the network? Something like an AdaptiveGeneralReLU?

It’s just that you are wasting information. You would be better off not even computing that activation since it’s useless.

How do you decide which flavor of ReLU to use?

Try it and tell us what you find :)

I think you can.
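For anyone who wants to try it, here is a minimal sketch (an assumption on my part, not course code) of what an AdaptiveGeneralReLU could look like, keeping the GeneralReLU-style leak/sub/maxv interface: the leak and the subtracted shift are stored as nn.Parameters so the optimizer updates them, much like nn.PReLU learns its negative slope.

```python
import torch
import torch.nn as nn

class AdaptiveGeneralReLU(nn.Module):
    # Hypothetical variant of GeneralReLU: leak and sub are nn.Parameters,
    # so they receive gradients and are updated by the optimizer.
    def __init__(self, leak=0.1, sub=0.4, maxv=None):
        super().__init__()
        self.leak = nn.Parameter(torch.tensor(float(leak)))
        self.sub = nn.Parameter(torch.tensor(float(sub)))
        self.maxv = maxv  # keep the clamp fixed; a learned max would be a separate experiment

    def forward(self, x):
        # leaky ReLU written out explicitly so gradients flow into self.leak
        x = torch.where(x > 0, x, x * self.leak)
        x = x - self.sub
        if self.maxv is not None:
            x = x.clamp_max(self.maxv)
        return x
```

Whether the extra parameters actually help is exactly the kind of thing to test empirically.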

When should we use a context manager? How is it useful in general?

In reference to this line in the notebook …
Having given an __enter__ and __exit__ method to our Hooks class, we can use it as a context manager.

When you need to run some code to clean up after the code inside is finished. Like closing the file properly after reading lines.

You should use them whenever you have something “temporary” AND you don’t want to forget to do something at the end, e.g. opening a file (and then not forgetting to close it), or registering a hook (and then not forgetting to deregister it).
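As a concrete illustration, here is a minimal sketch (not the notebook’s exact Hooks implementation) of a class with __enter__/__exit__ that registers a forward hook on every module and is guaranteed to remove them when the with block ends, recording each layer’s activation mean and std along the way:

```python
from functools import partial

import torch
import torch.nn as nn

class Hooks:
    # Sketch of a context-manager hook container: register a forward hook on
    # each module, collect stats, and always deregister the hooks on exit.
    def __init__(self, modules, func):
        self.stats = [[] for _ in modules]
        self.handles = [m.register_forward_hook(partial(func, self.stats[i]))
                        for i, m in enumerate(modules)]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.remove()

    def remove(self):
        for h in self.handles:
            h.remove()

def append_mean_std(stats, module, inp, outp):
    # forward-hook signature is (module, input, output); stats is bound via partial
    stats.append((outp.data.mean().item(), outp.data.std().item()))

model = nn.Sequential(nn.Linear(10, 50), nn.ReLU(), nn.Linear(50, 1))
with Hooks(list(model), append_mean_std) as hooks:
    model(torch.randn(64, 10))
print(hooks.stats)  # per-layer (mean, std); the hooks are already removed here
```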

Is there any ImageNet model that uses LeakyReLU?

Wasting information as in the result of the calculation doesn’t change any of the weights? Or does the resulting calculation just give us an activation full of 0s?

Two situations:

  • you want to make sure something is cleaned up when you’re finished (the file you’re reading is closed, your hooks are removed)
  • you are using a temporary substitute for a parameter: for instance you want to change your loss function for a bit, so you want it to be swapped when you enter the context manager and put back to its original value when you exit (see the sketch after this list).
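For that second situation, a minimal sketch using contextlib; the learner object and its loss_func attribute are only assumptions for illustration:

```python
from contextlib import contextmanager

@contextmanager
def use_loss(learner, tmp_loss):
    # Hypothetical helper: swap in tmp_loss on enter, restore the original on exit,
    # even if the code inside the with block raises an exception.
    old_loss = learner.loss_func
    learner.loss_func = tmp_loss
    try:
        yield learner
    finally:
        learner.loss_func = old_loss

# usage (assuming a learner-like object called `learn`):
# with use_loss(learn, nn.L1Loss()):
#     learn.fit(1)   # trains with the temporary loss, then it is put back
```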

By the way, there was a discussion about context managers some time ago.

Do you have a comparison of how the model performed with the standard ReLU vs. the shifted ReLU?

The first one. A neural net is cramming information into a tiny state, and having too many of those activations be 0 is a real waste.
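A quick way to see both effects at once (a minimal demonstration, not from the notebook): with a standard ReLU roughly half of the activations are exactly 0, and no gradient flows back through those positions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
pre = torch.randn(256, 100, requires_grad=True)  # stand-in for a layer's pre-activations
act = F.relu(pre)
print((act == 0).float().mean().item())       # ~0.5: about half the units output exactly 0
act.sum().backward()
print((pre.grad == 0).float().mean().item())  # same fraction: those units receive no gradient
```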

Off topic, but when Jeremy said “a thousand layers deep” I just couldn’t help thinking of https://www.youtube.com/watch?v=46cSksKVzzs

Do people use Kaiming initialization to initialize the “mults” for batch normalization?

No, that’s not what Kaiming init is for: Kaiming init sets the right scale for linear or conv layer weights.
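A short sketch of the distinction (standard PyTorch behaviour): Kaiming init rescales conv/linear weights, while BatchNorm’s multiplier and bias simply start at 1 and 0.

```python
import torch.nn as nn

conv = nn.Conv2d(3, 32, 3)
nn.init.kaiming_normal_(conv.weight)  # Kaiming sets the scale of conv/linear weights

bn = nn.BatchNorm2d(32)
# the BatchNorm "mult" (weight) and bias are just initialized to ones and zeros
print(bn.weight.data.unique(), bn.bias.data.unique())
```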

What do mom and eps stand for?