Does “deleting an activation” mean setting it to zero?
It just sets it to zero.
Hinton’s intuitions behind dropout (he had two reasons for it):
For dropout, if a unit was set to zero during training, which weights are used at test time?
Dropout is only applied during training.
OK, but then at test time, which weights does it use if they were set to zero?
No weights were set to zero. Dropout is applied to activations.
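To illustrate the point above, here’s a minimal PyTorch sketch (the tensor values are just made up): `nn.Dropout` zeroes a random subset of *activations* while in training mode and is a no-op in `eval()` mode, so the weights are never touched and there is nothing special to restore at test time.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 10)      # a batch of activations (made-up values)
drop = nn.Dropout(p=0.5)    # zero each activation with probability 0.5

drop.train()
print(drop(x))              # roughly half the activations are zeroed; the rest are
                            # scaled by 1/(1-p) so the expected value stays the same

drop.eval()
print(drop(x))              # identity: nothing is zeroed at test time
```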
A general version of dropout was also proposed much earlier (Hanson 1990) but is rarely cited.
You can find many more instances of these “we or someone else did it first” claims on Schmidhuber’s blog, and here’s a rebuttal from Hinton.
EDIT: rather than adding another post, I’m responding here. I’m inclined to agree that it’s a bit of a stretch (as are some of the other claims on that blog post). However, Hanson did also co-author another paper in 2018: Dropout is a special case of the stochastic delta rule: faster and more accurate deep learning.
Thanks for sharing. It seems more and more that Schmidhuber and his team, or other groups, did everything already, but that’s a discussion for another day!
Won’t backpropagation update the weights twice if the same layer is referenced twice in the same network?
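As a rough sketch of what actually happens (assuming the question is about a layer whose weights are reused, as in the tied-weight language models in this chapter): autograd accumulates the gradients from every place the weight is used, and the optimizer then applies a single update with that accumulated gradient.

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 4, bias=False)
x = torch.randn(1, 4)

# The same layer (same weight tensor) is applied twice in one forward pass.
out = lin(lin(x)).sum()
out.backward()

# lin.weight.grad now holds the sum of the gradients from both uses;
# an optimizer step would then update the weight once using that gradient.
print(lin.weight.grad)
```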
Why doesn’t NLP use CNNs? Is it because we can’t ‘label’ a body of text?
Here is the questionnaire for chapter 12. It is still a work in progress; feel free to contribute!
End of part 1
Thank you for pulling off an amazing class under historic circumstances! Thank you, Rachel, Sylvain, and Jeremy! Thank you, thank you, thank you!! And thank you for making us wear masks!
Hanson’s paper seems to be about adding stochastic noise to improve convergence.
You have to change the dimensionality of the CNN to a 1D CNN: https://pytorch.org/docs/stable/nn.html. 1D CNNs capture the order of the text as well. I have seen them used as a way of preprocessing.
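As a rough illustration (just a sketch, not fastai’s approach; the vocabulary size, embedding size, and kernel width below are arbitrary), a 1D convolution slides along the token dimension of embedded text, so nearby words are combined in order:

```python
import torch
import torch.nn as nn

vocab_size, emb_size, n_filters = 1000, 64, 128     # arbitrary sizes
emb  = nn.Embedding(vocab_size, emb_size)
conv = nn.Conv1d(emb_size, n_filters, kernel_size=3, padding=1)

tokens = torch.randint(0, vocab_size, (2, 20))      # batch of 2 sequences, 20 tokens each
x = emb(tokens).transpose(1, 2)                     # Conv1d expects (batch, channels, seq_len)
features = conv(x)                                  # (2, 128, 20): each position mixes a window of 3 tokens
print(features.shape)
```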
Are there any pointers to easier research papers for DL beginners to implement?
I’ve also created a wiki: 2020 ML Interviews Resources & Advice(s). Please contribute!
I definitely suggest Kaggle competitions. They can have data separated in ways other than a random split.
It’s good to get into project groups as well.
Will a certificate be released, or something to remember this great journey by, signed by Jeremy?