Lesson 4 In-Class Discussion

(Maureen Metzger) #7



(Brendan Herger) #8

Are the links he’s showing at the beginning posted anywhere?

(Brian Holland) #9

Link for Vitaly’s Article.

(Maureen Metzger) #10

@hergertarian, not yet, he’ll probably post up top after class?

(Kevin Bird) #11

Anybody can actually put them up top since it’s a wiki.

(James Requa) #12

All the blog posts are found in this thread

(James Dietle) #13

Is there a plugin that shows those nice green completion bars in Jeremy’s notebook?

(Sanyam Bhutani) #14

It happens by default for me in the AWS AMI.

(Kevin Bird) #15

What is ps = 0.5?

(Aditya) #16


(Tom Grek) #17

“Throw away” --> set activation to zero? (Or to NaN, null, undefined… and handle it in some special way?)

(Stathis Fotiadis) #18

What does dropping a unit mean? Set its activation to 0? What happens in the back propagation step? Does the unit get its weights updated?

(satish) #19

It’s a percentage for dropout.

(Hiromi Suenaga) #20

If you drop some of the activations, would their weights get updated at the end of the mini batch?

(Sanyam Bhutani) #21

It removes learned weights in such a way that the calculation still works but overfitting is avoided.

(yinterian) #22

Activations don’t have weights.
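To make the questions above concrete, here is a minimal sketch of what "dropping" an activation could look like, assuming the common "inverted dropout" variant (zero the activation, scale the survivors by 1/(1-p)). The function names `dropout_forward` and `dropout_backward` are made up for illustration and are not the fastai or PyTorch API. It also shows the backward pass: a dropped activation passes back a zero gradient for that mini batch.

```python
import random

def dropout_forward(activations, p, rng):
    """Zero each activation with probability p and scale survivors by 1/(1-p).

    Hypothetical helper for illustration; assumes 0 <= p < 1.
    """
    keep = 1.0 - p
    # mask[i] is 1.0 if the activation is kept, 0.0 if it is dropped
    mask = [1.0 if rng.random() < keep else 0.0 for _ in activations]
    out = [a * m / keep for a, m in zip(activations, mask)]
    return out, mask

def dropout_backward(grad_out, mask, p):
    """Gradient w.r.t. the inputs: the same mask and scaling as the forward pass."""
    keep = 1.0 - p
    return [g * m / keep for g, m in zip(grad_out, mask)]

rng = random.Random(0)  # fixed seed so the run is repeatable
acts = [0.2, 1.5, -0.7, 3.0, 0.9, -1.1]
out, mask = dropout_forward(acts, p=0.5, rng=rng)
grad_in = dropout_backward([1.0] * len(acts), mask, p=0.5)

print("mask:   ", mask)
print("output: ", out)
print("grad in:", grad_in)
```

Wherever the mask is 0, both the output and the incoming gradient are 0, so that activation contributes nothing to the weight updates for that example. The weights themselves are untouched and can still be updated through other examples or other mini batches where the activation is kept.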

(Ankit Goila) #23

Would adding dropout make training slower?

(Kevin Bird) #24

Any idea what the “ps” stands for?

(Aditya) #25

The abstract of the dropout article seems perfectly serviceable.

Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different “thinned” networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, Journal of Machine Learning Research, 2014
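The "single unthinned network that has smaller weights" part can be checked by hand for one linear unit. The sketch below is an illustration under assumptions: the weights, inputs, and helper names are invented, and it only covers a single linear unit, where averaging over every thinned version matches scaling the weights by the keep probability exactly. For a full nonlinear network it is only an approximation, as the abstract says.

```python
from itertools import product

def unit_output(weights, inputs, mask):
    """Output of one linear unit when some of its inputs are dropped (mask 0)."""
    return sum(w * x * m for w, x, m in zip(weights, inputs, mask))

def average_over_thinned(weights, inputs, p):
    """Expected output over all 2^n drop masks, each input dropped with probability p."""
    keep = 1.0 - p
    total = 0.0
    for mask in product([0, 1], repeat=len(inputs)):
        kept = sum(mask)
        # probability of drawing exactly this mask
        prob = keep ** kept * p ** (len(mask) - kept)
        total += prob * unit_output(weights, inputs, mask)
    return total

# Made-up numbers, purely for illustration
weights = [0.5, -1.0, 2.0, 0.25]
inputs = [1.0, 2.0, 0.5, 4.0]
p = 0.5

averaged = average_over_thinned(weights, inputs, p)
# Test-time shortcut: keep every input, but shrink the weights by (1 - p)
scaled = unit_output([w * (1.0 - p) for w in weights], inputs, [1] * len(inputs))

print(averaged, scaled)
```

The two printed numbers agree, which is why one forward pass with scaled weights can stand in for averaging over the exponential number of thinned networks.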

(Sanyam Bhutani) #26

I guess the number of epochs remains the same. So, no.