Lesson 4 In-Class Discussion

init_27 · November 21, 2017, 2:43am

Removing the learned weights in such a way that the calculation still works but overfitting is avoided.

yinterian · November 21, 2017, 2:43am

Activations don’t have weights.

A_TF57 · November 21, 2017, 2:43am

Would adding dropout make training slower?

KevinB · November 21, 2017, 2:43am

Any idea what “ps” stands for?

ecdrid · November 21, 2017, 2:44am

The abstract of the dropout article seems perfectly serviceable.

Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different “thinned” networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, Journal of Machine Learning Research, 2014
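To make the abstract concrete, here is a minimal sketch of dropout in plain Python. It is an illustration only, not the paper's or any library's code. Note that it uses the "inverted" variant, which scales the surviving activations during training instead of shrinking the weights at test time as the abstract describes; the two are equivalent in expectation. The function name `dropout` and its parameters are assumptions made for this example.

```python
import random

def dropout(activations, p, training=True, rng=random):
    """Inverted dropout (hypothetical helper, for illustration only).

    During training, each activation is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p) so the expected value is unchanged.
    At test time the activations pass through untouched.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    # rng.random() is uniform on [0, 1), so a unit survives with probability keep.
    return [a / keep if rng.random() >= p else 0.0 for a in activations]

acts = [1.0] * 10
train_out = dropout(acts, p=0.5, rng=random.Random(0))  # mix of 0.0 and 2.0
test_out = dropout(acts, p=0.5, training=False)         # identical to acts
```

With p = 0.5, each training pass zeroes roughly half the units and doubles the rest, so the average activation stays close to what the full network produces at test time.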

init_27 · November 21, 2017, 2:44am

I guess the number of epochs remains the same, so no.

lgvaz · November 21, 2017, 2:44am

probabilities?

hiromi · November 21, 2017, 2:45am

percentages?

ecdrid · November 21, 2017, 2:45am

What happens at test time?

yinterian · November 21, 2017, 2:45am

I think it is p for one, but you have many, so that is why it is ps.

KevinB · November 21, 2017, 2:46am

Yep, multiple P = Ps. Thanks!
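As a rough illustration of "one p per dropout layer", the sketch below pairs a list of layer sizes with a list of dropout probabilities. The helper name `pair_layers_with_ps` and the layer sizes are made up for this example and are not fastai's API; the only assumption carried over from the thread is that `ps` holds several values of p.

```python
def pair_layers_with_ps(layer_sizes, ps):
    """Pair each layer size with its own dropout probability (hypothetical helper).

    A single number is reused for every layer; a list must have one entry per layer.
    """
    if isinstance(ps, (int, float)):
        ps = [float(ps)] * len(layer_sizes)
    if len(ps) != len(layer_sizes):
        raise ValueError("need one dropout probability per layer")
    return list(zip(layer_sizes, ps))

# Two layers, each with its own p (sizes chosen arbitrarily for the example).
plan = pair_layers_with_ps([512, 10], [0.25, 0.5])
```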

nafizh · November 21, 2017, 2:46am

Do you need dropout if you are doing batch normalization?

yinterian · November 21, 2017, 2:47am

Yes, these are different techniques.

anandsaha · November 21, 2017, 2:47am

What are the recommended values for the dropout probability?

rounakmehta · November 21, 2017, 2:48am

Can a dropout layer be placed anywhere in the network or can it only succeed a BatchNorm layer?

yinterian · November 21, 2017, 2:48am

It depends on your problem. These are hyperparameters.

A_TF57 · November 21, 2017, 2:48am

Kind of a basic question: is the last column in learn.fit accuracy on the training set or the validation set?

nafizh · November 21, 2017, 2:48am

Yes, but it seems that in the current literature many people use either dropout or batch normalization. I am wondering what the trade-off between these two techniques is, as batchnorm is also claimed to help generalization.

init_27 · November 21, 2017, 2:48am

The higher the value, the better the generalisation but the lower the accuracy.
The lower the value, the worse the generalisation but the better the accuracy.

KevinB · November 21, 2017, 2:49am

The more of each image you have, the lower your dropout in general?