@rachel do you mean it will be zero if u < .5 and 1 if u > .5? Otherwise both read as p = .5
Ctrl+b, z
Right, so instead of reducing the weights at test time you make them bigger at training time, if I understood your explanation?
Aren’t categorical columns already sparse? Applying dropout may risk the model not learning enough for those embeddings.
I was saying that it will be 0 with probability 50% (which is equivalent to choosing a u~uniform(0,1) and checking if u < .5, which is what you are saying now).
Yes. In PyTorch (unless I remember wrong) dropout doesn’t do anything to your weights at all during test time.
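To make the “scale at training time, do nothing at test time” point concrete, here is a minimal pure-Python sketch of inverted dropout (the scheme PyTorch uses); the function name and list-based interface are illustrative, not any library’s API:

```python
import random

def inverted_dropout(xs, p, training):
    """Inverted dropout: zero each activation with probability p at
    training time and scale survivors by 1/(1-p), so test time is a no-op."""
    if not training:
        return list(xs)  # test time: activations pass through unchanged
    return [0.0 if random.random() < p else x / (1 - p) for x in xs]

random.seed(0)
acts = [1.0, 2.0, 3.0, 4.0]
print(inverted_dropout(acts, p=0.5, training=True))   # -> [2.0, 4.0, 0.0, 0.0]
print(inverted_dropout(acts, p=0.5, training=False))  # -> [1.0, 2.0, 3.0, 4.0]
```

Because survivors are already divided by (1 - p) during training, the expected activation is unchanged and nothing needs to be rescaled at test time.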
In what proportion would you use dropout vs. other regularization methods, like weight decay, L2 norms, etc.?
Balancing them all properly is the art of a good practitioner. Sadly we don’t have any guidance on that except to build your intuition for it.
Can anyone explain embedding dropout… is it different from normal dropout? I missed this part…
Use RMSE if you want to minimize absolute error.
Use RMSPE if you want to minimize fractional error.
In many cases, we are using whichever loss was used by the Kaggle competition or academic benchmark that we are comparing our results against.
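As a quick sketch of the absolute-vs-fractional distinction, here are both metrics in plain Python (function names are mine, not a library’s):

```python
import math

def rmse(y_true, y_pred):
    # root mean squared error: penalizes absolute differences
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rmspe(y_true, y_pred):
    # root mean squared *percentage* error: penalizes fractional differences
    return math.sqrt(sum(((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = [100.0, 10.0]
y_pred = [110.0, 11.0]  # both predictions are 10% too high
print(rmse(y_true, y_pred))   # ~7.11, dominated by the larger target
print(rmspe(y_true, y_pred))  # 0.1, since each error is exactly 10%
```

Under RMSE the error on the large target dwarfs the small one; under RMSPE both count equally, which is why percentage metrics suit targets that span orders of magnitude.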
It just drops out some activations of the embedding. Remember that you can treat an embedding lookup as a multiplication of a matrix of one-hot encoded vectors by the embedding matrix. So after getting the embedding, you lose some of the embedding values for each feature.
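A toy sketch of that idea, assuming inverted-dropout scaling; the matrix, names, and interface here are illustrative, not fastai’s actual API:

```python
import random

# Tiny embedding matrix: one row per category value.
emb_matrix = [
    [0.1, 0.2, 0.3, 0.4],   # embedding for category 0
    [0.5, 0.6, 0.7, 0.8],   # embedding for category 1
]

def embed_with_dropout(cat_idx, p, training):
    """Look up an embedding row (same result as one-hot vector @ emb_matrix),
    then zero some of its individual values, as embedding dropout does."""
    vec = emb_matrix[cat_idx]
    if not training:
        return list(vec)  # no dropout at test time
    # drop individual embedding values, scaling survivors by 1/(1-p)
    return [0.0 if random.random() < p else v / (1 - p) for v in vec]
```

So the mechanism is ordinary dropout; what makes it “embedding dropout” is only that it is applied to the embedding outputs rather than to a hidden layer’s activations.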
In a two-class problem, say classification of cat or dog,
our model will assign a probability of being a cat (Jeremy calls it “cattyness”)
and a probability of being a dog (“doggyness”). Of course since there are only two classes, the class label is either 1 for cat or 0 for dog. And cattyness = 1 - doggyness.
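A quick numeric sketch of why cattyness = 1 - doggyness, assuming the two scores come from a two-class softmax (the function name is mine):

```python
import math

def softmax2(logit_cat, logit_dog):
    """Two-class softmax: the two probabilities necessarily sum to 1."""
    e_cat, e_dog = math.exp(logit_cat), math.exp(logit_dog)
    z = e_cat + e_dog
    return e_cat / z, e_dog / z

cattyness, doggyness = softmax2(2.0, 0.5)
print(round(cattyness + doggyness, 10))  # 1.0, so doggyness = 1 - cattyness
```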
thank you
How is it different from the normal dropout parameter p?
Thank you. Any intuitions (like in what situations) when minimizing absolute is better than percentage and vice versa?
It applies to the embeddings instead of the hidden layers. The difference is in which layers are affected.
I’m not so sure that dropping inputs is a bad idea! It is akin to building trees from random subsets of the features in a Random Forest.
Could we get a quick overview of weight norm? (I think that’s what Jeremy said)
Yes, you only need to use dropouts to regularize your model.
That’s advanced (which is why he didn’t go through the explanation)