Lesson 6 In-Class Discussion ✅

@rachel do you mean it will be 0 if the draw is < .5 and 1 if it is > .5? Otherwise both read as p = .5.

Ctrl+b, z

4 Likes

Right, so instead of reducing the weights at test time, you make them bigger at training time, if I understood your explanation?

Aren’t categorical columns already sparse? Applying dropout may risk the model not learning enough for those embeddings.

1 Like

I was saying that it will be 0 with probability 50% (which is equivalent to choosing a u~uniform(0,1) and checking if u < .5, which is what you are saying now).

2 Likes
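
For anyone following along, here is a minimal sketch of that sampling in PyTorch (the tensor shape and variable names are just for illustration):

```python
import torch

p = 0.5                   # dropout probability
x = torch.randn(4, 6)     # hypothetical batch of activations

# Draw u ~ Uniform(0, 1) for every activation and zero it where u < p,
# i.e. each unit is dropped independently with probability p.
u = torch.rand_like(x)
mask = (u >= p).float()
x_dropped = x * mask
```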

Yes. In PyTorch (unless I remember wrong), dropout doesn’t do anything to your weights at all at test time.

3 Likes
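
A small sketch showing both points at once, assuming `torch.nn.Dropout` with p = 0.5 (the input tensor is made up for illustration): in training mode the surviving units are scaled up by 1/(1-p), and in eval mode the layer is a no-op.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)      # hypothetical activations, all ones

drop.train()
print(drop(x))            # surviving values are scaled up to 1 / (1 - p) = 2.0

drop.eval()
print(drop(x))            # identity: the input passes through unchanged
```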

In what proportion would you use dropout vs. other regularization methods, like weight decay, L2 norms, etc.?

10 Likes

Balancing them all properly is the art of being a good practitioner. Sadly, we don’t have any guidance on that except to build up your intuition through experimentation.

3 Likes

Can anyone explain embedding dropout? Is it different from normal dropout? I missed this part.

Use RMSE if you want to minimize absolute error.
Use RMSPE if you want to minimize fractional error.

4 Likes
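
For reference, a quick sketch of the two metrics (the targets and predictions are made up; the RMSPE definition here assumes the actuals go in the denominator):

```python
import torch

def rmse(y, y_hat):
    # Root mean squared error: penalizes absolute differences.
    return torch.sqrt(((y - y_hat) ** 2).mean())

def rmspe(y, y_hat):
    # Root mean squared percentage error: penalizes fractional differences,
    # so an error of 10 on a target of 100 counts the same as 1 on 10.
    return torch.sqrt((((y - y_hat) / y) ** 2).mean())

y = torch.tensor([100.0, 10.0])
y_hat = torch.tensor([110.0, 11.0])
print(rmse(y, y_hat), rmspe(y, y_hat))
```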

In many cases, we are using whichever loss was used by the Kaggle competition or academic benchmark that we are comparing our results against.

1 Like

It just drops out some activations of the embedding. Remember that you can treat an embedding lookup as a multiplication of the embedding matrix with a matrix of one-hot encoded vectors. So after getting the embedding, you lose some of the embedding values for each feature.
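
A rough sketch of that idea, assuming plain PyTorch layers rather than the exact fastai implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_categories, emb_size = 10, 4
emb = nn.Embedding(n_categories, emb_size)   # the embedding matrix
emb_drop = nn.Dropout(p=0.2)                 # dropout on the embedding output

cats = torch.tensor([3, 7, 3])               # hypothetical category indices

# The lookup is equivalent to multiplying one-hot vectors by the matrix:
e = emb(cats)
e_alt = F.one_hot(cats, n_categories).float() @ emb.weight
assert torch.allclose(e, e_alt)

# Embedding dropout then zeroes some of the embedding values per feature.
e = emb_drop(e)
```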

In a two-class problem, say classification of cat or dog, our model will assign a probability of being a cat (Jeremy calls it “cattyness”) and a probability of being a dog (“doggyness”). Of course, since there are only two classes, the class label is either 1 for cat or 0 for dog, and cattyness = 1 - doggyness.

3 Likes
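
A tiny illustration, assuming the model outputs two logits and a softmax turns them into probabilities:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, -1.0])       # hypothetical scores for [cat, dog]
probs = F.softmax(logits, dim=0)
cattyness, doggyness = probs[0], probs[1]
print(cattyness, doggyness)              # the two probabilities sum to 1
assert torch.isclose(cattyness, 1 - doggyness)
```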

Thank you. How is it different from the normal dropout parameter p?

Thank you. Any intuition (like in what situations) on when minimizing absolute error is better than percentage error, and vice versa?

It applies to the embeddings instead of the hidden layers. The difference is in which layers are affected.

I’m not so sure that dropping inputs is a bad idea! It is akin to how a Random Forest builds each tree from a random subset of the features and samples.

1 Like

Could we get a quick overview of weight norm? (I think that’s what Jeremy said)

9 Likes

Yes, you only need to use dropout to regularize your model.

That’s advanced :slight_smile: (which is why he didn’t go through the explanation)

2 Likes