Language Model Dropouts

I’m looking at the dropouts in the language model and have a number of questions about the different dropouts. Apologies for the messy picture!

1: LSTM Weight Dropout: removes a percentage of the LSTM’s weights. In the AWD-LSTM paper this is DropConnect applied to the hidden-to-hidden weight matrices (see the sketch after this list).
2: Hidden Dropout: Applies to the activations being passed from one LSTM to the next.
3: Input Dropout: ?? It looks like it removes some values from the activations coming out of the LSTMs, but I always assumed the input dropout would be the first thing that happens in the model.
4: Not sure what this dropout is. It is created with dropouth, though, which I thought was only for the activations passed between LSTMs (see the LockedDropout sketch after this list): self.dropouths = nn.ModuleList([LockedDropout(dropouth) for l in range(nlayers)])
5: Normal Dropout: comes after the Linear layer of the LinearDecoder. I don’t think this is very effective without another layer after it, but maybe I’m wrong.
6: Embedding Dropout: used to “forget” the embeddings of whole words during training (see the embedding-dropout sketch after this list).
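
For concreteness, here is a minimal sketch of the DropConnect idea behind point 1, shown on a plain nn.Linear rather than an LSTM (the class name WeightDropLinear is made up for this example; AWD-LSTM applies the same trick to the LSTM’s hidden-to-hidden weight matrices):

```python
import torch.nn as nn
import torch.nn.functional as F

class WeightDropLinear(nn.Module):
    """Sketch of DropConnect: zero a fraction of the weight matrix itself
    on each forward pass, instead of dropping activations."""
    def __init__(self, in_features, out_features, weight_p=0.5):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.weight_p = weight_p

    def forward(self, x):
        # Drop entries of the weight matrix (scaled at train time);
        # F.dropout is a no-op at eval time.
        w = F.dropout(self.linear.weight, p=self.weight_p, training=self.training)
        return F.linear(x, w, self.linear.bias)
```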
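
Points 2, 3 and 4 all appear to use LockedDropout (also called variational dropout): instead of sampling a fresh mask for every element like nn.Dropout, it samples one mask per sequence and reuses it at every time step. A minimal sketch, assuming input shaped (seq_len, batch, hidden):

```python
import torch.nn as nn

class LockedDropout(nn.Module):
    """Variational dropout: one mask per sequence, shared across time steps."""
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        # x: (seq_len, batch, hidden)
        if not self.training or self.p == 0.:
            return x
        # Size 1 on the time axis, so the same mask is broadcast to every step
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)
```

If the code follows the reference AWD-LSTM implementation, the same class is simply instantiated with different probabilities: dropouti on the embedding output going into the first LSTM, dropouth on the activations between LSTM layers, and a final output dropout before the decoder. That would explain why the “input” dropout sits right next to the LSTMs rather than at the very start of the model.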
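
And for point 6, a sketch of the embedding-dropout idea: it zeroes whole rows of the embedding matrix, so a dropped word is “forgotten” everywhere it occurs in that batch. This is an illustration, not the actual library code:

```python
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingDropout(nn.Module):
    """Drops entire rows of the embedding matrix at train time."""
    def __init__(self, emb: nn.Embedding, p=0.1):
        super().__init__()
        self.emb, self.p = emb, p

    def forward(self, words):
        if self.training and self.p != 0.:
            # One Bernoulli draw per vocabulary row, broadcast across emb_dim
            mask = self.emb.weight.new_empty((self.emb.weight.size(0), 1))
            mask = mask.bernoulli_(1 - self.p) / (1 - self.p)
            w = self.emb.weight * mask
        else:
            w = self.emb.weight
        return F.embedding(words, w, self.emb.padding_idx)
```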


These are all from the AWD-LSTM paper (“Regularizing and Optimizing LSTM Language Models” by Merity et al.), so take a look there for the details.
