Dropouts in AWD-LSTM

I am trying to implement AWD-LSTM and would therefore like to make sure I understand its dropout techniques correctly. I have read the article and the docs, but I am still not sure I have understood everything properly. Please correct me if I am wrong.

Embedding dropout (embed_p) - the probability of replacing a word's entire embedding with a zero vector.
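To check my understanding of this one in code, here is a toy sketch of what I think happens (the function name is mine, not fastai's API; whole rows of the embedding matrix are zeroed and the survivors rescaled):

```python
import torch

def embedding_dropout(emb_weight, words, embed_p, training=True):
    # Zero out entire rows (whole words) of the embedding matrix,
    # then rescale surviving rows by 1 / (1 - embed_p).
    if training and embed_p > 0:
        size = (emb_weight.size(0), 1)  # one mask value per vocabulary word
        mask = emb_weight.new_empty(size).bernoulli_(1 - embed_p) / (1 - embed_p)
        masked_weight = emb_weight * mask
    else:
        masked_weight = emb_weight
    return torch.nn.functional.embedding(words, masked_weight)

torch.manual_seed(0)
weight = torch.randn(10, 4)          # toy vocabulary of 10 words, embedding dim 4
words = torch.tensor([[1, 2, 3]])    # one sequence of three word indices
out = embedding_dropout(weight, words, embed_p=0.5)
```

So each looked-up row is either all zeros or the original row scaled by 2, which is what I mean by "substituting the word embedding with a zero vector".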

Input dropout (input_p) - the probability of zeroing each component of the looked-up embedding vector.
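My understanding is that this one is variational-style: one mask per (batch, feature) position, reused across every time step. A toy sketch of that assumption (the function name is mine):

```python
import torch

def input_dropout(x, input_p, training=True):
    # x has shape (batch, seq_len, n_features).
    # Sample one mask per (batch, feature) pair and reuse it
    # across all time steps, rescaling survivors by 1 / (1 - input_p).
    if not training or input_p == 0:
        return x
    mask = x.new_empty(x.size(0), 1, x.size(2)).bernoulli_(1 - input_p) / (1 - input_p)
    return x * mask

torch.manual_seed(0)
x = torch.ones(2, 5, 3)       # batch of 2, sequence length 5, 3 features
y = input_dropout(x, 0.5)
```

Because the mask is broadcast over the sequence dimension, the same components are dropped at every time step.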

Weight dropout (weight_p) - the probability of zeroing each individual weight in the recurrent (hidden-to-hidden) weight matrices.
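In other words, DropConnect applied to the weights themselves rather than to activations. A minimal sketch of what I mean (just standard dropout applied to a stand-in for an LSTM's `weight_hh` matrix; not the actual WeightDropout wrapper):

```python
import torch

def weight_drop(weight_hh, weight_p, training=True):
    # DropConnect: zero individual entries of the recurrent weight
    # matrix (not whole units), rescaling survivors by 1 / (1 - weight_p).
    return torch.nn.functional.dropout(weight_hh, p=weight_p, training=training)

torch.manual_seed(0)
w = torch.ones(8, 4)      # stand-in for an LSTM's hidden-to-hidden matrix
dw = weight_drop(w, 0.5)  # entries are either 0 or 1 / (1 - 0.5) = 2
```

The dropped copy would then be used as the recurrent matrix for that forward pass, while the raw weights are kept for the optimizer.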

Hidden (recurrent) dropout (hidden_p) - the probability of zeroing each component of the update vector (obtained by multiplying the output of the tanh layer with the output of the input gate).
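To make sure I have this last one right, here is a toy sketch of the cell update as I described it, with dropout on the candidate update before it is gated into the cell state (all names and shapes here are mine, purely for illustration):

```python
import torch

def hidden_dropout_update(f_t, i_t, g_t, c_prev, hidden_p, training=True):
    # Recurrent dropout as I understand it: drop components of the
    # candidate update g_t (the tanh output) before the input gate i_t
    # mixes it into the new cell state.
    update = i_t * torch.nn.functional.dropout(g_t, p=hidden_p, training=training)
    return f_t * c_prev + update

torch.manual_seed(0)
f = torch.full((2, 4), 0.5)   # forget gate output
i = torch.full((2, 4), 0.5)   # input gate output
g = torch.ones(2, 4)          # tanh candidate update
c = torch.zeros(2, 4)         # previous cell state
c_new = hidden_dropout_update(f, i, g, c, hidden_p=0.5)
```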

The names are taken from here.

Have I given correct definitions for all of these kinds of dropout? I will be very grateful for any help!

I did a writeup at the link below that covers, at a high level, the different dropouts applied in AWD-LSTM.

Adrian, thank you! I made the deductions above precisely from your article :) I just wanted to make sure I had understood them properly.