I am trying to implement AWD-LSTM and would like to make sure I understand its dropout techniques correctly. I have read the article and the docs but am still not sure whether I have understood them properly. Please correct me if I am wrong.
Embedding dropout (embed_p) - the probability of replacing a whole word embedding with a zero vector.
Input dropout (input_p) - the probability of replacing each component of the chosen embedding vector with 0.
Weight dropout (weight_p) - the probability of replacing each individual weight of the recurrent (hidden-to-hidden) matrices with 0.
Hidden (recurrent) dropout (hidden_p) - the probability of replacing each component of the update vector (obtained by multiplying the output of the tanh layer with the output of the input gate) with 0. I have tried to sketch what I mean by each of these in the code below.
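To double-check that I am picturing the first three correctly, here is a minimal PyTorch sketch of how I understand them (the function names and shapes are mine, not fastai's actual implementation, and the weight dropout is applied crudely on `.data`, ignoring gradients):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def embedding_dropout(embed: nn.Embedding, words: torch.Tensor, p: float) -> torch.Tensor:
    # Embedding dropout: one Bernoulli draw per vocabulary row, so a dropped
    # row means the whole word embedding becomes a zero vector; kept rows are
    # rescaled by 1 / (1 - p).
    mask = embed.weight.new_empty((embed.num_embeddings, 1)).bernoulli_(1 - p) / (1 - p)
    return F.embedding(words, embed.weight * mask)

def locked_dropout(x: torch.Tensor, p: float) -> torch.Tensor:
    # Input dropout: zero individual components of the embedding vectors.
    # x: (batch, seq_len, emb_dim); one mask per sequence, broadcast over time.
    mask = x.new_empty((x.size(0), 1, x.size(2))).bernoulli_(1 - p) / (1 - p)
    return x * mask

def drop_recurrent_weights(lstm: nn.LSTM, p: float) -> None:
    # Weight dropout (DropConnect): zero individual weights of the
    # hidden-to-hidden matrices. A real implementation would keep the
    # operation differentiable; this is just to show which weights are hit.
    for name, param in lstm.named_parameters():
        if "weight_hh" in name:
            param.data = F.dropout(param.data, p=p, training=True)

# Tiny usage check
vocab, emb_dim, hid = 10, 4, 6
embed = nn.Embedding(vocab, emb_dim)
lstm = nn.LSTM(emb_dim, hid, batch_first=True)

words = torch.randint(0, vocab, (2, 5))       # (batch=2, seq_len=5)
x = embedding_dropout(embed, words, p=0.1)    # embed_p: whole words zeroed
x = locked_dropout(x, p=0.3)                  # input_p: components zeroed
drop_recurrent_weights(lstm, p=0.5)           # weight_p: recurrent weights zeroed
out, _ = lstm(x)
print(out.shape)                              # torch.Size([2, 5, 6])
```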
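And for the hidden (recurrent) dropout, a sketch of a single LSTM step showing where I think the mask is applied (again my own code, not the library's; I am assuming PyTorch's i, f, g, o gate ordering):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def lstm_step_with_hidden_dropout(cell: nn.LSTMCell, x_t, h, c, p: float):
    # Standard LSTMCell gate computation, written out so the update vector
    # is visible (PyTorch gate order: input, forget, cell candidate, output).
    gates = x_t @ cell.weight_ih.t() + cell.bias_ih + h @ cell.weight_hh.t() + cell.bias_hh
    i, f, g, o = gates.chunk(4, dim=1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    update = i * g                                   # input gate * tanh candidate
    update = F.dropout(update, p=p, training=True)   # hidden_p: drop components of the update
    c_new = f * c + update
    h_new = o * torch.tanh(c_new)
    return h_new, c_new

# Tiny usage check
cell = nn.LSTMCell(4, 6)
x_t, h, c = torch.randn(2, 4), torch.zeros(2, 6), torch.zeros(2, 6)
h, c = lstm_step_with_hidden_dropout(cell, x_t, h, c, p=0.25)
print(h.shape, c.shape)   # torch.Size([2, 6]) torch.Size([2, 6])
```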
The names are taken from here.
Have I given the correct definitions for all of these kinds of dropout? I would be very grateful for any help!