Understanding BatchNorm1D, Dropout and RNNDropout()

I have trouble understanding three things and would be happy if somebody could help me understand them. The RNN classifier has the following structure:

  1. The input to BatchNorm1D is 1200, even though the previous layer has a shape of (1150, 400). Why isn’t it 400? I marked it in red. Is it because of Concat Pooling? If so, it would imply the structure below (see the sketch after this list). Can you confirm whether my assumption is correct?

  2. Why do we apply RNNDropout() at the end of a language model? It looks like we are throwing away predictions for certain possible next words. Could you explain? I don’t understand the reason for using dropout at the end.

  3. I couldn’t map the dropouts from the ULMFiT paper to the dropouts in the model’s summary, on which I based this visualization. From the paper: “We apply dropout of 0.4 to layers, 0.3 to RNN layers, 0.4 to input embedding layers, 0.05 to embedding layers, and weight dropout of 0.5 to the RNN hidden-to-hidden matrix.” Were these changed in fastai v1 compared to ULMFiT? In particular, where do we apply the 0.4 input embedding layer dropout and where the 0.05 embedding layer dropout? It seems to me that there is only one embedding layer.
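
For reference, here is a rough sketch of what I assume concat pooling does (plain PyTorch tensors and a hidden size of 400 are just my assumption of how it works, not the actual fastai code):

```python
import torch
import torch.nn as nn

batch_size, seq_len, hidden_size = 32, 70, 400

# output of the last RNN layer: one 400-dim vector per time step
rnn_output = torch.randn(batch_size, seq_len, hidden_size)

# concat pooling: last time step + max over time + mean over time
last = rnn_output[:, -1, :]          # (32, 400)
max_pool = rnn_output.max(dim=1)[0]  # (32, 400)
avg_pool = rnn_output.mean(dim=1)    # (32, 400)

pooled = torch.cat([last, max_pool, avg_pool], dim=1)  # (32, 1200)

# which would explain why the classifier head starts with BatchNorm1d(1200)
bn = nn.BatchNorm1d(3 * hidden_size)
print(bn(pooled).shape)  # torch.Size([32, 1200])
```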

Thanks a lot in advance!


Hi, the input size of the BatchNorm1d layer is due to the concat-pooling layer. To answer your second question, remember that dropout is an indirect form of regularization: it forces the surviving neurons to become more confident in their choices. And as you probably already know, it is not applied during prediction, only at training time. I cannot help you with the third one.
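
As a quick illustration of that last point, here is a minimal sketch with plain `nn.Dropout` (a stand-in for RNNDropout, which if I remember correctly additionally reuses the same mask across time steps, but behaves the same with respect to train/eval mode):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.4)
x = torch.ones(1, 8)

drop.train()    # training mode: roughly 40% of activations are zeroed,
print(drop(x))  # the rest are scaled by 1 / (1 - 0.4)

drop.eval()     # evaluation mode: dropout is the identity,
print(drop(x))  # so no predictions are thrown away at inference time
```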