I just read that, in most cases, when a dataset you want to train a neural network on has missing values, you can simply replace them with 0; the network will then automatically learn that this value means “missing data”.
In the case of Dropout, the idea is to “drop” some connections of the neural net by zeroing a random fraction of the activations.
So I was wondering: how can the neural net distinguish missing values from dropout? Or do we simply not need it to make that distinction?
For the first dense layer, I think the effect is similar, except that dropout is stochastic (i.e. random) whereas missing values are likely not. After the first dense layer, however, missing values will not cause subsequent activations to ‘drop out’ in the feed-forward pass, so dropout is still needed to get that effect at deeper layers.
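To make this concrete, here is a minimal NumPy sketch (toy weights, purely illustrative) showing that a zero-imputed input feature only removes that feature's contribution to the first layer, while dropout explicitly zeroes hidden activations themselves:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer feed-forward net (hypothetical weights, for illustration only).
W1 = rng.normal(size=(4, 8)); b1 = rng.normal(size=8)
W2 = rng.normal(size=(8, 3)); b2 = rng.normal(size=3)
relu = lambda z: np.maximum(z, 0.0)

# Feature 1 is "missing" and has been imputed with 0.
x = np.array([0.5, 0.0, -1.2, 0.3])

# Zero-imputation: the 0 only cancels feature 1's term in x @ W1;
# hidden activations are generally still nonzero (bias + other features).
h = relu(x @ W1 + b1)
print("hidden activations after imputed input:", h)

# Dropout (inverted variant): zeroes a random fraction p of the
# *activations* at this layer, whatever the input was, and rescales
# the surviving units by 1/(1-p) to keep the expected value unchanged.
p = 0.5
mask = rng.random(h.shape) >= p
h_drop = h * mask / (1.0 - p)
y = h_drop @ W2 + b2
print("hidden activations after dropout:", h_drop)
```

So the zero from imputation is “absorbed” after one matrix multiply, whereas dropout keeps injecting zeros at the layer it is applied to, which is why the two are not confused in practice.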