When updating the weights, we need to compute the partial derivative of the loss with respect to the convolution kernel.
The formula looks roughly like this (schematically, for layer $l$ of an $L$-layer network, writing $a^{(k)}=\sigma(z^{(k)})$ and $z^{(k)}=W^{(k)}a^{(k-1)}$, and ignoring the exact convolution indexing):

$$\frac{\partial\,\text{Loss}}{\partial W^{(l)}} \;=\; \frac{\partial\,\text{Loss}}{\partial a^{(L)}}\left[\prod_{k=l+1}^{L}\sigma'\!\big(z^{(k)}\big)\,W^{(k)}\right]\sigma'\!\big(z^{(l)}\big)\,a^{(l-1)}$$
My question is this: since every layer contributes a factor of the activation function's derivative $\sigma'$, a model with many activation layers makes the power of $\sigma'$ in this expression grow, especially for the layers near the front of the model. (Concretely, the growth shows up as the exponent $n$ in a factor on the order of $(\sigma')^{\,n}$.)
Eventually this easily causes the gradient to explode or vanish. (Assuming the activation function's derivative satisfies $\sigma' < 1$, the product shrinks toward zero; if $\sigma' > 1$, it blows up.)
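To sanity-check this numerically, here is a minimal sketch (my own illustration, not from any particular framework) assuming a sigmoid activation; the random pre-activations `z` and path weights `w` are made-up values just to show the geometric decay of the per-layer factors:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # bounded above by 0.25

rng = np.random.default_rng(0)

# Toy deep chain: the gradient reaching the first layer picks up one
# sigma'(z) factor (and one weight factor) per layer it passes through.
grad = 1.0
for layer in range(30):
    z = rng.normal()  # hypothetical pre-activation at this layer
    w = rng.normal()  # hypothetical weight along the backprop path
    grad *= sigmoid_prime(z) * w

print(f"gradient magnitude after 30 layers: {abs(grad):.3e}")
# Since sigma'(z) <= 0.25, the product decays roughly like 0.25^30,
# so the front layers receive an almost-zero (vanishing) gradient.
```

With sigmoid, $\sigma'(z)\le 0.25$, so even with weights of magnitude around 1 the product decays geometrically; an activation/weight combination whose per-layer factor exceeds 1 would instead make the product blow up (exploding gradient).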
Is my understanding correct?