In the Lesson 3 video, someone asked what happens if two different convolutional filters converge to the same thing. Jeremy’s answer was that this won’t happen: duplicate filters are not an optimal solution, and since training is constantly optimising, it won’t converge on one.
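To make "converge to the same thing" concrete, one simple check is the cosine similarity between filters' flattened weights, where a value near 1 means two filters point the same way up to a positive scale. This is a minimal sketch; the helper names `filter_similarity` and `near_duplicates` and the 0.95 threshold are hypothetical choices. It assumes weights shaped `(n_filters, in_channels, kh, kw)`, the layout PyTorch's `Conv2d` uses; Keras stores kernels as `(kh, kw, in_channels, n_filters)`, so those would need transposing first.

```python
import numpy as np

def filter_similarity(weights):
    """Pairwise cosine similarity between conv filters.

    Assumes weights has shape (n_filters, in_channels, kh, kw).
    """
    flat = weights.reshape(weights.shape[0], -1)          # one row per filter
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, 1e-12)                # guard against zero filters
    return unit @ unit.T

def near_duplicates(weights, threshold=0.95):
    """Index pairs (i, j) with i < j whose similarity exceeds threshold."""
    sim = filter_similarity(weights)
    n = sim.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sim[i, j] > threshold]

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3, 3, 3))
w[3] = 2.0 * w[1]          # make filter 3 a scaled copy of filter 1
print(near_duplicates(w))  # [(1, 3)]
```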
Later, dropout was introduced. My intuition for how dropout works is that, by randomly deciding whether each unit (and therefore all of its outgoing weights) is connected on any given training step, it prevents any small part of the network from becoming highly significant relative to the rest. In effect, it forces the network to spread its understanding across more of the network, rather than becoming over-dependent on a small number of significant features.
Put another way, dropout is essentially fault injection, and the network responds by developing some fault tolerance.
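The "fault injection" view can be seen in a minimal sketch of inverted dropout in NumPy. This assumes dropout is applied element-wise to a layer's activations with drop probability `p` in `[0, 1)`; it is the common textbook formulation, not necessarily the exact code used by the course's library, and the `dropout` function here is a hypothetical helper.

```python
import numpy as np

def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each activation with probability p in training.

    Survivors are scaled by 1 / (1 - p) so each activation's expected value
    is unchanged; at inference time the input passes through untouched.
    Assumes 0 <= p < 1.
    """
    if not training or p == 0.0:
        return activations
    keep = rng.random(activations.shape) >= p  # True = unit survives this step
    return activations * keep / (1.0 - p)

rng = np.random.default_rng(42)
x = np.ones((1000, 100))
y = dropout(x, p=0.5, rng=rng)
print((y == 0).mean())  # roughly 0.5: about half the units were "faulted"
print(y.mean())         # roughly 1.0: scaling preserves the expected activation
```

Each training step sees a different random mask, so no single unit can be relied on to always be present.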
My question, then, is whether, in the presence of dropout, a network might actually learn duplicate filters after all, specifically as a defence against dropout.
That is, if a filter is particularly important, then by learning several copies of it, the network makes it less and less likely that dropout will suppress that filter's effect entirely.
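The arithmetic behind this intuition, under a simplifying assumption: if each copy's output is dropped independently with probability `p`, then with `k` identical copies the filter's contribution only vanishes when all `k` are dropped, which happens with probability `p**k`. The helper names below are hypothetical, and the sketch only shows why duplicates would help, not whether training actually finds them.

```python
import numpy as np

def prob_all_dropped(p, k):
    """Exact probability that all k copies are dropped, each independently with prob p."""
    return p ** k

def simulate_all_dropped(p, k, trials, rng):
    """Monte Carlo estimate of the same probability."""
    dropped = rng.random((trials, k)) < p  # one row per simulated training step
    return dropped.all(axis=1).mean()

rng = np.random.default_rng(0)
for k in (1, 2, 3, 4):
    # The simulated estimate should sit close to the exact value.
    print(k, prob_all_dropped(0.5, k), simulate_all_dropped(0.5, k, 100_000, rng))
```

With `p = 0.5`, one copy is lost half the time, but four copies are all lost only 6.25% of the time.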
Does this make sense?