Duplicate filters as 'defence' against dropout?

sjbaines · September 10, 2017, 8:24pm

In the lesson 3 video, someone asked what happens if two different convolutional filters converge to the same thing. Jeremy’s answer was that this won’t happen because it is not optimal to do so, and training is constantly optimising, so it wouldn’t converge on such a solution.

Later, dropout was introduced. My intuitive feel for how dropout works is that by randomly zeroing any given unit's activation on any given training step, it prevents any small part of the network from becoming highly significant relative to other parts. In some way, it forces the network to spread its understanding across more of the network, and not become over-dependent on a small number of significant features.

Put another way, dropout is essentially fault injection, and the response of the network is to develop some fault tolerance.

My question, then, is whether, in the presence of dropout, a network may actually generate duplicate filters after all, specifically as a defence against dropout?

I.e. if a filter seems to be particularly important, then by generating multiple duplicates of this filter, it becomes less and less likely that dropout can cause the effects of that filter to be suppressed.
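To put rough numbers on that intuition, here is a minimal sketch. It assumes each copy is dropped independently with the same probability, which is a simplification of how dropout acts on convolutional activations, and the function name is made up for illustration. Under that assumption, the chance that all copies are knocked out at once is the drop probability raised to the number of copies, so with a drop rate of 0.5 it goes 0.5, 0.25, 0.125, 0.0625 for one to four copies.

```python
def prob_all_copies_dropped(p_drop: float, n_copies: int) -> float:
    """Chance that every one of n_copies duplicate units is dropped together.

    Assumption: each copy is dropped independently with probability p_drop.
    """
    if not 0.0 <= p_drop <= 1.0:
        raise ValueError("p_drop must be between 0 and 1")
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    return p_drop ** n_copies


# Show how quickly the risk of losing the feature entirely shrinks.
for k in range(1, 5):
    print(k, prob_all_copies_dropped(0.5, k))
```

So each extra duplicate halves the chance of the feature vanishing on a given step, at the cost of a filter that could have learned something else.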

Does this make sense?

msp · September 11, 2017, 9:42am

I think that’s the right idea, but the design goal of dropout is mainly to prevent overly strong “co-adaptation” in the network, that is, the situation in which most neurons are only useful in tandem with many other neurons (cf. the original dropout paper).

There will certainly be some redundancy in the filters. Whether dropout leads to exact one-to-one duplicates of filters, I have no idea. Might be an interesting investigation!
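One possible way to start such an investigation, sketched under assumptions: flatten each filter in a layer into a list of weights and flag pairs whose cosine similarity is close to 1. The function names, the 0.99 threshold, and the three hand-written 3x3 filters below are all made up for illustration; they are not weights from a trained network. In practice you would extract the weights from models trained with and without dropout and compare how many near-duplicate pairs each has.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two flattened filters."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero filter as similar to nothing
    return dot / (norm_a * norm_b)


def near_duplicate_pairs(filters, threshold=0.99):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    pairs = []
    for (i, f), (j, g) in combinations(enumerate(filters), 2):
        if cosine_similarity(f, g) >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical flattened 3x3 filters, purely for illustration.
filters = [
    [1.0, 0.0, -1.0, 2.0, 0.0, -2.0, 1.0, 0.0, -1.0],  # vertical-edge detector
    [2.0, 0.0, -2.0, 4.0, 0.0, -4.0, 2.0, 0.0, -2.0],  # same filter, scaled by 2
    [1.0, 2.0, 1.0, 0.0, 0.0, 0.0, -1.0, -2.0, -1.0],  # horizontal-edge detector
]

print(near_duplicate_pairs(filters))
```

Cosine similarity ignores scale, so a rescaled copy still counts as a duplicate, but it would miss filters that are functionally redundant without having near-identical weights.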