I was reading Karpathy’s blog post where he explains that as many as 40% of a network’s neurons can be dead (they output zero for every input).
This left me with a few questions:
- Has anything been done to address this dead-neuron problem? Is it still a real issue with modern architectures?
- Would something like a Keras callback that re-initializes dead neurons be worth trying? Relatedly, could you train two networks with the same architecture in parallel and merge their weights after each batch/epoch, keeping the best-activating neurons/filters from each, in the hope of converging faster?
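
To make the callback idea concrete, here is a minimal framework-agnostic sketch in NumPy of the two steps such a callback would perform at `on_epoch_end`: flag units whose ReLU output is zero over a whole batch, then re-draw the incoming weights of just those units. The function names (`find_dead_units`, `reinit_dead_units`) and the re-init scale are my own choices, not any Keras API; in Keras you would pull `W, b` from `layer.get_weights()` and push the result back with `layer.set_weights()`.

```python
import numpy as np

def find_dead_units(activations, eps=0.0):
    """Boolean mask of units whose ReLU output is <= eps for EVERY
    sample in the batch -- i.e. candidate 'dead' neurons."""
    return np.all(activations <= eps, axis=0)

def reinit_dead_units(W, b, dead_mask, rng, scale=0.05):
    """Re-draw the incoming weights and zero the bias of dead units.
    W has shape (fan_in, units); b has shape (units,)."""
    n_dead = int(dead_mask.sum())
    W, b = W.copy(), b.copy()
    if n_dead:
        W[:, dead_mask] = rng.normal(0.0, scale, size=(W.shape[0], n_dead))
        b[dead_mask] = 0.0
    return W, b

# Toy demo: a 3-unit ReLU layer where unit 1 is forced dead by a
# large negative bias, so it can never activate on any input.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))             # batch of 32 inputs
W = rng.normal(0.0, 0.05, size=(4, 3))   # fan_in=4, 3 units
b = np.array([0.0, -100.0, 0.0])         # unit 1 can never fire
acts = np.maximum(X @ W + b, 0.0)        # ReLU activations

dead = find_dead_units(acts)
W2, b2 = reinit_dead_units(W, b, dead, rng)
print(dead)  # → [False  True False]
```

One caveat with this scheme: a unit that is silent on one batch may still fire on others, so in practice you would want to accumulate the mask over many batches before re-initializing.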