You know what makes me feel a bit uneasy when studying deep learning? It’s like being a caveman experimenting with fire without any knowledge, even rudimentary, of the physics of combustion. And this happens even in academia.
Take Smith’s paper, for example. It’s very interesting, and it shows useful experimental results. Still, it is purely blind experimentation.
No theoretically grounded results are shown. No further insights about the topology of the loss surface are obtained.
In their conclusions, the authors write:
Furthermore, we believe that a theoretical analysis would provide an improved understanding of these methods, which might lead to improvements in the algorithms.
That’s the way to go, in my opinion. Yet I searched for such theoretical analyses and found little or nothing, although something exists (also by Smith) on the aforementioned topological insights (in a rather loose sense, though).
I highlighted the parts I considered noteworthy:
smith II.pdf (551.3 KB)