Gauss meets Deep Learning

Hi Everybody

An interesting read

Regards Conwyn

The other answers so far in this thread have pointed out that universal approximation theorems hold, but this is not really an interesting result on its own. Universal approximation also holds for polynomials (this is immediate from the Weierstrass approximation theorem, a generalization of which is often used to prove universal approximation for neural nets).
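
For concreteness, here is a minimal sketch (mine, not from the thread) of the polynomial point: least-squares fits of increasing degree drive the worst-case error against a smooth target toward zero, in the spirit of the Weierstrass approximation theorem. The target function and the degrees are arbitrary illustrative choices.

```python
import numpy as np

# Smooth target on [0, 1]; any continuous function would do.
x = np.linspace(0.0, 1.0, 500)
target = np.sin(2 * np.pi * x)

# Least-squares polynomial fits of increasing degree: the maximum
# absolute error shrinks as the degree grows.
for degree in (1, 3, 5, 9, 15):
    coeffs = np.polynomial.polynomial.polyfit(x, target, degree)
    approx = np.polynomial.polynomial.polyval(x, coeffs)
    err = np.max(np.abs(target - approx))
    print(f"degree {degree:2d}: max |error| = {err:.5f}")
```

The point is that this kind of expressiveness guarantee is cheap to come by; it says nothing about why one flexible model class trains and generalizes better than another.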

The question we actually want answered is why neural networks are able to learn very flexible models in extremely high dimensions, ideally in a way that connects to how they are trained, e.g. by stochastic gradient descent. There are lots of heuristics, with varying amounts of empirical and theoretical support, but to the best of my knowledge none of them meets the lofty bar you have set: explaining why neural networks work so much better than (say) SVMs, in the way the Gauss-Markov theorem justifies OLS.
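
For reference, this is the kind of guarantee the Gauss-Markov comparison points at, in a standard formulation (notation mine, not from the thread):

```latex
% Gauss–Markov theorem (standard statement).
% In the linear model
\[
  y = X\beta + \varepsilon, \qquad
  \mathbb{E}[\varepsilon] = 0, \qquad
  \operatorname{Var}(\varepsilon) = \sigma^2 I,
\]
% with $X$ of full column rank, the ordinary least squares estimator
\[
  \hat\beta = (X^\top X)^{-1} X^\top y
\]
% is the best linear unbiased estimator (BLUE): for every linear unbiased
% estimator $\tilde\beta$, the matrix
% $\operatorname{Var}(\tilde\beta) - \operatorname{Var}(\hat\beta)$
% is positive semi-definite.
```

Nothing comparably sharp is known for deep networks trained with SGD, which is the gap the original question is asking about.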