The Universal Approximation Theorem

Hi David @quantum. I like that you are curious about these questions. I’ll offer my take on them, but know it’s my own opinions mixed with facts. And I am not a great mathematician.

I think it’s best not to get too hung up on this Universal Approximation Theorem. It is an existence proof only, and does not offer any great insights about how to set or train weights in practice. Mathematical existence theorems are usually proved using other existence theorems which, if you trace them all the way down, reduce to a clunky but provable construction of the solution. In the case of the UAT, I have seen constructions made from step functions and sigmoids. No one would ever use such constructions in practice because they are too inefficient computationally, both in the number of units and in processing. And no one would be interested in the equations for that construction because (besides being highly verbose) they would offer no insight into the function being approximated.
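To make the "clunky construction" idea concrete, here is a minimal sketch of one way it could go. It is my own illustration, not taken from any particular proof: the names `staircase_net` and `staircase_error`, the `steepness` value, and the choice of sin(x) as the target are all assumptions. Each very steep sigmoid acts as a step, and the steps are stacked into a staircase that follows the target.

```python
import numpy as np

def sigmoid(z):
    # Clip the argument so very steep sigmoids do not overflow in exp().
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60.0, 60.0)))

def staircase_net(target, lo, hi, n_bins, steepness=1e4):
    """Hand-built one-hidden-layer net: a staircase made of steep sigmoids.

    The interval [lo, hi] is cut into n_bins bins. The net outputs
    (approximately) the target's value at the center of whichever bin x is in.
    No training is involved; every weight is written down directly.
    """
    edges = np.linspace(lo, hi, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    levels = target(centers)      # height of each stair
    jumps = np.diff(levels)       # output weight of each sigmoid unit

    def net(x):
        x = np.asarray(x, dtype=float)
        # One hidden unit per interior bin edge, each a near-step at that edge.
        steps = sigmoid(steepness * (x[..., None] - edges[1:-1]))
        return levels[0] + steps @ jumps

    return net

def staircase_error(n_bins):
    """Worst-case error of the staircase net against sin(x) on a test grid."""
    net = staircase_net(np.sin, 0.0, 2.0 * np.pi, n_bins)
    xs = np.linspace(0.0, 2.0 * np.pi, 5001)
    return float(np.max(np.abs(net(xs) - np.sin(xs))))

if __name__ == "__main__":
    for n_bins in (20, 200, 2000):
        print(n_bins, "bins -> max error", staircase_error(n_bins))
```

The error shrinks only about as fast as the bin width, so each extra digit of accuracy costs roughly ten times as many units. And the list of stair heights tells you nothing about the target being a sine wave.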

At best, the UAT tells us we are not trying to do an impossible task when training a machine learning network. There’s some comfort in knowing that: at least we are not wasting our efforts. But the UAT does not help with the actual practice of machine learning, such as which architecture to choose or how to train to a usable accuracy in a reasonable amount of time.

As for reversing a trained network, a model made up of Linear and ReLU layers approximates a curvy function as a series of straight line segments. (In higher dimensions, bounded pieces of hyperplanes.) It can’t make curves. But if you use enough units it gets as close to a curve as you want. The model happens to be simple enough that you could write down the equations and understand them. However, those equations are never going to tell you “this is a parabola”.
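Here is a small sketch of that point, assuming a single hidden layer and the parabola y = x^2 on [-1, 1]. The weights are set by hand rather than trained, and the names `relu_interpolant` and `max_error` are mine, made up for illustration. Each ReLU unit switches on at one knot and bends the line there.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_interpolant(target, lo, hi, n_units):
    """Hand-built Linear -> ReLU -> Linear net that connects the dots.

    The net matches `target` at n_units + 1 evenly spaced knots and is a
    straight line segment between neighboring knots.
    """
    knots = np.linspace(lo, hi, n_units + 1)
    values = target(knots)
    slopes = np.diff(values) / np.diff(knots)   # slope of each segment
    # Each unit's output weight is the change in slope at its knot.
    out_weights = np.diff(slopes, prepend=0.0)
    biases = -knots[:-1]                        # unit i turns on at knots[i]
    offset = values[0]

    def net(x):
        x = np.asarray(x, dtype=float)
        hidden = relu(x[..., None] + biases)    # first Linear (weight 1) + ReLU
        return hidden @ out_weights + offset    # second Linear

    return net

def max_error(n_units):
    """Worst-case gap between the net and x**2 on a test grid over [-1, 1]."""
    net = relu_interpolant(np.square, -1.0, 1.0, n_units)
    xs = np.linspace(-1.0, 1.0, 2001)
    return float(np.max(np.abs(net(xs) - xs**2)))

if __name__ == "__main__":
    for n_units in (4, 16, 64):
        print(n_units, "units -> max error", max_error(n_units))
```

You can read every weight in `out_weights` and `biases` and know exactly what the model computes, yet nothing in those numbers announces "parabola". You would have to notice on your own that the slope changes are all equal.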

Likewise, every complex model can be written as a giant equation, but I think what you are asking for is insight into the meaning of that equation. People currently use various methods to make sense of the equation: activation maps, ablation studies, and PCA, for example. I personally suspect and hope that someday machine learning models will be able to extract and express higher level concepts that humans can relate to. But today they are merely very capable numerical approximators.

HTH a bit.

P.S. I recently skimmed a paper called “AI Feynman”. Maybe what they are doing is closer to what you want to be able to do.
https://arxiv.org/abs/1905.11481