A couple of (possibly dumb) questions about things mentioned in the first video:
The Universal Approximation Theorem is mentioned as requiring an exponentially sized network, but then it's said that backpropagation helps with that. Is there a version of the theorem that quantifies how much backprop helps? E.g., does the required network size become polynomial?
The learning rate finder is reminiscent of old-fashioned numerical root finders and the like, used in calculators and desktop programs. There's a famous article by W. Kahan about the HP-34C solver from 1979: http://www.hpl.hp.com/hpjournal/pdfs/IssuePDFs/1979-12.pdf (starting at page 20 of the PDF). Is this similar? Is traditional numerical analysis much help in machine learning?
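To show what I think the learning rate finder does, here's a toy 1-D sketch (all names and the quadratic "loss" are made up for illustration; I know the real implementation updates the weights as it sweeps, this is just the gist as I understand it):

```python
# Toy sketch of a learning-rate range test on a 1-D quadratic "loss".
# Not the actual fastai implementation; everything here is illustrative.

def loss(w):
    # toy loss with its minimum at w = 3
    return (w - 3.0) ** 2

def grad(w):
    # exact gradient of the toy loss
    return 2.0 * (w - 3.0)

lrs, losses = [], []
lr = 1e-4
w = 0.0
while lr < 1.0:
    w_try = w - lr * grad(w)   # one SGD step at this candidate learning rate
    lrs.append(lr)
    losses.append(loss(w_try))
    lr *= 1.5                  # sweep the learning rate up exponentially

# pick a rate near where the post-step loss is best (before it blows up)
best_lr = lrs[losses.index(min(losses))]
print(best_lr)
```

For this toy quadratic the best single-step rate comes out near 0.5, which matches the analytic optimum for gradient descent on `(w - 3)^2` from `w = 0`.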
Similarly, is it reasonable to find the minimum by numerical differentiation and then looking for derivative = 0 with a traditional root finder?
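Concretely, something like this in 1-D (a toy example with a made-up function, just to show what I mean by combining a numerical derivative with a classic root finder):

```python
# Toy: find the minimum of f by bisecting for f'(x) = 0,
# where f' is approximated by central differences.

def f(x):
    # toy "loss" with its minimum at x = 2
    return (x - 2.0) ** 2 + 1.0

def dfdx(x, h=1e-5):
    # central-difference approximation to the derivative
    return (f(x + h) - f(x - h)) / (2 * h)

def bisect(g, lo, hi, tol=1e-8):
    # classic bisection root finder; assumes g(lo) and g(hi) differ in sign
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

x_min = bisect(dfdx, 0.0, 5.0)
print(x_min)  # close to 2.0
```

Obviously this only works in one dimension with a bracketed sign change, which is why I'm asking whether this kind of approach scales to real training at all.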
The demo showing the different layers of a DNN recognizing features showed a layer recognizing circles. But since the input is a 3x3 grid, wouldn't that recognize circles of only a specific size? Do actual deep learning algorithms manage to recognize shapes like circles regardless of their size? Does anyone train on Fourier transforms of the input images, or anything like that?
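By "train on Fourier transforms" I mean something like feeding the 2-D FFT magnitude spectrum instead of raw pixels. A tiny sketch of the invariance I have in mind (translation rather than scale, which I realize is trickier; the example data is made up):

```python
import numpy as np

# A small square "feature" in an 8x8 image, and the same square shifted.
img = np.zeros((8, 8))
img[2:5, 2:5] = 1.0

shifted = np.roll(np.roll(img, 2, axis=0), 1, axis=1)  # circular shift

# The FFT magnitude spectrum is identical for both: a circular shift only
# changes the phase, not the magnitude, of the Fourier coefficients.
spec = np.abs(np.fft.fft2(img))
spec_shifted = np.abs(np.fft.fft2(shifted))

print(np.allclose(spec, spec_shifted))  # True
```

So a network trained on magnitude spectra would get translation invariance for free; I'm wondering whether anything analogous is done in practice for scale.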
Sorry to be so low-level this early in the course, the opposite of the advice about going top-down; these issues just jumped out at me.
The course looks great, thanks a million for doing it.