Ian Goodfellow says at 18:20 in https://www.youtube.com/watch?v=CIfsB_EYsVI : “The mapping from the input of the model to the output of the model is close to being linear, or piecewise linear with relatively few pieces. The mapping from the parameters of the model to the output is highly non-linear. So the parameters have highly non-linear interactions, and that's what makes training much harder. That's why optimising parameters is much harder than optimising inputs.”
I don’t understand how the output can be linear in the inputs but non-linear in the parameters. I can sort of see why the input-to-output map is (piecewise) linear when the activations themselves are piecewise linear (e.g. ReLU), but I don't see why the parameter-to-output map is non-linear. Can someone please explain this? He goes on to show an image to illustrate it as well.
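For what it's worth, here is a toy NumPy sketch I put together (my own illustration, not from the video) of the asymmetry I think he means. I drop the activations entirely to keep it simple: even then, the output is exactly linear in the input, but the weights of the two layers multiply each other, so the output is quadratic (non-linear) in the parameters taken jointly.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer network with NO activations, to isolate the point.
W1 = rng.standard_normal((3, 2))
W2 = rng.standard_normal((1, 3))

def f(x, W1=W1, W2=W2):
    return W2 @ (W1 @ x)

x, y = rng.standard_normal(2), rng.standard_normal(2)
a, b = 2.0, -3.0

# Linear in the INPUT: f(a*x + b*y) == a*f(x) + b*f(y) holds exactly.
assert np.allclose(f(a * x + b * y), a * f(x) + b * f(y))

# NOT linear in the PARAMETERS: doubling every weight quadruples the
# output, because W2 multiplies W1. So f is quadratic, not linear,
# in (W1, W2) jointly.
x0 = rng.standard_normal(2)
assert not np.allclose(f(x0, W1=2 * W1, W2=2 * W2), 2 * f(x0))
assert np.allclose(f(x0, W1=2 * W1, W2=2 * W2), 4 * f(x0))
```

With ReLU activations put back in, the input-to-output map becomes piecewise linear rather than exactly linear, but the parameter-to-output map stays non-linear for the same reason: weights from different layers get multiplied together. Is that the right way to read his statement?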