I like your question! If it can’t even do this, why trust it to do that?
While trying to understand RNNs better, I asked the same kinds of questions. In the end, my LSTM successfully fit combinations of x, e^x, sine, and random noise, but only after many struggles. For a while, it could not even learn to pass the same number out as came in (the identity function)!
A math fact (I think!) is that polynomials can't easily be approximated by compositions of linear maps and ReLU: the result is always piecewise linear, so it can only trace the curvature of x^2 with many small segments. So I am not surprised that the model has difficulty. Such problems are better suited to classic curve-fitting methods. That said, it is interesting to investigate and understand what deep learning models can and can't do.
A few ideas:
- Use a smooth activation function between layers. It's hard to see how linear + ReLU could easily emulate a polynomial, even though I understand it's theoretically possible. It may need more layers too, so that compositions of linear transforms and activations can approximate x^2 (see the first sketch after this list).
- Try a simpler model. I found that if the model is too smart, it memorizes the training set and predicts badly. A model with less capacity is forced to generalize.
- Are you predicting within the training domain or outside it? This distinguishes memorizing the training points from failing to generalize the function (the first sketch below checks both cases).
- What do you mean by “normalizations”? Make sure the normalizations are not throwing away information that the layers need to approximate x^2.
- You could fit ln() of the input to ln() of the target and solve x^2 with a single linear layer, since ln(x^2) = 2 * ln(x). This is an example of using a “classic method” (see the second sketch after this list).
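
To make the first idea (and the in-domain vs. out-of-domain question) concrete, here is a minimal sketch, assuming PyTorch is available; the layer widths, learning rate, and step count are arbitrary illustrative choices, not anything from your setup. It fits a small tanh MLP to y = x^2 on [-2, 2] and then evaluates it both inside and outside that interval:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Training data: x in [-2, 2], target y = x^2
x_train = torch.linspace(-2.0, 2.0, 200).unsqueeze(1)
y_train = x_train ** 2

# Two hidden layers with a smooth activation (tanh); compositions of
# affine maps and smooth nonlinearities can approximate x^2 well on a
# bounded interval.
model = nn.Sequential(
    nn.Linear(1, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    x_in = torch.tensor([[0.5], [1.5]])    # inside the training range
    x_out = torch.tensor([[3.0], [4.0]])   # outside the training range
    print("inside :", model(x_in).squeeze().tolist())   # should be near 0.25, 2.25
    print("outside:", model(x_out).squeeze().tolist())  # usually drifts far from 9, 16
```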
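
And here is a sketch of the log-log trick from the last bullet, using plain NumPy least squares (the data range and sample count are again just illustrative). Because ln(x^2) = 2 * ln(x), a single linear fit in log space recovers the exponent, provided the inputs are positive:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 10.0, size=500)   # positive inputs only: ln() needs x > 0
y = x ** 2

# Ordinary least squares in log space: ln(y) = a * ln(x) + b
a, b = np.polyfit(np.log(x), np.log(y), deg=1)
print(f"slope a = {a:.3f}, intercept b = {b:.3f}")   # expect a close to 2, b close to 0

# Back on the original scale: y_hat = exp(b) * x**a
x_test = np.array([3.0, 7.0])
print(np.exp(b) * x_test ** a)   # roughly [9, 49]
```

For inputs that can be negative you would have to fit |x| or handle the sign separately.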
Good luck with your experiments. I would be very interested to know what you discover!