How to deal with accuracy issues of floating point arithmetic

rshamsy · June 25, 2019, 6:17pm

I am referring to the example from the lecture in which rounding errors on the order of machine epsilon propagate through the repeated calculations in a for-loop.

In deep learning training loops, I am guessing that many parameters are updated in a similar way (repeatedly, inside a for-loop). How can this propagation of error be addressed, both in the specific example shown in the lecture (picture reproduced below) and more broadly?

Pomo · June 25, 2019, 6:35pm

That function is typically given as a specific example of mathematical chaos and its sensitive dependence on initial conditions.

Why do you think it could apply to a gradient descent training loop?

rshamsy · June 26, 2019, 1:44am

In the update rule for parameters and weights, you’re updating the same variable multiple times, right? That’s what I thought was quite similar.

Pomo · June 26, 2019, 7:06pm

Now I see how you can think of training a model as repeatedly applying the same function. I had never thought of it that way before.

The example you cite diverges as soon as the numbers stray off one particular, unstable orbit. It is a pathological example specifically designed to illustrate a theoretical problem. I don’t think the principle applies to training a model.
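To make the "unstable orbit" point concrete, here is a minimal sketch. The lecture picture is not available in this thread, so the function below is an assumption: it uses the doubling map (2x if x <= 1/2, otherwise 2x - 1) started at 1/10, which is a common textbook illustration of this effect and may or may not be the exact function from the lecture. The names `doubling_map` and `iterate` are made up for illustration. The same iteration is run once with ordinary floats and once with exact rational arithmetic.

```python
from fractions import Fraction

def doubling_map(x):
    # Assumed stand-in for the lecture example: 2x on [0, 1/2], 2x - 1 above.
    return 2 * x if x <= 0.5 else 2 * x - 1

def iterate(x, steps):
    # Apply the same map repeatedly, as a for-loop would.
    for _ in range(steps):
        x = doubling_map(x)
    return x

# 0.1 cannot be stored exactly in binary floating point, so the float run
# starts a tiny distance away from the true orbit of 1/10.
float_result = iterate(0.1, 80)

# Exact rational arithmetic stays on the true orbit, which cycles forever
# through 1/5, 2/5, 4/5, 3/5.
exact_result = iterate(Fraction(1, 10), 80)

print("float:", float_result)
print("exact:", exact_result)
```

Each step doubles the gap between the stored 0.1 and the true 1/10, so after a few dozen iterations the float run has nothing to do with the exact cycle and ends up stuck at the fixed point 1.0. The arithmetic inside the loop is not the culprit here; the initial representation error is, and the map amplifies it.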

First, with simple SGD and a small enough learning rate, the loss decreases. Therefore there are no fixed orbits in the weight space for numerical errors to accumulate in. Second, we WANT weights to fall out of quasi-stable states and converge to a loss basin. That’s what training is supposed to do. The only place I can see this example applying is that nearby weight initializations may end up in different basins. But we already knew that, and this fact may turn out to be a feature, not a bug.
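A toy sketch of the first point, under strong simplifying assumptions: a single weight, the convex loss (w - 3)^2, plain gradient descent with no minibatch noise, and a learning rate of 0.1. None of these come from the lecture, and the helper name `gradient_descent` is hypothetical. The sketch only shows that, in this contracting setting, a small perturbation of the starting weight shrinks at every step instead of growing.

```python
def gradient_descent(w, lr=0.1, steps=200, target=3.0):
    # Minimise the toy loss (w - target)**2; its gradient is 2 * (w - target).
    history = [w]
    for _ in range(steps):
        grad = 2.0 * (w - target)
        w = w - lr * grad  # the same variable is updated at every step
        history.append(w)
    return history

# Two runs whose starting weights differ by a small perturbation.
run_a = gradient_descent(0.0)
run_b = gradient_descent(0.0 + 1e-3)

# Gap between the two runs at each step.
gaps = [abs(a - b) for a, b in zip(run_a, run_b)]

print("initial gap:", gaps[0])
print("gap after 10 steps:", gaps[10])
print("final weights:", run_a[-1], run_b[-1])
```

In this toy case each update multiplies the distance to the minimum by 1 - 2 * lr = 0.8, so the gap between the two runs decays geometrically and both land on essentially the same weight. A real loss surface is not convex, so this says nothing about nearby initializations that fall into different basins.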

Caveat: this argument is informal and my math is rusty.