How to deal with accuracy issues of floating point arithmetic

(Rahim Shamsy) #1

I will be referring to the example used in the lecture of machine epsilon propagating in the repetitive calculations done in the for-loop.

With training loops for deep learning, I am guessing that several parameters are going to be updated in similar ways (using a for-loop). How can this issue of error propagating be solved, specifically in that example shown in the lecture (picture reproduced below), and more broadly?


(Malcolm McLean) #2

That function is typically given as a specific example of mathematical chaos and its sensitive dependence on initial conditions.

Why do you think it could apply to a gradient descent training loop?


(Rahim Shamsy) #3

In the update rule for parameters and weights, you’re updating the same variable multiple times right? That’s what I thought was quite similar


(Malcolm McLean) #4

Now I see how you can think of training a model as repeatedly applying the same function. I had never thought of it that way before.

The example you cite diverges when the numbers stray off one given, unstable starting orbit. It is a pathological example specifically designed to illustrate a theoretical problem. I don’t think the principle applies to training a model.

First, using simple SGD with small enough learning rate, loss is decreasing. Therefore there are no fixed orbits in the weight space for numerical errors to accumulate in. Second, we WANT weights to fall out of quasi-stable states and converge to a loss basin. That’s what training is supposed to do. The only place I can see for this example to apply is that nearby weight initializations may end up in different basins. But we already knew that, and this fact may turn out to be a feature, not a bug.

Caveat: argument is informal and my math is rusty.