SGD with restarts

Hi from Iran
In SGD with restarts, why should we aim for the solution that generalizes best instead of the one with the lowest loss?
Is that only useful when the test and training sets differ a lot from each other?
I mean the slight shift of the figure that Jeremy mentioned.

See my answer here: SGD with restarts
