Local minima in NNs and SGDR (question about lessons 1 & 2)

Hello all!

In lesson 1, when Jeremy explains that neural networks have the property of being able to do "all-purpose parameter fitting", he says that, fortunately, NNs don't have local minima. Check out lesson 1 from 50:24 onward.

In lesson 2, Jeremy explains that SGD with restarts (SGDR) is good because when the cost is low but the solution doesn't generalize well, the restarts can move the parameters into a region of the cost surface that generalizes better. But the graph he shows looks exactly like the case of being stuck in a local minimum. Check out lesson 2 from 33:40 onward.
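For reference, here is a minimal sketch (my own, not taken from the lesson) of the cosine-annealing-with-restarts learning-rate schedule from the SGDR paper (Loshchilov & Hutter, 2017), assuming a constant cycle length for simplicity. The periodic jumps back to the maximum learning rate are the "restarts" that can kick the parameters out of a sharp minimum:

```python
import math

def sgdr_lr(step, lr_max=0.1, lr_min=0.0, cycle_len=100):
    """Cosine annealing with warm restarts (SGDR, Loshchilov & Hutter 2017).
    The LR decays from lr_max to lr_min over each cycle, then jumps back
    to lr_max -- the 'restart' that can escape a sharp minimum."""
    t = step % cycle_len  # position within the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / cycle_len))

# At the start of each cycle the LR is at its maximum (the restart)...
print(sgdr_lr(0))    # 0.1
# ...and it anneals down toward lr_min by the end of the cycle.
print(sgdr_lr(99))   # ~0.0
```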

I find these two statements contradictory and confusing. Could someone please explain how to reconcile these two teachings?

  • P.S.
  1. I found a similar question, but I don't think it has received a satisfying answer yet.
  2. Since this question has now been asked more than twice, should it be added to the FAQ?