Clarification for local minima in neural networks

At the beginning of the course, it was mentioned that neural networks don’t have local minima, implying that getting stuck in a local minimum was not a possibility. But when justifying the need for restarts in the learning rate schedule, the lecture appeals to the case where the algorithm gets stuck in a sharp trough (a narrow minimum) of the loss function.

These two statements seem to contradict each other. If someone could provide some clarification, that would be great.

Thank you.

Hi @rshamsy
Before answering your question, could you please tell us which video you meant by “the beginning of the course”? It would also be helpful if you pointed out the time in the video, so we can check directly what you are referring to.

1 Like

Lesson 1, a little after the 51-minute mark. I continued watching and got some clarification:

  • there’s ‘basically’ one minimum
  • by ‘basically’, he means there aren’t many distinct local minima, just a few that all have roughly the same loss value. That makes sense of how there can be both a sharp trough and a broad trough.
  • I’m tempted to ask why the cost function values at these minima are so similar, but I’ll investigate the math behind that myself.


He goes on to say “there are different parts of the space that are basically all equally good”. In general, there are no traps of poor ‘local minima’ that are difficult to avoid or escape. Having so many tools in the bag for traversing the parameter-space landscape is one reason we don’t need to worry unduly about perilous local minima. Of course, there are innumerable points in parameter space surrounded by regions of higher loss; call these local minima if you wish, but the label isn’t particularly useful.
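For concreteness, the “restarts in the learning rate schedule” the original question refers to are usually cosine annealing with warm restarts (SGDR): the learning rate decays along a cosine curve and then jumps back up, which can bounce the optimizer out of a sharp, narrow trough toward a broader one. Here is a minimal sketch of such a schedule; the function name and default parameters are illustrative, not from the lecture:

```python
import math

def sgdr_lr(step, lr_max=0.1, lr_min=0.001, cycle_len=100):
    """Cosine-annealed learning rate that restarts every `cycle_len` steps.

    Within each cycle the rate decays smoothly from lr_max to lr_min;
    at the start of the next cycle it jumps back to lr_max (the "restart").
    """
    t = step % cycle_len  # position within the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / cycle_len))
```

Plotting `sgdr_lr(step)` over a few hundred steps gives the familiar sawtooth of cosine curves: each sudden jump back to `lr_max` is what lets the optimizer escape a sharp minimum it would otherwise settle into.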

1 Like