At the beginning of the course, it was mentioned that neural networks don’t have local minima, implying that getting stuck in a local minimum is not a possibility. But when justifying the need for restarts in the learning rate schedule, the argument given is that the algorithm may be stuck in a sharp trough of the loss function.
The above two sentences contradict each other. If someone could provide some clarification, that would be great.
Before answering your question, could you please tell me which video you mean by the beginning of the course? It would also help if you pointed out the time in the video, so we can directly check what you are referring to.
He goes on to say that “there are different parts of the space that are basically all equally good”. In general, there are no traps of poor local minima that are difficult to avoid or escape. Having so many tools for traversing the landscape of parameter space is one reason we don’t need to worry unduly about perilous local minima. Of course, there are innumerable points in parameter space surrounded by areas of higher loss. Call these local minima if you wish, but it’s not a particularly useful framing.
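For anyone curious what the “restarts” in the question refer to: a common form is cosine annealing with warm restarts (Loshchilov & Hutter, SGDR), where the learning rate decays along a cosine curve and then periodically jumps back up, which can kick the optimizer out of a sharp trough. Here is a minimal sketch of that schedule in plain Python; the function name and default values (`eta_max=0.1`, `t0=10`, `t_mult=2`) are illustrative choices, not something from the course.

```python
import math

def cosine_warm_restart_lr(step, eta_min=0.0, eta_max=0.1, t0=10, t_mult=2):
    """Learning rate at `step` under cosine annealing with warm restarts.

    The rate decays from eta_max to eta_min over a cycle, then jumps
    ("restarts") back to eta_max; each cycle is t_mult times longer
    than the previous one (t0, t0*t_mult, t0*t_mult**2, ...).
    """
    t_i, t_cur = t0, step
    # Walk forward through cycles to find which one `step` falls in.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    # Cosine decay within the current cycle.
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))
```

At step 0 the rate is `eta_max`; halfway through a cycle it is the midpoint of `eta_max` and `eta_min`; at the start of each new cycle it restarts at `eta_max` again.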