I came across the
No Free Lunch theorem while reading Goodfellow’s Deep Learning Book. In a very broad sense, it states that when averaged over all possible problems, no algorithm will perform better than all others.
What are its implications in deep learning? A general trend in DL has been that deeper networks tend to generalise better (vis-a-vis performance on the ImageNet over the years).
Does that mean there is a network deep enough which can generalise over all possible problems and hence the NFL theorem is just a fad?
It is not a fad at all!! And it is not true that deeper networks tend to generalise better. We have been achieving better performance over the years because we learnt how to constraint those highly parameterized networks.
About NFL, I believe that your statement “when averaged over all possible problems, no algorithm will perform better than all others” does not help to get to the point formalized by Wolpert’s theorem. We could say “no
learner are better than random guessing over all possible functions to be learned”. And, although it could seem as bad at first glance, we are lucky since the “world” functions we try to learn are not drawn uniformly among all mathematically possible functions. Machine learning and DL have been so successful because even though they learn a quite restricted set of limited-complexity function such functions are very often good at doing well in real-life tasks.
we are lucky since the “world” functions we try to learn are not drawn uniformly among all mathematically possible functions
And therefore NFL theorem supports itself in the real world ??
Sorry, but I cannot see your point. Would you please rephrase your last question. What do you mean by " supports itself" ??
I mean that (as you stated) we are lucky that usual functions that we try to learn are not drawn uniformly among all possible mathematical functions. Therefore, this “luck” makes NFL a self-satisfying proposition.
Now I see. By saying that “NFL a self-satisfying proposition” seems like NFL is just another brick in the math galaxy eventually. But in doing so, I think, you are diminishing its real contribute: it is because of the subtle insights NFL gives us that we can now understand why we are lucky. Again, this is just my personal point of view.
To add my two cents : NFL is a math limit. Even though some limits are really useful (like for example the central limit theorem), some are not really applicable to the real world. I think NFL is the latter. It makes no sense to average over all possible problems when thinking about specific problems like solving autonomous driving for example.
See here for Jeremy Howard’s very detailed opinion on that subject
very detailed indeed.
But I think NFL formally justifies the creation of task-oriented algorithms, rather than wildly aiming for everything in the world.
Again, here the definition of the task may keep expanding over time (like autonomous driving as absorbed many CV problems as its sub-problems), but it’ll always be bounded. And hence, we will fail beyond that definition of task.
This might sound trivial, but as @fabris pointed, having a mathematical proof for something trivial is better than having none.