Not sure if we are still doing this thread, but I wrote a blog post, which I write about here.
It covers how far I got in tackling the question of whether a smoother surrounding error surface leads to better generalization.
If you have any thoughts or comments, I would love to hear them!