@radek, very interesting and clearly explained.
About the experimental design, a couple of ideas crossed my mind (happily discard them if you find them useless or if you have already considered them):
I think your design can be valid, but it may be tricky to implement. The first thing I would do: compare the networks only after making sure that they have the same, or a very close, initial validation error.
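To make that first idea a bit more concrete, here is a minimal sketch of how one could keep only the pairs of networks whose initial validation errors are close. The run names, the error values, and the `tolerance` threshold are all made up for illustration; they are not from your post.

```python
def comparable_runs(val_errors, tolerance=0.01):
    """Return pairs of run names whose initial validation errors
    differ by at most `tolerance` (a hypothetical threshold)."""
    names = sorted(val_errors)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(val_errors[a] - val_errors[b]) <= tolerance:
                pairs.append((a, b))
    return pairs


# Hypothetical initial validation errors, one per trained network.
initial_val_error = {"net_a": 0.082, "net_b": 0.085, "net_c": 0.140}

# Only pairs that start from a similar validation error are compared.
print(comparable_runs(initial_val_error, tolerance=0.01))
```

Networks that do not end up in any pair would simply be left out of the comparison, so the perturbation results are not confounded by different starting points.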
Second thought: the more complex a network is, the more complex its error surface is likely to be. One hidden layer may not provide enough complexity. Even if the experiment's main insight is simple, you need complex surfaces!
Third thought: the scale of the weight distortion is essential. Even then, you don't have that many data points. Your chart could well be one part of a bigger chart, and with so little data you may get a positive or negative correlation coefficient that is not statistically significant.
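One way to check whether a correlation from so few points means anything is a permutation test. The sketch below is only an illustration, assuming NumPy is available; the function name `permutation_pvalue` and the example numbers are hypothetical and not taken from your experiment.

```python
import numpy as np


def permutation_pvalue(x, y, n_permutations=10_000, seed=0):
    """Pearson correlation of x and y, plus a two-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.corrcoef(x, y)[0, 1]
    count = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any real link between x and y.
        shuffled = rng.permutation(y)
        r = np.corrcoef(x, shuffled)[0, 1]
        if abs(r) >= abs(observed):
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (count + 1) / (n_permutations + 1)


# Hypothetical data: distortion scale vs. change in validation error.
scales = [0.01, 0.02, 0.05, 0.10, 0.20]
error_change = [0.001, 0.004, 0.003, 0.012, 0.010]

r, p = permutation_pvalue(scales, error_change)
print(f"r = {r:.3f}, permutation p-value = {p:.3f}")
```

With only a handful of points, even a large correlation coefficient can come with a large p-value, which is the reason not to read too much into the sign of the correlation yet.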
So, in a few words: very nice post and, if you find the time, I would not throw it in the trash bin yet, not without trying a few more small ideas like the ones I gave. Maybe you are already there, but I wouldn't be in a hurry to draw conclusions yet.