This article is fantastic: it goes over the concepts Jeremy brought up and explains what L2 regularization does when combined with batchnorm (it keeps the effective learning rate, i.e. the size of each gradient step relative to the weights, from shrinking as the weights grow large). It includes a lot of detailed experiments too. Definitely check it out.
Please add it to the first post of the Lesson 11 thread: https://forums.fast.ai/t/lesson-11-discussion-and-wiki/43406 (add a new "Resources" section if it's not there already). Thanks.
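If you want to see the batchnorm/L2 effect concretely, here's a minimal numpy sketch. It's my own illustration, not code from the article. It assumes a single linear layer followed by batchnorm with no learned scale/shift and eps = 0, so the layer is exactly scale invariant. Multiplying the weights by c leaves the loss unchanged, shrinks the gradient by 1/c, and therefore shrinks the relative step size ||lr·g|| / ||w|| by 1/c². Because the gradient is orthogonal to w, a plain SGD step gives ||w − lr·g||² = ||w||² + lr²||g||², so without weight decay the norm can only grow and the effective step keeps shrinking. L2 keeps the norm in check. The helpers `batchnorm`, `loss` and `num_grad` are hypothetical names for this demo, and gradients are taken by finite differences to keep it dependency-free.

```python
import numpy as np

def batchnorm(z):
    # Normalize each output unit over the batch (axis 0); no learned gamma/beta
    # and eps = 0 (an assumption), so the layer is exactly invariant to rescaling.
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0))

def loss(w, x, target):
    # Linear layer -> batchnorm -> squared error against a fixed target
    out = batchnorm(x @ w)
    return 0.5 * np.sum((out - target) ** 2)

def num_grad(f, w, h=1e-5):
    # Central finite differences, one weight entry at a time
    g = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        step = np.zeros_like(w)
        step[idx] = h
        g[idx] = (f(w + step) - f(w - step)) / (2 * h)
    return g

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))
w = rng.normal(size=(8, 4))
target = rng.normal(size=(32, 4))
f = lambda w_: loss(w_, x, target)

c = 10.0                      # pretend the weights have grown 10x
g1 = num_grad(f, w)
gc = num_grad(f, c * w)

# 1) Scaling the weights does not change the loss
print(np.isclose(f(w), f(c * w)))
# 2) ...but the gradient shrinks by a factor of c
print(np.allclose(gc, g1 / c, rtol=1e-4, atol=1e-6))
# 3) The gradient is (numerically) orthogonal to w, so an SGD step can only grow ||w||
print(np.sum(g1 * w))
# 4) Relative step size ||g|| / ||w|| shrinks by c**2, i.e. a smaller "effective learning rate"
ratio = (np.linalg.norm(gc) / np.linalg.norm(c * w)) / (np.linalg.norm(g1) / np.linalg.norm(w))
print(ratio, 1 / c**2)
```

Finite differences are slow but make the sketch easy to check by eye; in a real model you'd just read the gradients from autograd.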