What is the other paper about what goes on behind the scenes of batch normalization?

In lecture 6, Jeremy talked about batch normalization and said there are two papers showing that batch norm has nothing to do with internal covariate shift. In class, he showed one of them: "How Does Batch Normalization Help Optimization?"

I am wondering what the other paper is.


Are you talking about this paper?


Jeremy showed this one in the lecture; I think there should be another one.

