In lecture 6, Jeremy talked about batch normalization and said that two papers show batch norm has nothing to do with internal covariate shift. In class, he showed one of them: "How Does Batch Normalization Help Optimization?".
I am wondering what the other paper is.
Cheers.