I understand that Batch Normalization standardizes each layer's activations so their distributions don't drift too far during training, which helps keep the model from diverging. How does this compare to imposing constraints directly on the weights (https://keras.io/constraints/)?
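For concreteness, here is a minimal Keras sketch (the layer sizes and the max-norm value of 2.0 are made up for illustration) showing the two mechanisms side by side: the constraint rescales the weights themselves after each update, while batch norm standardizes the activations flowing between layers.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.constraints import max_norm

model = keras.Sequential([
    keras.Input(shape=(100,)),
    # Weight constraint: after each gradient update, the kernel is rescaled
    # so its L2 norm never exceeds 2.0; this acts on the weights directly.
    layers.Dense(64, activation="relu", kernel_constraint=max_norm(2.0)),
    # Batch norm: standardizes this layer's pre-activations per mini-batch;
    # this acts on the activations and leaves the weights untouched.
    layers.Dense(64),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dense(10, activation="softmax"),
])
model.summary()
```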
-
Any intuition on when batch norm works better than weight constraints, and vice versa?
-
Is my understanding correct that it would be somewhat redundant to use both constraints and batch norm?