Why is instance norm useful?

Hi, can someone explain why instance norm is useful?
Batch norm makes sense to me because you compute the statistics of the activations across the whole batch (and, via running averages, the whole dataset), and you want your activations to be around the same value so it's easier for the neurons to learn.
But then why would instance norm ever make sense, if you only compute the statistics within a single instance (one image) instead of across the batch? Couldn't there still be high variance across instances?

I can say a little bit about this, at least as it relates to image-to-image translation…

I think there are only certain cases where instance norm is more useful. During part two of the course last year I replicated the real-time style transfer network from the paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution, though, like Jeremy does, I changed a few things to see the effect.

One thing I did differently was to use batch norm instead of instance norm. The results kept looking blurry and dull, so I changed things back to match the paper one by one; when I swapped batch norm for instance norm, the results instantly became sharper and higher-contrast.

My guess is that for things like style transfer or CycleGANs, it's slightly less important that your results are statistically accurate and more important that they have nice detail. By using instance norm you get the most out of each image's data at every layer of the network, in exchange for that statistical accuracy.
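To make the difference concrete, here's a minimal numpy sketch (my own toy example, not code from the paper) of which axes the two norms pool their statistics over for a 4-D activation tensor:

```python
import numpy as np

# Toy activations: (batch, channels, height, width)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 8, 8))
eps = 1e-5

# Batch norm: one mean/var per channel, pooled over the whole batch
# (axes N, H, W), so every image shares the same statistics.
bn_mean = x.mean(axis=(0, 2, 3), keepdims=True)
bn_var = x.var(axis=(0, 2, 3), keepdims=True)
x_bn = (x - bn_mean) / np.sqrt(bn_var + eps)

# Instance norm: one mean/var per channel *per image* (axes H, W only),
# so each image's overall brightness/contrast statistics are normalized
# away independently of what else happens to be in the batch.
in_mean = x.mean(axis=(2, 3), keepdims=True)
in_var = x.var(axis=(2, 3), keepdims=True)
x_in = (x - in_mean) / np.sqrt(in_var + eps)

# After instance norm, every individual image/channel has ~zero mean;
# after batch norm, individual images can still deviate from zero.
print(np.abs(x_in.mean(axis=(2, 3))).max())  # ~0
print(np.abs(x_bn.mean(axis=(2, 3))).max())  # noticeably > 0
```

That per-image normalization is the usual intuition for why instance norm suits style transfer: each output image's contrast is handled on its own, rather than being averaged against its batch-mates.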

This is a bit of a guess based on my experiments, but it kinda makes sense; someone else can shoot me down if they want lol…


Also, in cases where the batch size is very low (e.g. medical image computing), instance norm seems to be used instead of batch norm (although I'm currently experimenting with batch renorm to address the same issues).
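One way to see why the low-batch-size case blurs the distinction: at batch size 1 (in training mode, ignoring the learned affine parameters and running statistics), batch norm's per-channel statistics over (N, H, W) are computed from a single image anyway, so it collapses to instance norm. A quick sketch of that claim:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(1, 3, 8, 8))  # batch size 1
eps = 1e-5

# Batch norm statistics over (N, H, W) with N == 1 ...
bn = (x - x.mean(axis=(0, 2, 3), keepdims=True)) \
     / np.sqrt(x.var(axis=(0, 2, 3), keepdims=True) + eps)

# ... are the same as instance norm statistics over (H, W):
inorm = (x - x.mean(axis=(2, 3), keepdims=True)) \
        / np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)

print(np.allclose(bn, inorm))  # True
```

So with tiny batches you're effectively getting instance-norm-like behavior from batch norm anyway, just with the added headache of noisy running statistics at eval time, which is the issue batch renorm tries to address.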
