Jeremy introduced the bias term in the embedding model, but is a bias term also used in the Rossmann model and in CV models like ResNet? The PyTorch implementations of ResNet, ResNeXt, etc. don't seem to have a bias term, but I just want to double-check.

My understanding is that biases are typically used.

I suppose that with batch normalisation they may not be required, because the mean is subtracted out anyway.

Did you work this out at all?

Very prevalent. If you use batchnorm after a layer without a bias term, it is as if the earlier layer were using a bias term.

Batchnorm learns two parameters per channel: how much to shift the activations (which is what the bias term in the earlier layer would do) and by how much to scale them.
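
If you want to convince yourself numerically, here is a minimal sketch in plain Python (no PyTorch needed). It assumes a single feature and training-mode batchnorm, i.e. normalising with the statistics of the current batch. The helpers `linear` and `batchnorm` and all the numbers are made up for illustration; they are not the PyTorch API. The point is that a bias added before batchnorm is removed again by the mean subtraction, so only batchnorm's own shift (`beta`) survives.

```python
import math
import random


def linear(xs, weight, bias=0.0):
    """Hypothetical one-feature layer: y = weight * x + bias for each sample."""
    return [weight * x + bias for x in xs]


def batchnorm(ys, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batchnorm over one feature.

    Normalises with the batch mean and (biased) batch variance, then
    scales by gamma and shifts by beta, the two learned parameters.
    """
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / len(ys)
    return [gamma * (y - mean) / math.sqrt(var + eps) + beta for y in ys]


random.seed(0)
batch = [random.gauss(0.0, 1.0) for _ in range(8)]

# Same layer and same batchnorm parameters, with and without a bias.
without_bias = batchnorm(linear(batch, weight=2.0), gamma=1.5, beta=0.3)
with_bias = batchnorm(linear(batch, weight=2.0, bias=5.0), gamma=1.5, beta=0.3)

# The bias of 5.0 is cancelled when the batch mean is subtracted.
max_diff = max(abs(a - b) for a, b in zip(without_bias, with_bias))

# The mean of the output is set by beta alone, which plays the bias role.
out_mean = sum(without_bias) / len(without_bias)

print(max_diff, out_mean)
```

The two outputs agree up to floating-point rounding, and the output mean is just `beta`. That is why it is common to leave the bias off a layer that feeds straight into batchnorm: it would only add parameters that have no effect.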