Why do conv2d layers in vision nets have no bias term?

I am looking at more recent architectures structured as conv2d -> activation -> batchnorm.

It seems to me that a bias learned by the conv2d would have a major effect on the model’s learning capacity.

Thanks for any insights.

Based on this discussion, bias terms in batch-normalized networks are unnecessary, since batch norm's learnable beta (shift) parameter serves the same function as the bias.
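Here is a minimal PyTorch sketch (shapes and layer sizes are illustrative only) showing that a constant per-channel bias added before batch norm cancels out in training mode, because the per-channel mean subtraction absorbs it:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 16, 16)

conv = nn.Conv2d(3, 4, kernel_size=3, padding=1, bias=True)
bn = nn.BatchNorm2d(4)
bn.train()  # use batch statistics, as during training

# Output with the conv bias in place
y_with_bias = bn(conv(x))

# Zero the conv bias and recompute: BN subtracts the per-channel mean,
# so a constant per-channel shift before BN has no effect on the output
with torch.no_grad():
    conv.bias.zero_()
y_without_bias = bn(conv(x))

print(torch.allclose(y_with_bias, y_without_bias, atol=1e-5))  # True
```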

Yes, that’s exactly the issue.

Traditionally, the ordering is conv2d (no bias) -> batchnorm -> activation. A bias in the conv2d is not needed because batch norm subtracts the per-channel mean right after the convolution and then adds its own learnable shift (beta).
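As a concrete sketch of that conventional ordering (the helper name and layer sizes here are just illustrative):

```python
import torch.nn as nn

def conv_bn_act(in_ch, out_ch):
    # Conventional ordering: conv (bias=False) -> batchnorm -> activation.
    # BatchNorm2d's learnable beta already provides a per-channel shift,
    # so a conv bias would be redundant (its effect is removed by the
    # mean subtraction anyway).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```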

More recently, some architectures use conv2d -> activation -> batchnorm. In this ordering a learnable bias in the conv2d does seem to matter, for example it shifts which pre-activations ReLU zeroes out.
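A sketch of this alternative ordering, where keeping the bias is reasonable because it shifts the values that ReLU thresholds (again, the helper name is just illustrative):

```python
import torch.nn as nn

def conv_act_bn(in_ch, out_ch):
    # Alternative ordering: conv -> activation -> batchnorm.
    # Here the conv bias shifts values *before* ReLU, so it directly
    # influences which activations get zeroed; bias=True is reasonable.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=True),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(out_ch),
    )
```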