Why do conv2d layers in vision nets have no bias term?

I am looking at more recent architectures structured as conv2d -> activation -> batchnorm.

It seems to me that a bias learned by the conv2d would have a major effect on the model’s learning capacity.

Thanks for any insights.

Based on this discussion, bias terms in batch-normalized networks are unnecessary, since batch norm's learnable beta (shift) parameter serves the same function as the bias.
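Here is a minimal PyTorch sketch (shapes and layer sizes are illustrative only) showing that a constant per-channel bias added before batch norm cancels out in training mode, because the per-channel mean subtraction absorbs it:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 16, 16)

conv = nn.Conv2d(3, 4, kernel_size=3, padding=1, bias=True)
bn = nn.BatchNorm2d(4)
bn.train()  # use batch statistics, as during training

# Output with the conv bias in place
y_with_bias = bn(conv(x))

# Zero the conv bias and recompute: BN subtracts the per-channel mean,
# so a constant per-channel shift before BN has no effect on the output
with torch.no_grad():
    conv.bias.zero_()
y_without_bias = bn(conv(x))

print(torch.allclose(y_with_bias, y_without_bias, atol=1e-5))  # True
```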

Yes, that’s exactly the issue.

Traditionally, the ordering is conv2d (no bias) -> batchnorm -> activation. A bias in the conv2d is not needed because batch norm subtracts the per-channel mean right after the convolution and then adds its own learnable shift (beta).
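As a concrete sketch of that conventional ordering (the helper name and layer sizes here are just illustrative):

```python
import torch.nn as nn

def conv_bn_act(in_ch, out_ch):
    # Conventional ordering: conv (bias=False) -> batchnorm -> activation.
    # BatchNorm2d's learnable beta already provides a per-channel shift,
    # so a conv bias would be redundant (its effect is removed by the
    # mean subtraction anyway).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```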

More recently, some architectures use conv2d -> activation -> batchnorm. In this ordering a learnable bias in the conv2d does seem to matter, for example it shifts which pre-activations ReLU zeroes out.
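A sketch of this alternative ordering, where keeping the bias is reasonable because it shifts the values that ReLU thresholds (again, the helper name is just illustrative):

```python
import torch.nn as nn

def conv_act_bn(in_ch, out_ch):
    # Alternative ordering: conv -> activation -> batchnorm.
    # Here the conv bias shifts values *before* ReLU, so it directly
    # influences which activations get zeroed; bias=True is reasonable.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=True),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(out_ch),
    )
```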