Do different layers have different regularization amounts?

Intuitively (but with no empirical evidence), I would guess that most overfitting happens because of weights in the later layers, while the early layers are more about finding features.

Is regularization applied equally across all the layers?

Not the whole answer, but when looking at the source code for the default head that gets built for an image classifier, I noticed that the dropout parameter ps is used as-is for the last dropout layer but halved (ps/2) for all of the earlier dropout layers of the head. So that seems to agree with your intuition, at least for dropout.
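
To make that pattern concrete, here is a minimal sketch of the schedule as I read it. The function name head_dropout_probs and its arguments are my own and hypothetical; this is not the fastai source, just an illustration of "ps/2 everywhere except the last layer":

```python
def head_dropout_probs(ps, n_dropout_layers):
    """Hypothetical helper: halve ps for every dropout layer except the last.

    Mirrors the pattern described above; it is not the library's own code.
    """
    if n_dropout_layers < 1:
        return []
    # Earlier dropout layers get ps/2, the final one gets the full ps.
    return [ps / 2] * (n_dropout_layers - 1) + [ps]


print(head_dropout_probs(0.5, 2))  # [0.25, 0.5]
```

So with ps=0.5 and two dropout layers in the head, the first would use 0.25 and the last 0.5.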

I guess I was asking more about weight decay…

Oh, in that case I asked nearly the same question during lesson 5 [here]( It’s an active area of research, but as far as I understand, nothing of the sort is implemented in fastai yet.
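
If you want to experiment with it by hand, plain PyTorch optimizers accept a list of parameter groups, each with its own weight_decay. Below is a rough sketch of building such groups. The helper name layerwise_weight_decay and the linear schedule from early to late layers are purely my assumptions for illustration, not something from fastai or from any result showing this helps:

```python
def layerwise_weight_decay(layer_groups, wd_first, wd_last):
    """Hypothetical helper: interpolate weight decay linearly across layer groups.

    Returns a list of dicts shaped like optimizer parameter groups:
    {"params": ..., "weight_decay": ...}.
    """
    n = len(layer_groups)
    groups = []
    for i, params in enumerate(layer_groups):
        # Fraction of the way from the first group (0.0) to the last (1.0).
        frac = i / (n - 1) if n > 1 else 1.0
        wd = wd_first + frac * (wd_last - wd_first)
        groups.append({"params": params, "weight_decay": wd})
    return groups


# Placeholder strings stand in for real parameter lists in this sketch.
groups = layerwise_weight_decay(["early", "middle", "late"], wd_first=0.0, wd_last=0.01)
for g in groups:
    print(g["params"], g["weight_decay"])
```

With a real model you would replace the placeholders with something like list(layer.parameters()) for each group and pass the resulting list to an optimizer such as torch.optim.SGD(groups, lr=...). Whether later layers should actually get more weight decay is exactly the open question here, so treat the schedule as something to test, not a recommendation.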