Do different layers have different regularization amounts?

Intuitively (though with no empirical evidence) I would guess that most overfitting happens because of the weights in later layers, while the early layers are more for finding features.

Is regularization applied equally across all the layers?

Not the whole answer, but when looking at the source code for the default head when you build an image classifier, I noticed that the dropout probability ps is used as-is for the last dropout layer, but ps/2 is used for all of the earlier dropout layers of the head. So that seems to agree.
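Roughly, the pattern looks like this. A minimal sketch in plain PyTorch (not fastai's actual `create_head` code; layer sizes and the `simple_head` name are made up for illustration), showing ps/2 on the earlier dropout layer and the full ps on the last one:

```python
import torch.nn as nn

def simple_head(in_features: int, n_classes: int, ps: float = 0.5) -> nn.Sequential:
    # Hypothetical head in the style fastai uses: the earlier dropout
    # layer gets ps/2, the final one gets the full ps.
    return nn.Sequential(
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.BatchNorm1d(in_features),
        nn.Dropout(ps / 2),           # earlier dropout layer: half the probability
        nn.Linear(in_features, 512),
        nn.ReLU(inplace=True),
        nn.BatchNorm1d(512),
        nn.Dropout(ps),               # last dropout layer: the full probability
        nn.Linear(512, n_classes),
    )

head = simple_head(1024, 10, ps=0.5)
drops = [m.p for m in head if isinstance(m, nn.Dropout)]
print(drops)  # [0.25, 0.5]
```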

I guess I was asking more about weight decay…

Oh, in that case I asked nearly the same question during lesson 5 [here](https://forums.fast.ai/t/lesson-5-in-class-discussion/30864/247?u=pierreo). It's an active area of research, but nothing of the sort is implemented in fastai yet, as I understand it.
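If you want to experiment with the idea yourself, plain PyTorch already lets you assign different weight decay per layer through optimizer parameter groups. A sketch, where the layer split and decay values are purely illustrative, not a recommendation:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 20),  # "early" layer
    nn.ReLU(),
    nn.Linear(20, 2),   # "late" layer
)

# Hypothetical per-layer weight decay: lighter on the early layer,
# heavier on the late one, matching the intuition above.
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "weight_decay": 1e-4},
        {"params": model[2].parameters(), "weight_decay": 1e-2},
    ],
    lr=0.1,
)
wds = [g["weight_decay"] for g in optimizer.param_groups]
print(wds)  # [0.0001, 0.01]
```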