AdamW tuning of weight decay

lminer · November 6, 2018, 7:58pm

Has anyone come up with any good heuristics for choosing weight decay?

MicPie · November 6, 2018, 8:08pm

Here is an approach that was posted along with a notebook: Fastai_v1, adding features

Kind regards
Michael

lminer · November 6, 2018, 8:38pm

That paper was written before the AdamW paper, so I’m wondering whether it is still relevant.

If I’m interpreting it correctly, though, the approach would be to first choose a learning rate without weight decay in the usual way, and then choose the largest weight decay that doesn’t degrade performance at that learning rate. Is that right?
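A minimal Python sketch of the two-step procedure described in that question, assuming the learning rate has already been picked with weight decay turned off. `pick_weight_decay`, `fake_train_and_eval`, the `tolerance` parameter and the candidate grid are hypothetical names invented for illustration. This is not a fastai API and not necessarily the exact method from the notebook or the paper.

```python
from typing import Callable, Sequence


def pick_weight_decay(
    train_and_eval: Callable[[float, float], float],
    lr: float,
    candidate_wds: Sequence[float],
    tolerance: float = 0.0,
) -> float:
    """Return the largest weight decay from candidate_wds whose validation
    loss at the fixed learning rate `lr` stays within `tolerance` of the
    wd=0 baseline. Stops at the first candidate that degrades the loss."""
    # Step 1 is assumed done already: `lr` was chosen with weight decay off.
    baseline = train_and_eval(lr, 0.0)
    best = 0.0
    # Step 2: sweep weight decay upward while holding the learning rate fixed.
    for wd in sorted(candidate_wds):
        if train_and_eval(lr, wd) > baseline + tolerance:
            break  # first degradation: keep the previous (smaller) value
        best = wd
    return best


def fake_train_and_eval(lr: float, wd: float) -> float:
    """Hypothetical stand-in for a real training run: validation loss is
    flat up to wd=1e-3 and rises after that. Replace with actual training."""
    return 0.50 if wd <= 1e-3 else 0.50 + wd


if __name__ == "__main__":
    grid = [1e-5, 1e-4, 1e-3, 1e-2]  # hypothetical log-spaced grid
    print(pick_weight_decay(fake_train_and_eval, lr=1e-3, candidate_wds=grid))
    # prints 0.001
```

Stopping at the first degraded value is a conservative choice. Because real validation losses are noisy, a small positive `tolerance` (or averaging several runs per value) is probably needed in practice.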

gsg · November 6, 2018, 10:18pm

Yes, Leslie Smith in A disciplined approach to neural network hyper-parameters: Part 1 – learning rate, batch size, momentum, and weight decay