Why are weight decay and learning rate tied together?

I recently did a deep dive into the decouple_wd parameter that appears in many of the fastai optimizers. One question that came up is why the weight decay calculation multiplies by the learning rate. I'm curious what the intuition is behind this lr*wd pairing, or why the two are tied together in this case. It seems like you already have the hyperparameter wd to turn the weight decay up or down.
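For concreteness, here's a minimal plain-SGD sketch of how I understand the two variants. This is my own illustrative code, not fastai's actual implementation; the function name sgd_step and the decouple_wd flag here are just placeholders mirroring the parameter I'm asking about.

```python
# Minimal sketch of where lr*wd enters a plain SGD update
# (illustrative only -- not fastai's actual implementation).
def sgd_step(w, grad, lr, wd, decouple_wd=True):
    if decouple_wd:
        # "true" weight decay: shrink the weight directly by lr*wd
        w = w * (1 - lr * wd)
    else:
        # L2 regularization: fold wd*w into the gradient, so the decay
        # term still ends up multiplied by lr in the update below
        grad = grad + wd * w
    return w - lr * grad
```

For vanilla SGD the two branches give the same result (e.g. w=1.0, grad=0.0, lr=0.1, wd=0.01 yields 0.999 either way), which makes the lr*wd product look redundant; my understanding is they only diverge once the optimizer rescales gradients (momentum, Adam), which is what decouple_wd is about.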