Sparse Adam with weight decay

I was going through the implementation of the SparseAdam optimizer in PyTorch and noticed that it does not implement weight decay. Is that because most weights in a sparse neural network are equal to zero, so there is really no point in penalising large weights?
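For context, SparseAdam performs a "lazy" variant of Adam: the moment estimates and parameters are only updated at the coordinates where the gradient is nonzero. Below is a minimal pure-Python sketch of that lazy step (a hypothetical helper for illustration, not the actual PyTorch implementation); note that a weight-decay term would have to touch *every* parameter each step, not just the ones with a nonzero gradient:

```python
import math

def sparse_adam_step(params, exp_avg, exp_avg_sq, sparse_grad, step,
                     lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One lazy Adam step: only indices present in sparse_grad are touched.

    params, exp_avg, exp_avg_sq: lists of floats (dense parameter/state).
    sparse_grad: dict mapping index -> gradient value (the sparse gradient).
    step: 1-based step count, used for bias correction.
    """
    beta1, beta2 = betas
    bias_c1 = 1 - beta1 ** step
    bias_c2 = 1 - beta2 ** step
    for i, g in sparse_grad.items():
        # Moment estimates are updated lazily, only where the gradient is nonzero.
        exp_avg[i] = beta1 * exp_avg[i] + (1 - beta1) * g
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1 - beta2) * g * g
        denom = math.sqrt(exp_avg_sq[i] / bias_c2) + eps
        params[i] -= lr * (exp_avg[i] / bias_c1) / denom
    # A weight-decay / L2 term (wd * params[i]) would require iterating over
    # ALL indices here, densifying the update and defeating the lazy scheme.

params = [1.0, 2.0, 3.0]
exp_avg = [0.0, 0.0, 0.0]
exp_avg_sq = [0.0, 0.0, 0.0]
sparse_adam_step(params, exp_avg, exp_avg_sq, sparse_grad={1: 0.5},
                 step=1, lr=0.1)
# Only index 1 moves; indices 0 and 2 are untouched.
```

So one plausible reading is that the omission is less about whether penalising weights is useful and more about decay being inherently dense: applying it every step would update all parameters, which is exactly what a sparse optimizer tries to avoid.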