Why is L2 regularization preferred over L1?

What makes L2 regularization special, and why is it preferred in industry over L1?
What specific problem does L2 solve that L1 does not?

Have a look here: https://twitter.com/gabrielpeyre/status/1184333144725606401

L1 induces sparsity, driving many weights to exactly zero, whereas L2 shrinks all weights toward zero without eliminating any of them.
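A minimal sketch of this contrast on a toy regression problem (all names and hyperparameters here are illustrative, not from the post): ridge is fit with plain gradient descent, while the L1 penalty, which is not differentiable at zero, is handled with proximal gradient descent (ISTA), i.e. a gradient step followed by soft-thresholding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression: only 3 of 20 features are truly informative.
n, d = 200, 20
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -3.0, 1.5]
y = X @ w_true + 0.1 * rng.standard_normal(n)

lr, lam, steps = 0.01, 1.0, 2000

def grad_mse(w):
    # Gradient of the mean-squared-error data term.
    return X.T @ (X @ w - y) / n

# L2 (ridge): plain gradient descent on MSE + (lam/2)*||w||^2.
w2 = np.zeros(d)
for _ in range(steps):
    w2 -= lr * (grad_mse(w2) + lam * w2)

# L1 (lasso): gradient step on the MSE term, then soft-threshold,
# which snaps small coordinates to exactly zero.
w1 = np.zeros(d)
for _ in range(steps):
    w1 -= lr * grad_mse(w1)
    w1 = np.sign(w1) * np.maximum(np.abs(w1) - lr * lam, 0.0)

print("exact zeros with L1:", int(np.sum(w1 == 0.0)))
print("exact zeros with L2:", int(np.sum(w2 == 0.0)))
```

Run on this toy data, the L1 fit zeroes out most of the 17 uninformative features, while every L2 weight stays small but nonzero.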

One reason is that L2 regularization reduces mathematically to weight decay for plain SGD (the equivalence breaks down for adaptive optimizers such as Adam, which is why AdamW decouples the two). Weight decay is easy to understand and to implement. So you might call the preference a historical-mathematical-computational accident, in addition to any other reasons.
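The equivalence for plain SGD can be checked in a few lines: adding the penalty (lam/2)*||w||^2 to the loss adds lam*w to the gradient, and a single SGD step with that augmented gradient is algebraically the same as first shrinking the weights by (1 - lr*lam) and then taking the ordinary gradient step. A small numeric sketch (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
w0 = rng.standard_normal(5)     # current weights
grad = rng.standard_normal(5)   # gradient of the unregularized loss at w0
lr, lam = 0.1, 0.01

# Path A: SGD on loss + (lam/2)*||w||^2 — the penalty adds lam*w0 to the gradient.
w_l2 = w0 - lr * (grad + lam * w0)

# Path B: weight decay — shrink the weights, then take the plain gradient step.
w_wd = (1 - lr * lam) * w0 - lr * grad

print(np.allclose(w_l2, w_wd))  # True: the two updates coincide for plain SGD
```

For Adam-style optimizers the penalty gradient gets rescaled by the adaptive per-parameter step sizes, so the two paths no longer produce the same update.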