Depthwise separable convolutions

Why don’t we lose much accuracy with depthwise separable convolutions, even though the number of learnable parameters drops significantly? I'm specifically asking about the MobileNet paper, where the parameter count is reduced drastically. What is the intuition behind depthwise separable convolutions still capturing the important patterns in an image?
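
To make the size of the reduction concrete, here is a rough sketch of the parameter counts for a single layer. The layer sizes (3x3 kernel, 256 input and 256 output channels) are made-up values for illustration, not taken from the paper, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    # One k x k filter per (input channel, output channel) pair.
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step: one k x k filter per input channel.
    depthwise = k * k * c_in
    # Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 256, 256

std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
ratio = sep / std  # algebraically equal to 1/c_out + 1/k**2

print(f"standard:  {std} parameters")
print(f"separable: {sep} parameters")
print(f"ratio:     {ratio:.3f}")
```

With these numbers the separable layer has roughly 11.5% of the weights of the standard layer, so for a 3x3 kernel the saving is close to a factor of 9.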