Several recent papers I have been reading, SPADE and DSFD (for face detection) to name a few, have a common theme. They report outperforming the existing methods in their field, and they use VGG16 or VGG19 for feature extraction. These are 2019 papers.
VGG is known for its feature extraction capabilities, but does anyone have a reasoning for this? Both VGG and ResNet use 3x3 convs and a similar increase in the number of filters as the network gets deeper. So the reason should have something to do with skip connections. I may be overthinking this one, but it would be interesting to get the views of someone who has used these models.
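
To make the comparison concrete, here is a minimal pure-Python sketch of the one structural difference I am pointing at. The stage widths are the published ones for VGG16 and ResNet-34. Everything else (`toy_layer`, `plain_block`, `residual_block`, the 0.5 weight) is a hypothetical stand-in I made up for illustration, not real convolutions, and the real ResNet block applies its final ReLU after the addition, which this toy version skips.

```python
# Output channels per stage. Both networks double the width as resolution drops.
VGG16_STAGE_WIDTHS = [64, 128, 256, 512, 512]
RESNET34_STAGE_WIDTHS = [64, 128, 256, 512]


def toy_layer(x, weight=0.5):
    """Hypothetical stand-in for conv + ReLU: scale each value, clamp at zero."""
    return [max(0.0, weight * v) for v in x]


def plain_block(x):
    """VGG-style block: the output is only the transformed signal."""
    return toy_layer(toy_layer(x))


def residual_block(x):
    """ResNet-style block: the input is added back onto the transformed signal."""
    fx = toy_layer(toy_layer(x))
    return [a + b for a, b in zip(x, fx)]


if __name__ == "__main__":
    x = [1.0, -2.0, 4.0]
    print("plain:   ", plain_block(x))
    print("residual:", residual_block(x))
```

In the plain block the features are whatever the stacked layers produce, while in the residual block each layer only has to model a correction on top of its input. Whether that is actually why VGG features work better in these papers is exactly what I am unsure about.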