VGG: strengths and limitations?

I didn’t find any discussion of VGG’s history, strengths, and limitations, so I thought I’d start one in the hope that folks with more experience can share their insights.

First of all, I am astonished by how influential VGG is: the original paper, published in April 2015, has already been cited 3177 times, which works out to 3+ new papers citing it per day over the last two years!! These citation numbers seem to indicate that increased depth in network configurations is shaping how people think about and build deep learning models. If that is the case, can we say the deeper the better? If not, is there such a thing as an optimal depth? Why stop at 19 layers (as in the original paper)? What are the other big ideas in deep learning, in addition to going “very deep”?
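(Quick aside for anyone who, like me, wanted to see the “16” vs “19” concretely: here is a minimal sketch, assuming Keras 2 and its `keras.applications` module, that loads both pretrained configurations and counts their weight layers and parameters. The weight layers, conv plus fully connected, are exactly what the names count.)

```python
# Hedged sketch: assumes Keras 2 with keras.applications available.
from keras.applications import VGG16, VGG19

for build in (VGG16, VGG19):
    model = build(weights='imagenet')  # downloads the pretrained ImageNet weights
    # Count only the weight layers (conv + fully connected); that is where
    # the "16" / "19" in the names come from.
    depth = sum(1 for layer in model.layers
                if layer.__class__.__name__ in ('Conv2D', 'Dense'))
    print(model.name, depth, 'weight layers,', model.count_params(), 'parameters')
```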

The second part of my question is about VGG in practice: when does it work really well, and when does it not work so well? Since most folks here have probably tested it on their own datasets, I am curious whether people would care to share their experiences.

Finally, just to set some context on training time expectations: in their submission to the 2014 ImageNet challenge, Simonyan and Zisserman explained, “Our implementation is derived from the Caffe toolbox, but contains a number of significant modifications, including parallel training on multiple GPUs installed in a single system. Training a single ConvNet on 4 NVIDIA Titan GPUs took from 2 to 3 weeks (depending on the ConvNet configuration).” I for one am very grateful for Jeremy’s guidance on starting with sample data.

You can find the original paper, “Very Deep Convolutional Networks for Large-Scale Image Recognition”, here: https://arxiv.org/abs/1409.1556

ImageNet 2014 [results](http://image-net.org/challenges/LSVRC/2014/results)


Thank you, Anirudh! Your reply is packed with so much great information; it’s super helpful in accelerating my learning and giving me the big picture of AI research. When you mentioned that deeper networks can have higher test errors, can you expand on that? Why is that the case, and how are people fixing that problem currently?

Also, thank you so much for sharing your perspective on VGG. I’ve only used it on my test data, and my models are still very manageable. Now that I know about the bulky model size for full training, I can plan ahead and optimize my workflow accordingly.

So glad you mentioned ResNet. I just came across some amazing ResNet results recently, and now I’m super curious to learn more about it. Have you personally used ResNet before? What do you like and not like about it? Forgive me if this is too many questions; I’m pretty psyched about the rapid progression from 19 to 152 layers in such a short time!
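For my own understanding, here is a toy sketch of the residual block idea that addresses the degradation problem (my own code in Keras 2, not He et al.’s implementation; the filter counts and input shape are placeholders): each block learns a residual F(x) and adds the input x back, so extra layers can fall back to the identity mapping instead of hurting accuracy.

```python
# Toy residual block sketch (Keras 2 functional API); sizes are placeholders.
from keras.layers import Input, Conv2D, BatchNormalization, Activation, add
from keras.models import Model

def residual_block(x, filters=64):
    shortcut = x                                    # identity skip connection
    y = Conv2D(filters, (3, 3), padding='same')(x)  # learn the residual F(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, (3, 3), padding='same')(y)
    y = BatchNormalization()(y)
    y = add([y, shortcut])                          # output = F(x) + x
    return Activation('relu')(y)

inputs = Input(shape=(32, 32, 64))                  # e.g. a 32x32 feature map with 64 channels
model = Model(inputs, residual_block(inputs))
model.summary()
```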

Btw, if anyone’s interested, here are Kaiming He’s ResNet tutorial at ICML and the companion lecture notes.


Your explanation of choosing network layers for a customized use case is fantastic!! This is the first time I’ve learned about the practical concerns of putting deep learning into an app production pipeline, amazing stuff! (or dare I say super cool!)
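To make it concrete for myself, here is a small sketch of what that layer choice can look like in code (assuming Keras and its pretrained VGG16; the `block4_pool` cut point is just an arbitrary example I picked, not something from your explanation): you keep the network only up to an intermediate layer and use the truncated model as a lighter feature extractor in the production pipeline.

```python
# Sketch: truncate a pretrained VGG16 at an intermediate layer and use it as a
# feature extractor. 'block4_pool' is an arbitrary example cut point.
from keras.applications import VGG16
from keras.models import Model

full = VGG16(weights='imagenet', include_top=False)
feature_extractor = Model(inputs=full.input,
                          outputs=full.get_layer('block4_pool').output)
feature_extractor.summary()  # noticeably fewer layers/parameters than the full network
```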

I’ve only covered the first 35 minutes of the talk so far; it’s very clear, and I really like the simple explanation of exploding/vanishing gradients.
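To cement the intuition for myself, I wrote a tiny toy calculation (my own, not from the talk): backprop multiplies roughly one factor per layer, so a per-layer factor slightly below 1 shrinks the gradient exponentially with depth, while one slightly above 1 blows it up.

```python
# Toy illustration of vanishing/exploding gradients: repeatedly multiply the
# gradient scale by a per-layer factor across a pretend 50-layer network.
for label, factor in [('vanishing', 0.9), ('exploding', 1.1)]:
    grad = 1.0
    for _ in range(50):  # 50 layers deep
        grad *= factor
    print('{}: per-layer factor {} -> gradient scale {:.2e} after 50 layers'
          .format(label, factor, grad))
```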