How Extensible is VGG? Real-world examples

brendan · November 1, 2016, 4:04pm

I shared Jeremy’s Enlitic story with my roommate, as well as our Dogs/Cats example, and he started brainstorming all these ideas and asking me if I could build him a model. For most of them I couldn’t say, because I don’t know how extensible VGG16 is.

Can anyone provide examples of using VGG (or similar models) with fine tuning outside of the examples we’ve discussed? Ideally these examples would highlight the extremes of what’s possible.

jeremy · November 1, 2016, 5:07pm

This paper has some examples (using an earlier ImageNet model than VGG): https://arxiv.org/abs/1403.6382 . What are some of the ideas your roommate asked about?

I don’t think anyone has collected a resource on the effectiveness of transfer learning from ImageNet models. Perhaps that’s something we can try to build as a community?

brendan · November 2, 2016, 4:48am

Fashion item classification. Could we train a model to identify whether a particular brand’s clothing is in an image?

I think I just answered my own question. I found these related papers:

Among other things, these papers try to identify the type of clothing (hat, glove, dress) seen in an image. They both train on a [free dataset](http://people.ee.ethz.ch/~lbossard/projects/accv12/index.html) of 80,000 clothing images.

The papers use different techniques:
Paper1 - Transfer Forests
Paper2 - AlexNet Convolutional Net pretrained on ImageNet

It looks like fine-tuning does work. From one of the papers:

Given that our ACS dataset contains 89,484 images - a decent sized dataset - we hypothesized that starting our fine-tuning at earlier CaffeNet layers would optimize performance.
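To make "starting our fine-tuning at earlier layers" concrete, here is a minimal sketch of the same idea applied to VGG16 rather than the CaffeNet used in the paper. The layer names follow the Keras VGG16 application (input layer omitted), and `trainable_flags` is a hypothetical helper made up for illustration, not something from the papers or the course. It only decides which layers would be frozen. In Keras you would then copy each flag onto the matching `layer.trainable` attribute before compiling.

```python
# Layer names as used by the Keras VGG16 application (input layer omitted).
VGG16_LAYERS = [
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
    "flatten", "fc1", "fc2", "predictions",
]


def trainable_flags(layer_names, first_trainable):
    """Hypothetical helper: freeze every layer before `first_trainable`.

    Returns a dict mapping layer name -> True if the layer should be
    updated during fine-tuning, False if it should stay frozen.
    """
    if first_trainable not in layer_names:
        raise ValueError(f"unknown layer: {first_trainable}")
    start = layer_names.index(first_trainable)
    return {name: i >= start for i, name in enumerate(layer_names)}


# Small dataset: only retrain the dense layers on top.
shallow = trainable_flags(VGG16_LAYERS, "fc1")

# Larger dataset (tens of thousands of images): start fine-tuning earlier,
# here from the fourth convolutional block onwards.
deep = trainable_flags(VGG16_LAYERS, "block4_conv1")

print(sum(shallow.values()), "trainable layers when starting at fc1")
print(sum(deep.values()), "trainable layers when starting at block4_conv1")
```

Which layer to start from is an empirical choice. The cut points above are only examples, not values taken from either paper.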

Accuracy achieved:
Paper1 - 41.3%
Paper2 - 50.2%

eBay also took a stab at this:

Fashion Apparel Detection: The Role of Deep Convolutional Neural Network

jeremy · November 2, 2016, 2:48pm

@brendan maybe you could try to beat the 50.2% paper? I think VGG should give significantly better results than AlexNet…

brendan · November 2, 2016, 3:23pm

You betcha

My roommate initially wanted to apply this to video. I’m curious: can VGG be used for video footage (assuming we chop up parts of the video into images first)?

jeremy · November 2, 2016, 5:26pm

Video should work fine.
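One simple way to picture the "chop the video into images" approach is to sample a frame every so often, classify each sampled frame with the image model, and average the per-frame class probabilities into one video-level prediction. The sketch below assumes exactly that baseline. The function names and the one-frame-per-second default are made up for illustration, and the hard-coded probabilities stand in for whatever the image model would return per frame.

```python
def sample_frame_indices(total_frames, fps, every_seconds=1.0):
    """Pick one frame index every `every_seconds` seconds of video."""
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def average_predictions(frame_probs):
    """Average per-frame class probabilities into one video-level vector."""
    if not frame_probs:
        raise ValueError("need at least one frame prediction")
    n_classes = len(frame_probs[0])
    n_frames = len(frame_probs)
    return [sum(p[c] for p in frame_probs) / n_frames for c in range(n_classes)]


# A 4-second clip at 25 frames per second, sampled once per second.
indices = sample_frame_indices(total_frames=100, fps=25.0)

# Placeholder per-frame outputs for two classes; in practice these would
# come from running the fine-tuned image model on each sampled frame.
frame_probs = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3], [0.7, 0.3]]
video_probs = average_predictions(frame_probs)
predicted_class = video_probs.index(max(video_probs))
```

Averaging treats every frame independently and ignores motion between frames, so it is only a starting point.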

jbrown81 · November 2, 2016, 6:06pm

This video by Andrej Karpathy from the Stanford CS231N course has some great detail on “spatio-temporal convnets” for classifying videos, especially around the 9 minute mark:

and this site has some more detail:
http://cs.stanford.edu/people/karpathy/deepvideo/

jeremy · December 8, 2016, 11:29pm

@brendan a friend just sent me a link to some interesting papers in this area: https://sites.google.com/site/kevinlin311tw/

brendan · December 30, 2016, 4:47pm

Wow, an impressive body of work. Thanks for sharing!

sorinpanduru · September 6, 2017, 9:59am

Hi @brendan
I’m working on an apparel classification model myself, using VGG.

I’m still trying many different things, but it seems I’m getting pretty good results so far:
With 50k training images and 10k validation images, split into 68 classes, I got to about 50% accuracy after 15 epochs.
I will definitely give the free apparel dataset you shared here a shot, since my dataset is not really curated and contains some bogus images.