I shared Jeremy’s Enlitic story and our Dogs/Cats example with my roommate, and he started brainstorming ideas and asking whether I could build him a model. For most of them I wasn’t sure, because I don’t know how extensible VGG16 is.
Can anyone provide examples of using VGG (or similar models) with fine tuning outside of the examples we’ve discussed? Ideally these examples would highlight the extremes of what’s possible.
This paper has some examples (using an earlier ImageNet model than VGG): https://arxiv.org/abs/1403.6382. What are some of the ideas your roommate asked about?
I don’t think anyone has collected a resource on the effectiveness of transfer learning from ImageNet models. Perhaps that’s something we can try to build as a community?
The papers use different techniques:
Paper1 - Transfer Forests
Paper2 - AlexNet Convolutional Net pretrained on ImageNet
It looks like fine-tuning does work.
Given that our ACS dataset contains 89,484 images - a decent sized dataset - we hypothesized that starting our fine-tuning at earlier CaffeNet layers would optimize performance.
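To make "starting fine-tuning at earlier layers" concrete: in a Keras-style model each layer carries a `trainable` flag, and you pick the first layer to unfreeze. Below is a minimal sketch using stand-in layer objects rather than a real network. The `Layer` class, the `unfreeze_from` helper, and the layer names are all illustrative assumptions, not something taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Stand-in for a framework layer; only the fields this sketch needs."""
    name: str
    trainable: bool = True


def unfreeze_from(layers, first_trainable):
    """Freeze every layer before `first_trainable`; train it and all later ones."""
    names = [layer.name for layer in layers]
    if first_trainable not in names:
        raise ValueError(f"unknown layer: {first_trainable}")
    cut = names.index(first_trainable)
    for i, layer in enumerate(layers):
        layer.trainable = i >= cut
    return layers


# Hypothetical layer names, loosely CaffeNet/AlexNet-shaped.
model = [Layer(n) for n in
         ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc8"]]

# Shallow fine-tuning: only the last fully connected layer is retrained.
unfreeze_from(model, "fc8")
shallow = [layer.name for layer in model if layer.trainable]

# Deeper fine-tuning: start further back, as one might with a larger dataset.
unfreeze_from(model, "conv4")
deep = [layer.name for layer in model if layer.trainable]
```

With a real model the same idea applies to its own layer list; where to put the cut is something to tune against validation results rather than a fixed rule.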
My roommate initially wanted to apply this to video. I’m curious: can VGG be used on video footage (assuming we first chop the video into images)?
This video from the Stanford CS231n course, by Andrej Karpathy, has some great detail on “spatio-temporal convnets” for classifying videos, especially around the 9 minute mark.
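For what it's worth, the "chop the video into images" idea can be sketched without any model at all: pick frames at a fixed interval, classify each one independently, then combine the per-frame labels. The helper names and the majority-vote rule below are assumptions for illustration, and the frame labels are made up. This treats frames as unrelated images, so it ignores motion entirely.

```python
from collections import Counter


def sample_frame_indices(total_frames, fps, every_seconds=1.0):
    """Return indices of frames taken once every `every_seconds` of video."""
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def video_label(frame_predictions):
    """Combine per-frame class labels into one label by majority vote."""
    if not frame_predictions:
        raise ValueError("no frame predictions")
    return Counter(frame_predictions).most_common(1)[0][0]


# A 10-second clip at 30 fps, sampled once per second.
indices = sample_frame_indices(total_frames=300, fps=30, every_seconds=1.0)

# Pretend per-frame outputs from an image classifier (made-up labels).
label = video_label(["dog", "dog", "cat", "dog"])
```

In practice each sampled frame would be resized and passed through the fine-tuned image model to get the per-frame predictions.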
Hi @brendan
I’m working on an apparel classification model myself, using VGG.
I’m still trying many different things, but I seem to be getting pretty good results so far: with 50k training images and 10k validation images split across 68 classes, I got to about 50% accuracy after 15 epochs.
I will definitely give the free apparel dataset you shared here a shot, since my dataset is not really curated and contains some bogus images.
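To put the 68-class figure in context, here is a quick back-of-the-envelope comparison against random guessing. It assumes the classes are roughly balanced, which the post does not state; with imbalanced classes, always predicting the most common class would give a higher baseline than this.

```python
num_classes = 68
reported_accuracy = 0.50  # "about 50%" from the post

# Accuracy expected from uniform random guessing over the classes.
chance = 1 / num_classes

# How many times better than chance the reported figure is.
lift = reported_accuracy / chance

print(f"chance: {chance:.1%}, lift over chance: {lift:.0f}x")
```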