I am going over the imagenet processing notebook, and one of the most fascinating things in it for me is zero-shot learning - being able to recognize classes that are not in the training set by leveraging word embeddings. This is really exciting because it could lead to state-of-the-art computer vision applications where we have only a very small number of labeled images, which would be especially useful for learning new classes on the fly.
From some quick research, it seems there have been many papers on zero-shot and one/multi-shot learning recently. @jeremy What do you think is the state of the art in this area right now?
I couldn’t quite figure that out when I was researching that lesson - I felt like there hadn’t been a lot of great progress in this area recently, which is why I taught a relatively old paper (DeViSE). The idea of generalized zero-shot learning is probably the more important question: http://arxiv.org/abs/1605.04253
I see. Will read the paper you linked. Thank you!
Anything you recommend for one/multi shot encoding?
Generally, here are some of the directions I am curious about:
- Treating all classes as discrete labels throws away information about how similar some classes are to others. For example, we could leverage the WordNet hierarchy.
- Text data is freely available and rich in semantic relations. Leveraging this in image recognition seems like a great direction.
- Maybe we can also learn these semantic relations using unsupervised learning on videos?
- For new classes, it might be possible to generate useful augmented data if we know the closest neighboring class?
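
To make the first bullet concrete, here is a minimal sketch of a hierarchy-based class similarity. The `PARENT` map is a made-up toy hierarchy (a real one would come from WordNet), and the score is a Wu-Palmer-style measure chosen purely for illustration, not something taken from the notebook:

```python
# Toy is-a hierarchy: child -> parent. Hypothetical data, standing in for WordNet.
PARENT = {
    "tabby": "cat",
    "siamese": "cat",
    "cat": "animal",
    "beagle": "dog",
    "dog": "animal",
    "animal": "entity",
    "truck": "vehicle",
    "vehicle": "entity",
    "entity": None,  # root
}


def path_to_root(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted from the root, with the root at depth 1."""
    return len(path_to_root(node))


def similarity(a, b):
    """Wu-Palmer-style score: 2 * depth(common ancestor) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node that is also an ancestor of a
    # is the deepest common ancestor.
    common = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(common) / (depth(a) + depth(b))


print(similarity("tabby", "siamese"))  # 0.75 (share "cat")
print(similarity("tabby", "beagle"))   # 0.5  (share only "animal")
```

A one-hot label treats "tabby vs. siamese" and "tabby vs. truck" as equally wrong; a score like this could be used to penalize the second mistake more heavily.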
Will read through the DeViSE paper and the paper linked above, and will update here with any findings.
That’s why I taught the DeViSE paper - it’s taking advantage of exactly that (by training images to match word vectors)! As we discussed in class, I believe there are many, many applications of this paper that could be built…
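
As a rough sketch of how matching images to word vectors enables zero-shot prediction: once a vision model outputs a vector in word-embedding space, a label can be picked by nearest-neighbor lookup over any set of class word vectors, including classes with no training images. The 3-d vectors, the `WORD_VECTORS` table, and `predict_label` below are all hypothetical stand-ins; real word vectors (e.g. word2vec) have hundreds of dimensions, and the image embedding would come from the trained network:

```python
import math

# Hypothetical word vectors. Assume "zebra" had no images in the training set.
WORD_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.0],
    "truck": [0.0, 0.2, 0.9],
    "zebra": [0.5, 0.8, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_label(image_embedding, candidates=WORD_VECTORS):
    """Return the class whose word vector is closest to the image embedding."""
    return max(candidates, key=lambda name: cosine(image_embedding, candidates[name]))


# Stand-in for the vision model's output on a zebra photo.
predicted_embedding = [0.45, 0.75, 0.2]
print(predict_label(predicted_embedding))  # zebra
```

The lookup never needs the class to have appeared during training; it only needs a word vector for the class name.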