I have some image classification projects in mind that would benefit from weak supervision, and I suspect many people here are in the same position. It promises a big reduction in labelling effort, which I'd guess is still one of the bigger blockers to applying deep learning to real-world problems.
What I’m shooting for is to take some unlabelled images, then use weak labelling to “tell” a deep net roughly where the objects of interest are (along with their labels), so that I can use that spatial information to train a classifier. The papers tend to concentrate on the first half of that problem. My overall problem is really a weakly supervised object detection problem bootstrapping a classification problem, but I’m not sure how best to combine the two: use a detection/segmentation net trained by weak supervision to generate crops for a classification net (a rough sketch of this option follows below), or combine both in one network? Hints appreciated.
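To make the first option concrete, here is a minimal sketch of the two-stage idea. It is not from any of the papers below: I'm standing in class activation maps (CAM, Zhou et al. 2016) for the weakly supervised localiser, thresholding the map to get a crop, and feeding that crop to a second classifier. All model choices, tensor shapes, and the `cam_crop` helper are my own assumptions, just to show the plumbing:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Hypothetical two-stage pipeline: a weakly supervised localiser
# (a plain ResNet plus its class activation map) proposes a region,
# which is cropped and passed to a second classifier.

def cam_crop(img, model, target_class, thresh=0.5):
    """Crop `img` (1xCxHxW tensor) to the box covering the CAM of
    `target_class`. Assumes `model` is a torchvision ResNet, whose
    final conv features feed a global-average-pooled linear head."""
    # Run the conv trunk only (drop avgpool and fc) to keep spatial features.
    feats = torch.nn.Sequential(*list(model.children())[:-2])(img)  # 1 x D x h x w
    weights = model.fc.weight[target_class]                         # D
    cam = torch.einsum("d,bdhw->bhw", weights, feats)               # 1 x h x w
    cam = F.interpolate(cam[None], size=img.shape[-2:], mode="bilinear")[0, 0]
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)        # normalise to [0, 1]
    ys, xs = torch.nonzero(cam > thresh, as_tuple=True)
    if len(ys) == 0:                       # nothing above threshold: keep whole image
        return img
    y0, y1 = ys.min().item(), ys.max().item() + 1
    x0, x1 = xs.min().item(), xs.max().item() + 1
    return img[..., y0:y1, x0:x1]

# ImageNet weights here are just placeholders for nets trained on your data.
localiser = models.resnet18(weights="IMAGENET1K_V1").eval()
classifier = models.resnet18(weights="IMAGENET1K_V1").eval()

img = torch.randn(1, 3, 224, 224)          # placeholder input image
with torch.no_grad():
    coarse = localiser(img).argmax(1).item()                   # image-level prediction
    crop = cam_crop(img, localiser, coarse)                    # weakly localised region
    crop = F.interpolate(crop, size=(224, 224), mode="bilinear")
    final = classifier(crop).argmax(1).item()                  # second-stage label
```

The single-network alternative would presumably share the trunk between the localisation and classification heads, but I haven't worked out what that looks like, hence the question.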
To give the general idea of weak supervision, here’s the abstract from “We don’t need no bounding boxes”:
Training object class detectors typically requires a large set of images in which objects are annotated by bounding-boxes. However, manually drawing bounding-boxes is very time consuming. We propose a new scheme for training object detectors which only requires annotators to verify bounding-boxes produced automatically by the learning algorithm. Our scheme iterates between re-training the detector, re-localizing objects in the training images, and human verification. We use the verification signal both to improve re-training and to reduce the search space for re-localisation, which makes these steps different to what is normally done in a weakly supervised setting. Extensive experiments on PASCAL VOC 2007 show that (1) using human verification to update detectors and reduce the search space leads to the rapid production of high-quality bounding-box annotations; (2) our scheme delivers detectors performing almost as good as those trained in a fully supervised setting, without ever drawing any bounding-box; (3) as the verification task is very quick, our scheme substantially reduces total annotation time by a factor 6x-9x.
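As I read the abstract, the training loop looks roughly like this. This is pure pseudocode, my paraphrase of the scheme, not the authors’ code, and every helper name here is made up:

```python
# Pseudocode paraphrase of the verification loop in
# "We don't need no bounding boxes" -- my reading of the abstract,
# not the authors' implementation. All helpers are hypothetical.

verified = {}                         # image -> human-accepted box
unverified = set(training_images)

while unverified:
    detector = retrain(verified)      # (1) re-train on boxes accepted so far
    for img in list(unverified):
        # (2) re-localise, searching only regions not yet ruled out
        box = detector.best_box(img, search_space(img))
        # (3) quick yes/no human check -- no drawing, hence the 6x-9x speedup
        if human_says_ok(box, img):
            verified[img] = box
            unverified.remove(img)
        else:
            shrink_search_space(img, box)   # a rejection prunes future search
```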
Here are some examples:
- We don’t need no bounding boxes
- Training object class detectors with click supervision
- What’s the Point: Semantic Segmentation with Point Supervision ([1506.02106] on arXiv; code at GitHub, abearman/whats-the-point1 — weakly-supervised semantic image segmentation with CNNs using point supervision)
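For the point-supervision idea in particular, the core trick (as I understand the What’s-the-Point paper) is simply to evaluate the segmentation loss only at the annotated pixels. A minimal PyTorch sketch, with made-up tensor names and shapes:

```python
import torch
import torch.nn.functional as F

# Point supervision as a loss: cross-entropy evaluated only at the
# clicked pixels, with every unlabelled pixel ignored.

logits = torch.randn(4, 21, 128, 128)            # B x C x H x W segmentation logits
points = torch.full((4, 128, 128), -1).long()    # -1 = pixel with no annotation
points[0, 64, 64] = 12                           # e.g. one click labelled class 12

# ignore_index skips every pixel that wasn't clicked
loss = F.cross_entropy(logits, points, ignore_index=-1)
```

(The paper also adds an image-level term and an objectness prior; the above is only the per-point term.)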
Unfortunately, few of these schemes have open-source implementations, and I probably need to play around a lot more with easier tasks before I dive into implementing something like “We don’t need no bounding boxes” myself…
So, has anyone here used a weak supervision scheme in a project? If so, how does it work (maybe a link to a paper), and how did you implement it (maybe even a link to a GitHub project)? Did it do what you hoped?