Who's using weak supervision? Weak detection -> classification?

I have some image classification projects in mind that would benefit from weak supervision, and I’m sure many people here would benefit from the same. It promises much reduced labelling effort, which I guess must still be one of the bigger blockers to implementing deep learning to solve real world problems in many cases.

What I’m shooting for is taking some unlabelled images, then using weak labelling to “tell” a deep net roughly where the objects of interest are (along with their labels), so that I can use that spatial information to train a classifier. The papers tend to concentrate on the first half of that problem. I guess my overall weak supervision + classification problem is a weak object detection problem bootstrapping a classification problem - but I’m not quite sure how best to combine those: just using a detection/segmentation net trained by weak supervision to generate input for a classification net? Or combining the two in one network, perhaps? Hints appreciated.

To give the general idea re weak supervision, here’s the abstract from “We don’t need no bounding boxes”:

Training object class detectors typically requires a large set of images in which objects are annotated by bounding-boxes. However, manually drawing bounding-boxes is very time consuming. We propose a new scheme for training object detectors which only requires annotators to verify bounding-boxes produced automatically by the learning algorithm. Our scheme iterates between re-training the detector, re-localizing objects in the training images, and human verification. We use the verification signal both to improve re-training and to reduce the search space for re-localisation, which makes these steps different to what is normally done in a weakly supervised setting. Extensive experiments on PASCAL VOC 2007 show that (1) using human verification to update detectors and reduce the search space leads to the rapid production of high-quality bounding-box annotations; (2) our scheme delivers detectors performing almost as good as those trained in a fully supervised setting, without ever drawing any bounding-box; (3) as the verification task is very quick, our scheme substantially reduces total annotation time by a factor 6x-9x.

Here are some examples:

We don’t need no bounding boxes

Training object class detectors with click supervision

https://github.com/abearman/whats-the-point1 / http://arxiv.org/abs/1506.02106

Unfortunately few of these schemes have open source implementations, and I probably need to play around a lot more with easier tasks before I dive into implementing something like “We don’t need no bounding boxes” from the paper…

So, have any of us here used a weak supervision scheme in a project? If so, how does it work (maybe a link to a paper) and how did you implement it (maybe even a link to a github project)? Did it do what you hoped?


This is very promising, did you make any progress?

Sorry for the slow response…

No not yet, still trying to get my hands on the data in useable form (just routine technical issues) :frowning:

Somebody actually replied with an interesting answer, I guess they removed the post later - thanks to that person anyway. I guess they won’t mind if I repeat that they said they were using the mmog detector from dlib, then hand-adjusting bounding boxes and retraining the detector with the new human-tweaked bounding boxes. They repeated that until most objects were localized, then localized the remaining objects by hand. I guess they then just used the bounding boxes in their loss function for training in a similar/identical way to “standard” ILSVRC models. I’m still not very familiar with those models, from memory I think e.g. VGG16 does use bounding boxes in its training data, and predicts both bounding boxes and classes?

Hey there guys and gals,

I am as well extremely interested in getting my hands on the code for our project and actual implementation of it in the industry. Has anyone of you tried to contact Dim P. Papadopoulos himself? Or the co-authors for that matter?

Again, apologies for not being around to respond, artemiy.

I kind of assume that in this fast-moving field especially, if the code wasn’t published fairly soon it might not be forthcoming, simply because the authors have moved on to newer and better things. Having said that I notice that that group (first author Papadopoulos again) have a new paper coming, preprint here (it’s the one at the top of that list right now, “Extreme clicking for efficient object annotation”):


This one isn’t about weak supervision, but it does still relate to reducing labelling cost.

Ferrari’s group does publish some implementations (mostly Matlab):


So I just emailed Papadopoulos asking about implementation code (and pointing at this thread).

1 Like

I just learned some related terms:

1 Like

Dim Papadopoulos kindly replied to tell me that he is planning to publish some parts of the code – in particular the framework he is using for multiple instance learning (which he said was hard to get working well), and the AWS Mechanical Turk code – but not before late September.

He didn’t say he was planning on publishing the full code used in these papers however.

1 Like

So happy I found this thread! I’m working on something similar and asking myself the same questions! Right now I’m exploring Active Learning as a potential solution, but there are many others we can incorporate (transfer learning, progressive learning, online learning, self-learning, few shot learning, unsupervised learning, etc.).

Here are some interesting papers I found:

Yesterday I classified 20,000 images from our favorite dataset :wink: in 10 minutes with 93% accuracy, after hand labeling only 200 images! I realize this is a super easy problem, but it’s a start. On the backend I wasn’t doing anything clever, just using a Resnet34 pretrained on image net. This works for classification tasks similar to imagenet, but breaks down when datasets deviate (medical, satellite). This is where active learning and the other techniques come into play.

Right now it handles single and multi-label classification, but the technique can be extended to detection, segmentation, video, even NLP


Nice. Thanks for those links, too.

I didn’t know the term Progressive Learning and I I’d forgotten what Active Learning refers to… more to add to the mental armoury.

Same here, I contacted him directly as well. He was very kind and nice throughout the communication, but we have to wait one more month to get our hands dirty with his code. Let’s see what comes.

Hi Brendan, Nice job!!!
I also have the challenge of manually annotating the images. Do you know how to extend this to bounding box and object detection. More so, can I get a link to your github for your implementation of active learning. I just heard about active learning from you a few hours ago.

Thank you.

I suppose by now someone must have looked at Snorkel: