Roadmap for active learning in v2

(Kerem Turgutlu) #1

For the past few weeks I have been doing some reading on active learning. I thought it would be a good idea to open a thread to discuss potential implementations in fastai-v2, mainly to see whether this would be something worth pursuing and investing time in.

So far I haven’t done any actual experimentation to test out some of the popular techniques, but I intend to do so in the coming weeks. I also think there might be many people in the forums who have a decent amount of hands-on, practical experience. So please share your thoughts and give feedback :slight_smile:

Mainly, the goal would be to have utilities/helpers/pipelines that let you do the most with the least amount of labelled data. Here are some example cases that might benefit from such a goal:

  • Medical applications where labelling costs are too high
  • Sensory applications where signal to noise ratios are very low
  • Self-driving cars that collect real-time data - a plethora of data, but which samples should be labelled?
  • Production models that interact with users and get input constantly - domain adaptation?
  • Many more…

So, this topic shouldn’t necessarily be narrowed down to active learning alone; the term can be an umbrella for many ideas such as pseudo-labelling (which works pretty well in many Kaggle comps), uncertainty estimation, semi-supervised learning, and so on.


(s.s.o) #2

A recent paper titled “Active Learning for Deep Detection Neural Networks” also includes code. The same words may be used with different meanings. :smiley: link to paper


(Kerem Turgutlu) #3

I started some very preliminary work similar to this useful paper. In most active learning frameworks we try to come up with an uncertainty measure - there are many different algorithms you can use for this; here is a good survey paper.

Once uncertainties are measured, the user can decide whether to annotate, pseudo-label, or ignore each unlabeled sample. But there is one important caveat:

  • Are the uncertainty measure and prediction performance correlated? (In a perfect scenario we would see a correlation of -1.)
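For concreteness, here is a minimal sketch of an entropy-based uncertainty score for a segmentation output. This is not the code used in the experiment below; the shapes and toy inputs are assumptions, and NumPy stands in for a real model’s softmax output:

```python
import numpy as np

def entropy_uncertainty(probs, eps=1e-12):
    """Mean per-pixel predictive entropy of a softmax output.

    probs: array of shape (n_classes, H, W) with class probabilities
    summing to 1 along axis 0. Higher return value = more uncertain.
    """
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    pixel_entropy = -(probs * np.log(probs)).sum(axis=0)  # (H, W)
    return pixel_entropy.mean()

# A confident binary mask vs. a maximally uncertain one:
confident = np.stack([np.full((4, 4), 0.99), np.full((4, 4), 0.01)])
uncertain = np.stack([np.full((4, 4), 0.50), np.full((4, 4), 0.50)])
print(entropy_uncertainty(confident) < entropy_uncertainty(uncertain))  # True
```

Samples whose mean entropy is highest would then be the first candidates for annotation.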

To test this hypothesis I trained a binary segmentation model for a few epochs and used entropy-based uncertainty.

Trained for 15 epochs (1.5k samples) - plotting samples held out of training (5k)

  • a) High uncertainty; choose to annotate, and these will be most useful for model performance.
  • b) High uncertainty; choose to annotate, but these will not be as useful as (a) since the samples already have high IOU.
  • c) Medium-level uncertainty; not sure how to use these samples, most likely ignore them.
  • d) Low uncertainty; choose to pseudo-label, but it may degrade model performance by feeding back systematic error.
  • e) Low uncertainty; choose to pseudo-label, and these will most likely help improve model performance.
  • f) Low uncertainty and low IOU. Could these be samples outside the training distribution? See this blog
    Maybe diversity sampling could be used to collect similar images.
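The annotate / pseudo-label / ignore routing described in cases (a)–(e) could be sketched as a simple triage rule. The threshold values here are hypothetical and would need to be tuned on a validation set:

```python
# Hypothetical thresholds; in practice tuned on a validation set.
HIGH, LOW = 0.6, 0.2

def triage(uncertainty):
    """Route an unlabelled sample based on its uncertainty score."""
    if uncertainty >= HIGH:
        return "annotate"      # cases (a)/(b): send to a human labeller
    if uncertainty <= LOW:
        return "pseudo-label"  # cases (d)/(e): trust the model's prediction
    return "ignore"            # case (c): too ambiguous to use either way

scores = [0.85, 0.45, 0.05]
print([triage(s) for s in scores])  # ['annotate', 'ignore', 'pseudo-label']
```

Note that case (f) shows why a rule this simple is not enough: low uncertainty with low IOU would be silently pseudo-labelled, which is exactly the systematic-error failure mode mentioned in (d).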

This is very early-stage work to get an initial understanding. Entropy is probably not the best uncertainty measure, especially compared with Dropout-, Ensemble-, or Bayesian-network-based uncertainties.

Next: Investigate Dropout and Ensemble uncertainty methods.
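For the Dropout/Ensemble direction, one common aggregation (a sketch under assumptions, not any specific library’s API) is to run T stochastic forward passes - e.g. with dropout left active at test time - and compute both the predictive entropy and the mutual information (the BALD score) from the sampled probabilities:

```python
import numpy as np

def mc_dropout_uncertainty(prob_samples, eps=1e-12):
    """Aggregate T stochastic forward passes (e.g. dropout left on at test time).

    prob_samples: (T, n_classes, ...) probabilities from T sampled passes.
    Returns (predictive_entropy, mutual_information); the latter is the
    BALD score, high when the sampled predictions disagree with each other.
    """
    p = np.clip(prob_samples, eps, 1.0)
    mean_p = p.mean(axis=0)                                    # (n_classes, ...)
    predictive_entropy = -(mean_p * np.log(mean_p)).sum(axis=0)
    expected_entropy = -(p * np.log(p)).sum(axis=1).mean(axis=0)
    return predictive_entropy, predictive_entropy - expected_entropy

# Two passes that disagree (model uncertainty) vs. two that agree:
disagree = np.array([[[0.9], [0.1]], [[0.1], [0.9]]])  # T=2, 2 classes
agree    = np.array([[[0.5], [0.5]], [[0.5], [0.5]]])
print((mc_dropout_uncertainty(disagree)[1] > mc_dropout_uncertainty(agree)[1]).item())  # True
```

The interesting property is that the `agree` case has high predictive entropy but near-zero mutual information - the model is consistently unsure - whereas `disagree` has high mutual information, which is the kind of epistemic uncertainty more annotation could actually reduce.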


(hari rajeev) #4

Found this article interesting.

Could you please post links to interesting articles on active learning here?


(s.s.o) #5

One should listen to some of Naftali Tishby’s videos on YouTube and read his papers. They are hard to understand (at some point I have no clue what he is talking about :grinning:), but he is really one of the few people who discusses the theory of DL from an information theory point of view, which forms the basis for active learning. video, some code, and blog 1, blog 2


(urmas pitsi) #6

A very interesting approach from an information-theoretic perspective. Probably a similar/same talk as the one posted above, but this one is at Stanford.


(s.s.o) #7

Yes, almost the same, and there are a couple more in the same direction. The one I sent was less technical - more conceptual, and a more complete story for an easy start. :grinning:


(Farid Hassainia) #8

Here is some information about an interesting library that might be useful for building something similar in fastai v2:

Bayesian Active Learning (Baal) by Element AI

Element AI (an artificial intelligence company co-founded by Yoshua Bengio and based in Montreal, Canada) has recently open-sourced BaaL, an active learning library written in PyTorch. BaaL supports the following methods for performing active learning:

  • Monte-Carlo Dropout (Gal et al. 2015)
  • MCDropConnect (Mobiny et al. 2019)

They plan to support the following methods:

  • Bayesian layers (Shridhar et al. 2019)
  • Unsupervised methods
  • NNGP (Panov et al. 2019)
  • SWAG (Zellers et al. 2018)

Documentation:

There is a nice tutorial here: Use BaaL in production (Classification)