For the past few weeks I have been reading about active learning. I thought it would be a good idea to open a thread to discuss potential implementations in fastai-v2, mainly to see whether this is something worth pursuing and investing time in.
So far I haven’t done any actual experimentation to test out some of the popular techniques, but I intend to do so in the coming weeks. I also think there are probably many people in the forums with a decent amount of hands-on, practical experience, so please share your thoughts and give feedback.
The main goal would be to have utilities/helpers/pipelines that let us get the most out of the smallest amount of labelled data. Here are some example cases that might benefit from such a goal:
- Medical applications where labelling costs are too high
- Sensory applications where signal-to-noise ratios are very low
- Self-driving cars that collect real-time data - there is a plethora of data, so which samples should be labelled?
- Production models that interact with users and get input constantly - domain adaptation?
- Many more…
So, this topic shouldn’t be restricted to active learning alone; the term can serve as an umbrella for many related ideas such as pseudo labelling (which works pretty well in many Kaggle competitions), uncertainty estimation, semi-supervised learning, and so on.
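To make two of these ideas a bit more concrete, here is a minimal sketch in plain Python, not a fastai API. It assumes we already have softmax probabilities for an unlabelled pool (for example from a trained classifier). It ranks samples by predictive entropy to pick candidates for manual labelling, and pseudo-labels the ones the model is very confident about. The function names, the toy probabilities and the 0.95 threshold are all hypothetical choices for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labelling(pool_probs, k):
    """Indices of the k most uncertain (highest-entropy) pool items."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:k]

def pseudo_label(pool_probs, threshold=0.95):
    """(index, predicted_class) pairs whose top probability >= threshold."""
    out = []
    for i, probs in enumerate(pool_probs):
        top = max(range(len(probs)), key=probs.__getitem__)
        if probs[top] >= threshold:
            out.append((i, top))
    return out

# Toy predictions for a pool of four unlabelled samples (made-up numbers).
pool_probs = [
    [0.98, 0.01, 0.01],  # very confident
    [0.40, 0.35, 0.25],  # uncertain
    [0.05, 0.90, 0.05],  # fairly confident
    [0.34, 0.33, 0.33],  # close to uniform
]

to_label = select_for_labelling(pool_probs, k=2)           # send to annotators
auto_labelled = pseudo_label(pool_probs, threshold=0.95)   # trust the model
print(to_label, auto_labelled)
```

In this toy pool the two flattest distributions are picked for annotation and only the first sample is pseudo-labelled. Entropy is just one possible uncertainty score; margin or least-confidence scoring would slot into the same structure.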