Student project: not all data points are created equal

In the most basic version of active learning, we choose which data points to label next based on the model's uncertainty about them. I want to apply this idea in a normal supervised setting to come up with a smarter training strategy that does not show the model examples that provide no additional information.

In this project I’d like to:

  1. Study how the uncertainty evolves over time for the different data points. My guess is that easy data points quickly reach low uncertainty and stay there, while harder examples evolve more slowly from high to low uncertainty. The deliverable here is a cool visualization of this dynamic. This requires a good understanding of the DataBlock API and the callback system in fastai.

  2. Based on the findings of 1, come up with a strategy for assembling mini-batches that include difficult data points as well as easier ones (to avoid forgetting). The idea is that at epoch 1 the model will see all the data. At epoch 2, it will see only a subset: all the hard data points as well as a subset of the easy ones (just enough to avoid forgetting). At epoch 3, even fewer examples, and so on. The epochs will become shorter and shorter, and hopefully we will get to the same results as normal training in less time. This requires a good understanding of the training cycle in fastai and the creation of mini-batches.

  3. Bonus: see how different architectures and hyperparameters behave in terms of point 1. Maybe there is a correlation between how an architecture performs in terms of uncertainty after a few epochs and its final performance.
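To make point 2 concrete, here is a minimal sketch of the selection step in plain NumPy. All names are made up for illustration (this is not fastai API): predictive entropy stands in for "uncertainty", and each new epoch keeps every hard example plus a random fraction of the easy ones to avoid forgetting.

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of each row of class probabilities; higher = more uncertain."""
    eps = 1e-12
    return -np.sum(probs * np.log(probs + eps), axis=1)

def next_epoch_indices(uncertainty, threshold, easy_frac=0.2, rng=None):
    """Keep every 'hard' example (uncertainty above threshold) plus a
    random fraction of the 'easy' ones, to mitigate forgetting."""
    rng = rng or np.random.default_rng(0)
    hard = np.where(uncertainty > threshold)[0]
    easy = np.where(uncertainty <= threshold)[0]
    n_easy = int(np.ceil(easy_frac * len(easy)))
    kept_easy = (rng.choice(easy, size=n_easy, replace=False)
                 if n_easy else np.array([], dtype=int))
    return np.concatenate([hard, kept_easy])
```

In a real run you would recompute the uncertainties at the end of each epoch (e.g. from a callback) and feed the resulting index set to the sampler for the next epoch, so the effective epoch length shrinks over time.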

Any feedback on the idea is appreciated, and if somebody else finds this interesting I’d be happy to collaborate.


I have been working on something similar, where I try to use differences in memorization to remove noise, rather than to train faster by dropping instances.
You may find this paper worth looking into.


I think a lot of people work in this area, and I find it very interesting.
For point 2, we could apply more aggressive augmentation to the “easy” examples instead of skipping them.
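One toy way to express that suggestion (names and the linear mapping are my own choices, just for illustration): scale the augmentation magnitude inversely with uncertainty, so easy examples get the heaviest augmentation and hard ones are left nearly untouched.

```python
import numpy as np

def augmentation_strength(uncertainty, min_mag=0.1, max_mag=1.0):
    """Map per-example uncertainty (assumed in [0, 1]) to an augmentation
    magnitude: easy (low-uncertainty) examples get the strongest
    augmentation, hard ones the weakest."""
    u = np.clip(uncertainty, 0.0, 1.0)
    return max_mag - (max_mag - min_mag) * u
```

The returned magnitude could then drive whatever augmentation knob you use (rotation range, mixup alpha, etc.).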


I think the kind of thing you’re talking about is generally referred to as “hard mining” in the literature. If you search for that term you might find some existing research that could help you. For example, I found this paper that uses it to help with class imbalance; it also cites examples in other domains.


That’s a great idea!

Hi @giacomov,

Very interesting project idea. It turns out that measuring “uncertainty” may be useful in other areas of DL research, such as the interpretability of DL models and making DL models more “robust” (i.e. resilient to adversarial attacks).

I saw a YouTube video of the last lecture of an MIT DL course about the new frontiers in DL, specifically discussing the limitations of DL.

One of the current research areas is “uncertainty” (see this part of the video): using dropout to create a Bayesian network that generates a metric for uncertainty.
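For anyone unfamiliar with the technique, the dropout-based idea (MC dropout, from Gal & Ghahramani) boils down to: keep dropout active at inference, run several stochastic forward passes, and read the spread of the predictions as an uncertainty estimate. A toy NumPy sketch with a made-up one-layer "model" (not the code from the article mentioned below):

```python
import numpy as np

def mc_dropout_predict(x, W, b, p=0.5, T=100, rng=None):
    """Monte Carlo dropout: run T stochastic forward passes with dropout
    kept ON, then use the spread of the predictions as an uncertainty
    estimate. Toy single linear layer for illustration."""
    rng = rng or np.random.default_rng(0)
    preds = []
    for _ in range(T):
        mask = rng.random(x.shape) > p       # random dropout mask on inputs
        h = (x * mask) / (1 - p) @ W + b     # inverted-dropout scaling
        preds.append(h)
    preds = np.stack(preds)
    # mean = prediction, std = per-output uncertainty
    return preds.mean(axis=0), preds.std(axis=0)
```

In a real network the same idea applies: leave the dropout layers in train mode at test time, do T forward passes, and take the variance across passes as the uncertainty metric.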

On further research, I found that someone has already implemented a Bayesian network using dropout in fastai and has written a Medium article about it, as well as shared his GitHub notebooks.

As a useful project starting point, maybe we can adapt his code to fastai2?

If anything, it will force me to learn DL concepts, as well as the fastai code, to a level where I can understand what the code is doing in order to modify it for fastai2.

Best regards,


Yeah, that is exactly what I had in mind. There are several ways of dealing with uncertainty, and I planned to use that code for fastai v1. I actually assumed it was going to be there for v2, but I didn’t check whether that is the case.



I will organize a Zoom meeting later this week with anybody interested in this project.

If you would like to be on that call, please send me your email either here or at


I’ve just stumbled upon this thread. Would love to help!