I’m currently reading through Thoughtful Machine Learning in Python by Matthew Kirk and he has the following paragraph:
A good example of a hidden feedback loop is predictive policing. Over the last few
years, many researchers have shown that machine learning algorithms can be applied
to determine where crimes will occur. Preliminary results have shown that these algorithms work exceptionally well. But unfortunately there is a dark side to them as well.
While these algorithms can show where crimes will happen, what will naturally occur
is the police will start patrolling those areas more and finding more crimes there, and
as a result will self-reinforce the algorithm. This could also be called confirmation
bias, or the bias of confirming our preconceived notion, and also has the downside of
enforcing systematic discrimination against certain demographics or neighborhoods.
While hidden feedback loops are hard to detect, they should be watched for with a
keen eye and taken out.
How could this be taken out in practice? In this example, would you have to keep patrols spread evenly across an area even though more crime is occurring in a certain location? If that were the case, why have the prediction at all? I guess my confusion is: how could you keep the model unbiased towards areas that had higher crime rates in the past, without turning a blind eye to the crime that is currently going on and acting as if every area has the same amount of crime?
Mostly I just want to start a discussion here and see some other viewpoints on this. If there is any advice on how to remove these feedback loops without also making the model less effective, I'm interested in hearing that as well.
I’m no expert, but perhaps a way to deal with biases is to include a certain amount of randomization.
I recently read an interesting take on this in Communications of the ACM (on a slightly different topic, but it touches on randomization): https://cacm.acm.org/magazines/2017/9/220429-divination-by-program-committee/fulltext
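To make the randomization idea concrete, here's a minimal sketch of an epsilon-greedy allocation: mostly follow the model's prediction, but occasionally patrol a random area so that every area keeps producing fresh data. The area names and risk scores are made up for illustration.

```python
import random

def choose_patrol_area(predicted_risk, epsilon=0.1, rng=random.Random(0)):
    """Pick an area to patrol: usually the area the model rates riskiest,
    but with probability epsilon pick a uniformly random area instead."""
    areas = list(predicted_risk)
    if rng.random() < epsilon:
        return rng.choice(areas)                 # exploration: random area
    return max(areas, key=predicted_risk.get)    # exploitation: top predicted risk

# hypothetical risk scores per area
risk = {"north": 0.7, "south": 0.2, "east": 0.1}
counts = {area: 0 for area in risk}
for _ in range(1000):
    counts[choose_patrol_area(risk)] += 1
# "north" dominates, but "south" and "east" still get visited occasionally,
# so the data never goes completely silent on the lower-ranked areas
```

The trade-off is explicit: a higher epsilon weakens the model's short-term effectiveness but keeps the training data from collapsing onto the areas the model already favors.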
Maybe officials could divide the number of crimes or crime rate by the number of patrols sent to that area, or some other quantity that measures the amount of attention or effort given to that area.
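That normalization is easy to sketch. Below is a toy example (the numbers are invented) showing how dividing raw crime counts by patrols sent can flip the picture: the heavily patrolled area finds more crimes in absolute terms but fewer per unit of attention.

```python
def crimes_per_patrol(observed_crimes, patrols_sent):
    """Adjust raw crime counts by the amount of attention each area got,
    so heavily patrolled areas aren't automatically labeled 'high crime'."""
    return {area: observed_crimes[area] / patrols_sent[area]
            for area in observed_crimes}

# hypothetical numbers: downtown gets 10x the patrols and reports 4x the crime
observed = {"downtown": 40, "suburb": 10}
patrols  = {"downtown": 100, "suburb": 10}

rate = crimes_per_patrol(observed, patrols)
# downtown: 0.4 crimes found per patrol; suburb: 1.0 per patrol,
# the opposite ordering of the raw counts
```

This only works if patrol counts (or some other effort measure) are actually recorded alongside the crime data, which is itself one of the data-collection fixes suggested later in the thread.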
Awesome topic. A few thoughts:
- This is an issue that exists independently of machine learning, though the rise of ML may make the problem more pertinent. As such, engaging with social scientists who have experience in this area is probably a good idea. More generally, engineers and scientists come across problems with reinforcing feedback all the time; what do they do?
- Perhaps something as simple as Matthew’s suggestion could work, though it requires an understanding of how the feedback might work; a choice of model. The only certainty about a choice of model is that it is wrong (though perhaps good enough).
- The only reliable way of avoiding this problem is to choose a dataset that is robust to feedback. In the crime example, communities could be surveyed: rather than relying only on reported crime, ask everyone whether they've been a victim of crime.
- Any use of machine learning in the public realm should be transparently documented, so that those with different perspectives can point out potential flaws, biases or sources of feedback. Just as we do code reviews, we should do model reviews too.
Summary: be aware of the flaws of the model, but try to fix the data collection.
I feel that this is related to a problem that arises in contextual bandit models. These models are trained to select an action from a predefined set of actions, and they are trained on incomplete, noisy data. For example, you train a model that selects a treatment option for a patient based on the patient's previous medical history. Each data point in the dataset has input data, the selected action, and an outcome ("reward"), e.g. whether the patient is alive 5 years after the treatment was applied.
If you train a naive model, it can simply learn to output whichever actions were typical in the training set.
To overcome this, information about the data collection policy is included in the dataset. Specifically, it is important to know how probable each logged action was, given the input parameters, under that data collection policy. This information can then be used by the IPS (inverse propensity scoring) estimator.
Also, to keep collecting information about different actions, some exploration policy is used: e.g., in 5% of cases take a random action instead of following the model's prediction.
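Here is a minimal sketch of the IPS estimator described above. The logged records and the target policy are hypothetical; the point is just the reweighting: each observed reward is scaled by the ratio of the target policy's action probability to the logging policy's propensity.

```python
def ips_estimate(logged_data, target_policy):
    """Inverse propensity scoring: estimate the average reward a target
    policy would get, using data logged under a different policy.
    Each record carries the probability the *logging* policy assigned
    to the action it actually took (the propensity)."""
    total = 0.0
    for record in logged_data:
        # probability the target policy would take the logged action
        p_target = target_policy(record["context"], record["action"])
        # reweight the observed reward by the policy ratio
        total += record["reward"] * p_target / record["propensity"]
    return total / len(logged_data)

# hypothetical logs: the logging policy took action "a" 80% of the time
logs = [
    {"context": 0, "action": "a", "reward": 1.0, "propensity": 0.8},
    {"context": 0, "action": "b", "reward": 0.0, "propensity": 0.2},
]

# a target policy that always takes action "a"
always_a = lambda ctx, act: 1.0 if act == "a" else 0.0

value = ips_estimate(logs, always_a)  # (1.0 * 1.0/0.8 + 0.0) / 2 = 0.625
```

The estimate is unbiased as long as the logging policy gave every action a nonzero propensity, which is exactly why the 5% exploration step matters: it guarantees the propensities never hit zero.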
I’m not sure how applicable this is to other kinds of machine learning models, like predicting crimes. But perhaps the general approach of recording information about the data collection policy and using it to evaluate predictions is applicable here.