Here’s a choose-your-own-adventure game nobody wants to play: you’re a United States judge tasked with deciding bail for a black man, a first-time offender, accused of a non-violent crime. An algorithm just told you there’s a 100 percent chance he’ll re-offend. With no further context, what do you do?
Judges in the US employ algorithms to predict the likelihood that an offender will commit further crimes, their flight risk, and a handful of other factors. These data points are then used to guide humans in sentencing, setting bail, and granting (or denying) parole. Unfortunately, the algorithms are biased 100 percent of the time.
A team of researchers led by postdoctoral scholar Andrew Selbst, an expert on the legal system and the social implications of technology, recently published research highlighting the unavoidable problem of bias in these algorithms. The research paper indicates there are several major “bias traps” that algorithm-based prediction systems fail to overcome.
On the surface, algorithms seem friendly enough. They help us decide what to watch, what to read, and some even make it easier to parallel park. But even an allegedly unbiased algorithm can’t overcome biased data or inconsistent implementation. Because of this, we have what’s called the “Ripple Effect Trap.”
According to the researchers:
When a technology is inserted into a social context, it has both intended and unintended consequences. Chief among the unintended consequences are the ways in which people and organizations in the system will respond to the intervention. To truly understand whether introduction of the technology improves fairness outcomes, it is not only necessary to understand the localized fairness concerns, as discussed above, but also how the technology interacts with a pre-existing social system.
In essence, this means that by giving judges the choice to use the algorithm at their own discretion, we're influencing the system itself. Human intervention doesn't make a logic system more logical. It just adds data based on "hunches" and "experience" to a system that doesn't understand those concepts. The ripple effect occurs when humans intervene in a system (by choosing when to use it based on personal preference) and then that same system intervenes in human affairs (by making predictions based on historical data). The result is an echo chamber for bias.
Perhaps the biggest problem with algorithms is that they’re based on math – justice is not. This is called the “Formalism Trap.”
The researchers wrote:
Failure to account for the full meaning of social concepts such as fairness, which can be procedural, contextual, and contestable, and cannot be resolved through mathematical formalisms.
Algorithms are just math; they can't self-correct for bias. To make one spit out a prediction, you have to represent the relevant concepts as numbers or labels. But social concepts like justice and fairness can't be reduced to numbers or labels, because they're in a constant state of flux: always being debated, and always subject to public opinion.
Selbst told Technology Review:
You can’t have a system designed in Utah and then applied in Kentucky directly because different communities have different versions of fairness. Or you can’t have a system that you apply for ‘fair’ criminal justice results then applied to employment. How we think about fairness in those contexts is just totally different.
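Selbst's point can be made concrete with a toy example. In this sketch (all groups, numbers, and function names are invented for illustration), the same model output satisfies one common definition of fairness while violating another — exactly the kind of disagreement that makes a system "fair" in one community and unfair in the next:

```python
# Hypothetical model output for two groups.
# Each row: (group, flagged_high_risk, actually_reoffended)
outcomes = [
    ("A", True, True), ("A", True, False), ("A", False, False), ("A", False, True),
    ("B", True, True), ("B", True, False), ("B", False, False), ("B", False, False),
]

def flag_rate(group):
    """One definition of fairness: both groups are flagged at the same rate."""
    rows = [row for row in outcomes if row[0] == group]
    return sum(flagged for _, flagged, _ in rows) / len(rows)

def false_positive_rate(group):
    """Another definition: equal flag rates among those who did NOT re-offend."""
    rows = [row for row in outcomes if row[0] == group and not row[2]]
    return sum(flagged for _, flagged, _ in rows) / len(rows)

print(flag_rate("A"), flag_rate("B"))                      # equal: "fair"
print(false_positive_rate("A"), false_positive_rate("B"))  # unequal: "unfair"
```

Both measures are computed from the same eight rows; which one counts as "fairness" is a social decision, not a mathematical one.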
Algorithms are also biased toward meeting developers' expectations, not fairness. When developers create an algorithm, they test it by measuring the accuracy of its output against known results. For example, they might feed an algorithm designed to help judges predict recidivism a data set containing 1,000 cases with ground-truth outcomes – they already know whether the individuals in each case committed more crimes or not – just to see if the algorithm predicts each outcome correctly. If the algorithm falls short, the developers adjust it and try again. They keep doing this until they can say their algorithm meets their client's threshold for success. This is how black box development works.
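That tune-until-it-passes loop can be sketched in a few lines of Python. Everything here is hypothetical – toy data and a stand-in "model" that just thresholds a score – but the shape of the process is the same: adjust until accuracy clears a target, and never ask what accuracy hides.

```python
import random

random.seed(0)

# Hypothetical ground-truth data: (risk_score, actually_reoffended) per case.
# In real development these would be historical records, biases and all.
cases = [(random.random(), random.random() < 0.4) for _ in range(1000)]

def predict(score, cutoff):
    """Stand-in 'algorithm': predict re-offense when the score exceeds a cutoff."""
    return score > cutoff

def accuracy(cutoff):
    correct = sum(predict(score, cutoff) == truth for score, truth in cases)
    return correct / len(cases)

# The development loop: keep adjusting until the client's threshold is met.
# Note that nothing here measures fairness -- only agreement with history.
best_cutoff = max((c / 100 for c in range(100)), key=accuracy)
print(f"cutoff={best_cutoff:.2f}, accuracy={accuracy(best_cutoff):.1%}")
```

If the historical data is biased, the "best" cutoff simply reproduces that bias with a high accuracy score attached.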
Accuracy alone is a poor measure of these algorithms' efficacy, because there's no way for developers to know what's happening inside the black box. Neither these systems nor their developers can explain why any given prediction turned out to be a false negative or a false positive.
This brings us to the “Framing Trap.” Fairness can’t be implemented at the level of the algorithm alone – within the algorithmic frame – because the algorithm has no control over the accuracy or fairness of the data it’s given.
According to the researchers:
Within the algorithmic frame, any notion of “fairness” cannot even be defined. This is because the goal of the algorithmic frame is to produce a model that best captures the relationship between representations and labels. Investigating the idea of fair machine learning requires us to expand the frame to encompass not just the algorithm but the algorithm’s inputs and outputs. We refer to this abstraction level as the data frame.
This – the data frame – is where humans come in. The algorithm might not understand that an offender has a solid support system and a plan to seek help, for example, but a judge could show leniency based on that context.
In order to live in this hybrid paradigm we have to operate under the assumption that there is no inherent bias in the criminal justice system and that all judges operate without bias or wrongful discrimination. We’ll hold for applause.
The data used to fuel these black box systems is based on real cases. In order to create “clean data” we’d have to let algorithms make all the decisions in an untouched legal environment.
The framing trap basically says that, no matter how we look at a problem, machine learning can’t model things that have no mathematical basis. We need an additional frame to overcome this problem: a socio-technical frame. The proposed socio-technical frame treats the decisions made by humans and human institutions as part of the algorithm’s overall design, thus accounting for inherent bias.
And that brings us to the final trap: the “Portability Trap.” The others are small potatoes compared to this one. Portability is a deeply ingrained measure of “cleanliness” in code. Developers pride themselves on writing clean, reusable code, because it means they’ve found a general solution that works across many different problems.
Basically, a good programmer makes a “box” and the client decides what to put in it. Think about this: you don’t order 10 boxes in the shape of a coffee cup to ship your “world’s greatest” mug collection. You get a box that’ll fit them all. And you can reuse that box later to store books, or human heads. But that’s the opposite of what the justice system should be doing. Determining whether a person should stay in jail while awaiting trial, or how many years they should serve if found guilty, should take more than just the minimum amount of data a black box system can parse.
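The "reusable box" idea translates directly into code. Here's a deliberately silly sketch (the function and numbers are invented) of why portability is a virtue everywhere except where context is the whole point:

```python
def generic_screen(score, cutoff=0.5):
    """A perfectly 'portable' decision rule: reusable anywhere, aware of nothing."""
    return score > cutoff

# The same box, reused across domains with wildly different stakes.
flag_spam = generic_screen(0.91)   # a wrong answer here costs one lost email
deny_bail = generic_screen(0.91)   # a wrong answer here costs months in jail
print(flag_spam, deny_bail)        # identical outputs, incomparable consequences
```

The function has no idea which domain it's serving – and that's exactly what makes it "clean."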
As the researchers put it:
Consider an automated resumé screen: we might be less concerned with false positives than false negatives because there is more filtering at the back end (the interview itself). Where false negatives end the process entirely, closing out particular candidates, the consequence of false positives is a little extra work for the employer. In the criminal justice context, however, we might be most concerned about equalizing false positives, which result in keeping people locked up and where disparities will further entrench a minority-dominated prisoner underclass.
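The asymmetry the researchers describe is easy to see in code. In this sketch (the case counts and cost weights are invented for illustration), the exact same error counts produce very different total harms depending on how the domain weights them:

```python
# Hypothetical predictions vs. ground truth for eight cases (True = positive class).
truth = [True, False, False, True, False, True, False, False]
preds = [True, True,  False, False, False, True, True,  False]

false_pos = sum(p and not t for p, t in zip(preds, truth))  # wrongly flagged
false_neg = sum(t and not p for p, t in zip(preds, truth))  # wrongly cleared

# Invented cost weights: a resume screen's false positives are cheap (an extra
# interview), while a bail algorithm's false positives keep someone locked up.
resume_cost = 1 * false_pos + 5 * false_neg
bail_cost = 5 * false_pos + 1 * false_neg
print(false_pos, false_neg, resume_cost, bail_cost)
```

A model tuned to minimize one of these costs can be badly miscalibrated for the other – which is the portability trap in miniature.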
When humans’ freedom is on the line, we need algorithms designed for that specific purpose, not clean, portable code ready to make whatever predictions an end-user wants.
Algorithms reinforce existing bias and introduce new biases if they don’t have socio-technical models to prevent each of the aforementioned traps. And it’s a safe bet to assume that none of the algorithms in use in US courtrooms have such a model in place because, as far as we know, they don’t exist yet.
You might be wondering how we’re supposed to fix the problem. The answer isn’t a technical one, as the researchers wrote:
In a standard computer science paper, this is where we would suggest technical solutions. But guided by our analysis, our main proposed “solution” is a focus on the process of determining where and how to apply technical solutions.
The researchers believe the solution involves critical evaluation and increased interaction between disciplines. Here at TNW, it seems to us that the easiest way to fix the problem, for now, is to acknowledge that the algorithms in the criminal justice system are probably all biased, and to remove them from our courthouses.
Edit 13:06 CST 2/7/2019: Added context. The “abandon algorithms” takeaway is the author’s own, and shouldn’t be confused with the intent of the research discussed in this piece.