This article was published on July 4, 2017

Biased data teaches algorithms how to discriminate

Image by: Stuart Seeger

Story by

Tristan Greene

Editor, Neural by TNW

Tristan is a futurist covering human-centric artificial intelligence advances, quantum computing, STEM, physics, and space stuff. Pronouns: He/him

Math is a tool that doesn’t discriminate. There’s no bias in it; the numbers either add up or they don’t. Algorithms depend on math, but they’re data-driven, and sometimes the information being fed into one is incorrect or doesn’t represent the actual goals of the algorithm.

Cathy O’Neil, the author of Weapons of Math Destruction, cautions us against trusting the data being fed into our judicial systems. In a video, she explains how mistakes made by algorithms make existing problems even worse:

And what ProPublica found was the COMPAS model, which is one version of a recidivism model, made mistakes by sending people to prison longer, that kind of mistake, twice as often for African-American defendants as for white defendants, at least in Broward County, Florida. There’s another kind of mistake you can make, which is: you look like you’re not coming back, you look low-risk, but you actually do come back. That kind of risk, that kind of mistake, was made twice as often for white defendants as for African-American defendants.
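The two kinds of mistake described in the quote are what statisticians usually call false positives (flagged high-risk, did not reoffend) and false negatives (flagged low-risk, did reoffend). Below is a minimal sketch of how those error rates could be compared across groups. The counts are invented for illustration and are not ProPublica's figures; the class, function, and group names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Confusion-matrix counts for one group of defendants (made-up numbers)."""
    true_pos: int   # flagged high-risk, did reoffend
    false_pos: int  # flagged high-risk, did NOT reoffend
    true_neg: int   # flagged low-risk, did not reoffend
    false_neg: int  # flagged low-risk, DID reoffend


def false_positive_rate(o: Outcomes) -> float:
    # Share of people who did not reoffend but were flagged high-risk anyway.
    return o.false_pos / (o.false_pos + o.true_neg)


def false_negative_rate(o: Outcomes) -> float:
    # Share of people who did reoffend but were flagged low-risk.
    return o.false_neg / (o.false_neg + o.true_pos)


# Illustrative counts only, chosen so the rates are easy to read.
groups = {
    "group_a": Outcomes(true_pos=400, false_pos=200, true_neg=300, false_neg=100),
    "group_b": Outcomes(true_pos=300, false_pos=100, true_neg=400, false_neg=200),
}

for name, o in groups.items():
    print(f"{name}: FPR={false_positive_rate(o):.2f}, FNR={false_negative_rate(o):.2f}")
```

In this toy data one group gets twice the false-positive rate and the other twice the false-negative rate, which is the shape of the disparity described in the quote. A model can look similarly accurate overall for both groups while its mistakes fall very differently on each.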

The problem with using algorithms in police work is: there’s no such thing as crime data. What police use is arrest data, and so far this hasn’t worked out well for minorities.

Statistically, a Black person is four times more likely than a white person to be arrested for the same crime. PolitiFact breaks this down in a chart detailing the homicide arrests for 2012 and 2013:

[Chart: homicide arrests, 2012 and 2013, rate per 100K]

These numbers represent arrests, not crimes committed. For every 100,000 Caucasians, just shy of five are arrested for homicide; for Black people, the figure is about 34 per 100,000, roughly seven times higher. This means a Black person is far more likely to be arrested for murder than a Caucasian, even though, based on population, you’re far more likely to be murdered by a Caucasian.
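For readers who want to check the arithmetic, here is a small sketch of how a per-100,000 rate is computed and how the two rates quoted above compare. The helper name is hypothetical, and "just shy of five" is rounded up to 5.0, which is an assumption that slightly understates the gap.

```python
def rate_per_100k(events: int, population: int) -> float:
    """Convert a raw count into a rate per 100,000 people."""
    return events * 100_000 / population


# Rates quoted in the article (homicide arrests per 100,000 people).
# "Just shy of five" is rounded to 5.0 here (assumption).
white_rate = 5.0
black_rate = 34.0

ratio = black_rate / white_rate
print(f"Arrest-rate ratio: {ratio:.1f}x")

# Hypothetical example of the helper: 50 arrests in a population of 1,000,000.
print(rate_per_100k(50, 1_000_000))
```

Normalizing by population is what makes the comparison meaningful: raw arrest counts would mostly reflect how large each group is, not how often its members are arrested.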

It gets even worse when you look at incarceration. Black people, who make up 13 percent of the population on the outside, comprise 38 percent of the prison population. The ACLU reports that sentences given to Black males are 20 percent longer than those given to Caucasians.

The data used to power these algorithms — the ones that are supposed to predict who will commit a crime next, and which suspects are going to become repeat offenders — is flawed to begin with.

We can’t just make it up; that’s not how data-powered algorithms work. We’re using arrest records, length of sentence, number of arrests, and which arrests were repeat offenses: statistics that are supposed to provide a baseline. We’re getting the same results with the algorithms as we were before, which should come as a surprise to no one.
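To see why an algorithm fed arrest records hands back the same disparity, consider a deliberately naive sketch. It assumes two hypothetical groups that offend at exactly the same rate, with one group arrested four times as often for the same offense. Every number and name here is invented for illustration; no real risk-assessment tool is this simple.

```python
# Hypothetical populations: both groups offend at the same true rate,
# but group_b is arrested four times as often for the same offense.
POPULATION = 100_000
OFFENSES = 10_000  # identical in both groups by construction (assumed)

arrests = {
    "group_a": 1_000,  # 10% of offenses lead to an arrest (assumed)
    "group_b": 4_000,  # 40% of offenses lead to an arrest (assumed)
}


def learned_risk(group: str) -> float:
    """A naive 'model': predicted risk is just the historical arrest rate."""
    return arrests[group] / POPULATION


true_rate = OFFENSES / POPULATION
for group in arrests:
    print(
        f"{group}: true offense rate={true_rate:.2%}, "
        f"learned risk={learned_risk(group):.2%}"
    )
```

The model never sees the true offense rate, only arrests, so it scores the second group as four times riskier even though the underlying behavior is identical by construction.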

The government is also legislating political concerns into the data. A bill proposed in 2015 to regulate the algorithms used in courts went so far as to add extra clauses. One such stipulation: arrests with the word ‘fentanyl’ in a defendant’s record would be weighted differently when it comes to sentencing, because that’s what Congress wants.

It’s all very discouraging. Algorithms have the potential to eliminate human bias, but not when they’re based on data that we already know is flawed.
