A few weeks ago a story broke revealing that the New Orleans Police Department had been using a predictive policing tool supplied by the CIA-backed data-mining company Palantir to identify gang members. The software identified ties between gang members and non-gang members, analyzed criminal histories, crawled social media, and finally, predicted the likelihood that individuals would commit violence. Predictably, outrage ensued. But this use of predictive analytics to identify risk is hardly new: it’s been used in numerous US cities, in counterintelligence operations, and — here’s the twist — in schools.
‘Risk identification’ has become an in-vogue term in the American public school system. Districts want to decrease dropout rates, and a slew of data-driven tools have cropped up that use predictive analytics derived from questionably acquired data to identify at-risk students. Preventing dropout, like preventing crime, is a noble and worthy goal. But the reason predictive policing has inspired outrage is that algorithms tend to perpetuate systemic bias, and they only work by collecting vast swaths of data points — data that people may not know is being collected.
The rise of predictive analytics in institutions such as schools and criminal justice systems raises a series of ethical questions, which I’ve outlined below. But the fact is that these algorithms are here to stay — and, I argue, that’s a good thing. The questions they raise — about racism, data ownership, and the ethics of predicting crime at all — are ones we should have been examining for decades.
1. Who owns a minor’s data?
During a 2015 congressional hearing on how emerging technologies affect student privacy, a representative asked for a summary of how much data is collected on students by the time they reach graduate school. Joel Reidenberg, director of the Center on Law & Information Policy at Fordham Law School, quickly answered, “Just think George Orwell, and take it to the nth degree.”
It’s not illegal for schools to collect data — from grades to test scores to internet searches to behavioral notes — but many parents are extremely uncomfortable with not being told precisely what data is being collected and, more importantly, how it’s being used. In fact, in 2012 parents found out that inBloom, a $100M corporation, was collecting and sharing student data with vendors and other third parties. Mass outrage ensued, every state and district pulled out of its inBloom contracts, and the company shut down in 2014.
Since then, though, companies such as Hoonuit and Microsoft have quietly stepped in to serve school districts looking to decrease dropout rates. In fact, the federal government has mandated that every state collect student data from preschool onwards in a longitudinal data system. These repositories can include medical information, survey responses, and records from child services, the criminal justice system, and health departments. Under the Family Educational Rights and Privacy Act (FERPA), medical and counseling records included in education records are not protected by HIPAA, meaning that sensitive mental and physical health information can be shared with third parties without parental consent.
2. Does the algorithm’s ingrained systemic bias help or harm at-risk students?
Algorithms predict future behavior based on precedent. But when precedent shows higher percentages of at-risk behavior by minorities, the algorithm then predicts that minorities are more at risk than non-minorities. This is not, however, necessarily an accurate reading. A recent study found that not every crime committed has an equal chance of being recorded by police — for example, crimes in heavily patrolled areas are more likely to be recorded than crimes in high-income neighborhoods with little patrol.
The same logic applies to schools: ingrained racism and other forms of prejudice lead teachers (who are not infallible) to report certain behaviors in one student more readily than the same behaviors in another. This teaches the algorithm to mimic those biases, thus perpetuating systemic bias.
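To make that feedback loop concrete, here is a minimal, hypothetical sketch in Python (synthetic numbers, not any real district’s or vendor’s data): two groups have identical underlying rates of at-risk behavior, but one group’s incidents are recorded more often, and a model trained on those recorded labels learns the reporting gap rather than the reality.

```python
# Hypothetical illustration: identical true base rates, biased recording.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)            # 0 = group A, 1 = group B
true_at_risk = rng.random(n) < 0.10      # same 10% base rate in both groups

# Reporting bias: group A's incidents are recorded 80% of the time, group B's 40%.
record_rate = np.where(group == 0, 0.8, 0.4)
recorded = true_at_risk & (rng.random(n) < record_rate)

# Train on the recorded labels, with group membership (or any proxy for it) as a feature.
model = LogisticRegression().fit(group.reshape(-1, 1), recorded)
print(model.predict_proba([[0], [1]])[:, 1])  # roughly 0.08 for group A vs 0.04 for group B
```

The model isn’t “wrong” about the data it was given; it faithfully reproduces the bias in how that data was collected.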
3. Is it ethical to intervene based on data-driven predictions?
This is the question at the heart of the controversy over police using predictive analytics. The idea of predicting crime before it’s ever happened is decidedly dystopian — after all, at what point does a crime become a crime? That said, intervening to help students succeed is a much more noble goal than punishing a future criminal.
If you misidentify a student as ‘at risk’ and try to help them, it’s hard to imagine this backfiring the way tracking potential criminals could. The worst-case scenario is a child feeling upset at being placed in an intervention program they don’t actually need.
Predictive analytics is not the beast we think it is
The reality is that prevention and punishment systems have always been predictive. The parole system, for instance, is essentially a qualitative assessment of whether or not a person is likely to commit another crime (a predictive assessment). Similarly, intervention systems in schools have identified “at-risk” students through profiling for decades, a practice rife with bias. Predictive analytics just makes these existing systems more transparent. And people are upset because they’re finally seeing the ugliness that is, and has always been, predictive justice.
Predictive analytics does reduce people to a set of data points. But we humans do the same; we just tend to give more weight to the wrong data points, such as appearance and background. This system is ugly, but in the absence of a wholly different structure, the transparency offered by predictive analytics is desirable. If the algorithm is racist, we can fix that much more easily than we can fix a racist person.
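One concrete advantage of that transparency, sketched below with toy numbers (again hypothetical, not any vendor’s actual system): a model’s flags can be audited directly, for instance by comparing flag rates and false-positive rates across groups, which is something you cannot do to the inside of a person’s head.

```python
# Hypothetical audit sketch: compare how often each group gets flagged,
# and how often those flags are wrong, given known outcomes.
import numpy as np

def audit(flagged, actually_at_risk, group):
    for g in np.unique(group):
        mask = group == g
        flag_rate = flagged[mask].mean()
        not_at_risk = ~actually_at_risk[mask]
        fpr = (flagged[mask] & not_at_risk).sum() / max(not_at_risk.sum(), 1)
        print(f"group {g}: flagged {flag_rate:.1%}, false-positive rate {fpr:.1%}")

# Toy inputs standing in for a district's real records.
rng = np.random.default_rng(1)
group = rng.integers(0, 2, 1000)
actually_at_risk = rng.random(1000) < 0.10
flagged = rng.random(1000) < np.where(group == 0, 0.20, 0.10)   # a biased flagger
audit(flagged, actually_at_risk, group)
```

A gap like the one this prints is measurable, reportable, and correctable; the same cannot be said of an individual’s gut feeling about which students look ‘at risk’.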
The real drawback of using analytics is data privacy, an issue at the heart of today’s national debates. When you’re dealing with children’s data, it’s important to know how districts are using and distributing what they collect — particularly when they don’t use a proprietary system (we’ve all seen what can happen with third-party data breaches).
There will be data breaches, and there will be companies that steal information from predictive analytics systems. But those costs are outweighed by the benefit of being able to examine, in transparent and measurable ways, how we make predictions about human behavior.