
This is an excerpt from a long interview between an anonymous data scientist and Logic Magazine about AI, deep learning, FinTech, and the future, conducted in November 2016.
LOGIC: One hears a lot about algorithmic finance and things like robo-advisers. And I'm wondering, is it over-hyped?
DATA SCIENTIST: I would say that robo-advisers are not doing anything special.
It's AI only in the loosest sense of the word. They're not really doing anything advanced; they're applying a formula. And it's a reasonable formula, it's not a magic formula, but they're not quantitatively assessing markets and trying to make predictions. They're applying a formula for what stock and bond allocations to make. It's not a bad service, but it's super hyped.
That's indicative of a bubble in AI: you have something like that where you're like, "It's AI!" and people are like, "Okay, cool!"
There's a function that's being optimized, which is, at some level, what a neural net is doing. But it's not really AI.
I think one of the big tensions in data science that is going to unfold in the next ten years involves companies like SoFi, or Earnest, or pretty much any company whose shtick is, "We're using big data technology and machine learning to do better credit score assessments."
I actually think this is going to be a huge point of contention moving forward.
I talked to a guy who used to work for one of these companies. Not one of the ones I mentioned, a different one. And one of their shticks was, "Oh, we're going to use social media data to figure out if you're a great credit risk or not." And people are like, "Oh, are they going to look at my Facebook posts to see whether I've been drinking out late on a Saturday night? Is that going to affect my credit score?"
And I can tell you exactly what happened, and why they actually killed that. It's because, with your social media profile, they know your name, they know the names of your friends, and they can tell if you're black or not. They can tell how wealthy you are, they can tell if you're a credit risk. That's the shtick.
And my consistent point of view is that any of these companies should be presumed to be incredibly racist unless they present you with mountains of evidence otherwise.
Anybody who says, "We're an AI company that's making smarter loans": racist. Absolutely, 100%.
I was actually floored: during the last Super Bowl, I saw this SoFi ad that said, "We discriminate." I was just sitting there watching this game like, I cannot believe it. Either they don't know, which is terrifying, or they know and they don't give a shit, which is also terrifying.
I don't know how that court case is going to work out, but I can tell you that in the next ten years there's going to be a court case about it. And I would not be surprised if SoFi lost for discrimination. And in general, I think how we handle protected classes generally, and maybe race specifically, in data science models of this type is going to be an increasingly important question.
Because otherwise, it's like: okay, you can't directly model whether a person is black. Can you use their zip code? Can you use the racial demographics of the zip code? Can you use things that correlate with the racial demographics of their zip code? And at what level do you draw the line?
And we know what we're doing for mortgage lending. The answer there, frankly, as a data scientist, is a little bit offensive: we don't give a shit where your house is. We just lend.
That's what Rocket Mortgage does. It's a fucking app, and you're like, "How can I get a million-dollar loan with an app?" And the answer is that they legally can't tell where your house is. And the algorithm that you use to do mortgages has to be vetted by a federal agency.
That's an extreme, but that might be the extreme we go to: every single time anybody gets assessed for anything, the actual algorithm and the inputs are reviewed by a federal regulator. So maybe that's going to be what happens.
I actually view it a lot like the debates around divestment. You can say, "Okay, we don't want to invest in any oil companies," but then do you want to invest in things that are positively correlated with oil companies, like oilfield services companies? What about things that in general have some degree of correlation? How much is enough?
I think it's the same thing: okay, you can't look at race, but can you look at correlates of race? Can you look at correlates of correlates of race? How far down do you go before you say, "Okay, that's okay to look at"?
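To make that chain-of-correlates question concrete, here is a minimal, purely hypothetical sketch in Python. It assumes a pandas DataFrame of loan applications with a numerically encoded protected-attribute column; every column name is invented for illustration, and nothing here is drawn from any lender's actual pipeline. The only point is that a feature several steps removed from race can still carry much of the same signal.

```python
# Hypothetical proxy screen: rank candidate model features by how strongly
# they correlate with a protected attribute. All column names are invented.
import pandas as pd


def proxy_report(df: pd.DataFrame, protected: str, candidates: list) -> pd.Series:
    """Return candidate features sorted by |correlation| with the protected column.

    A high value means the feature can act as a stand-in for the protected
    attribute even if that attribute is never given to the model.
    """
    return df[candidates].corrwith(df[protected]).abs().sort_values(ascending=False)


# Example usage with invented columns: "pct_black_in_zip" is one step removed
# from race, "median_home_price_zip" is a correlate of that correlate.
# report = proxy_report(
#     applications,
#     protected="applicant_is_black",
#     candidates=["pct_black_in_zip", "median_home_price_zip",
#                 "debt_to_income", "college_attended_rank"],
# )
# print(report)
```

Wherever the line gets drawn, a screen like this only catches pairwise correlation; a model can reconstruct a protected attribute from combinations of features that each look innocuous on their own.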
LOGIC: I'm reminded a bit of Cathy O'Neil's new book, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (2016). One of her arguments, which it seems like you're echoing, is that the popular perception is that algorithms provide a more objective, more complete view of reality, but that they often just reinforce existing inequities.
DATA SCIENTIST: That's right. And the part that I find offensive as a mathematician is the idea that somehow the machines are doing something wrong.
We as a society have not chosen to optimize for the thing that we're telling the machine to optimize for. That mismatch is what it means for the machine to be doing illegal things. The machine isn't doing anything wrong, and the algorithms are not doing anything wrong. It's just that they're literally amoral, and if we told them the things that are okay to optimize against, they would optimize against those instead.
It's a frightening, almost Black Mirror-esque view of reality that comes from the machines, because a lot of them are completely stripped of (not to sound too Trumpian) liberal pieties. It's completely stripped.
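As one toy illustration of what "telling the machine what it's okay to optimize against" could look like, here is a short sketch on synthetic data. It fits the same simple logistic model twice: once minimizing only predictive loss, and once with a hypothetical penalty added for the gap in predicted approval rates between two groups. The data, the penalty term, and the weights are all invented; the point is only that the optimizer follows whatever objective it is handed.

```python
# Synthetic-data toy: the optimizer is amoral. Change the objective and the
# same machinery produces different behavior. Nothing here models real lending.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, n)                        # stand-in protected class
X = rng.normal(size=(n, 3)) + 0.8 * group[:, None]   # features correlated with group
y = (X[:, 0] + 0.3 * rng.normal(size=n) > 0.4).astype(float)   # "repaid" label


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def objective(w, lam):
    p = sigmoid(X @ w)
    # Ordinary log loss for predicting repayment...
    log_loss = -np.mean(y * np.log(p + 1e-9) + (1.0 - y) * np.log(1.0 - p + 1e-9))
    # ...plus an optional penalty on the gap in mean predicted approval rates.
    gap = abs(p[group == 1].mean() - p[group == 0].mean())
    return log_loss + lam * gap


for lam in (0.0, 2.0):
    w = minimize(objective, np.zeros(3), args=(lam,), method="Nelder-Mead").x
    p = sigmoid(X @ w)
    gap = abs(p[group == 1].mean() - p[group == 0].mean())
    print(f"penalty weight {lam}: approval-rate gap between groups = {gap:.3f}")
```

With the penalty weight at zero, the gap is whatever the historical data dictates; with a nonzero weight, the same optimizer shrinks it, because the gap is now part of what it is being scored on.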
LOGIC: They're not "politically correct."
DATA SCIENTIST: They are massively not politically correct, and it's disturbing.
You can load in tons and tons of demographic data, and it's disturbing when you see percent black in a zip code and percent Hispanic in a zip code come out as more important than the borrower's debt-to-income ratio when you run a credit model.
When you see something like that, you're like, "Ooh, that's not good." Because the frightening thing is that even if you remove those specific variables, if the signal is there, you're going to find correlates of it all the time, and you either need to have a regulator that says, "You can use these variables, you can't use these variables," or, I don't know, we need to change the law.
As a data scientist, I would prefer it if that did not come out in the data. I think it's a question of how we deal with it. But I feel sensitive toward the machines, because we're telling them to optimize, and that's what they're coming up with.
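For readers who want to see what that kind of readout looks like, here is a minimal, hypothetical sketch. It assumes a pandas DataFrame of historical loans with a "defaulted" label; the column names and the gradient-boosted model are illustrative choices, not a description of any particular lender's system.

```python
# Hypothetical credit-model check: fit a model, look at which inputs it leans
# on, then refit without the explicit demographic columns. Columns are invented.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

FEATURES = ["debt_to_income", "pct_black_in_zip", "pct_hispanic_in_zip",
            "median_home_price_zip", "college_attended_rank"]


def importance_report(loans: pd.DataFrame, feature_cols) -> pd.Series:
    """Fit a simple default model and report which inputs it relies on most."""
    model = GradientBoostingClassifier(random_state=0)
    model.fit(loans[feature_cols], loans["defaulted"])
    return pd.Series(model.feature_importances_,
                     index=feature_cols).sort_values(ascending=False)


# Example usage on a hypothetical `loans` table:
# print(importance_report(loans, FEATURES))
# The "remove those specific variables" experiment: drop the explicit
# demographic columns and refit. If the signal is there, correlated columns
# such as median_home_price_zip tend to absorb the importance instead.
# print(importance_report(loans, [c for c in FEATURES if not c.startswith("pct_")]))
```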
LOGIC: They're describing our society.
DATA SCIENTIST: Yeah. That's right, that's right. That's exactly what they're doing. I think it's scary. I can tell you that a lot of the opportunity those FinTech companies are finding is derived from that kind of discrimination, because if you are a large enough lender you are going to be very highly vetted, and if you're a very small lender you're not.
Take SoFi, for example. They refinance the loans of people who went to good colleges. They probably did not set up their business to be super racist, but I guarantee you they are super racist in the way they're making loans, in the way they're making lending decisions.
LOGIC: Is that okay? Should a company like that exist?
DATA SCIENTIST: I don't know. I can see it both ways. You could say, "They're a company, they're providing a service for people, people want it, that's good." But at the same time, we have such a shitty legacy of racist lending in this country. It's very hard not to view this as yet another racist lending policy, but now it's got an app. I don't know.
I just think that there is going to be a court case in the next ten years, and whatever the result is, it's going to be interesting.
The full interview is available from Logic Magazine. Logic is a magazine about technology that comes out three times a year. To learn more, visit logicmag.io.