Opening the black box. Reducing the massive power consumption it takes to train deep learning models. Unlocking the secret to sentience. These are among the loftiest outstanding problems in artificial intelligence. Whoever has the talent and budget to solve them will be handsomely rewarded with gobs and gobs of money.
But there’s an even greater challenge stymieing the machine learning community, and it’s starting to make the world’s smartest developers look a bit silly. We can’t get the machines to stop being racist, xenophobic, bigoted, and misogynistic.
Nearly every big tech outfit and several billion-dollar non-profits are heavily invested in solving AI’s toxicity problem. And, according to the latest study on the subject, we’re not really getting anywhere.
The problem: Text generators, such as OpenAI’s GPT-3, are toxic. OpenAI currently has to limit access to GPT-3 because, without myriad filters in place, it’s almost certain to generate offensive text.
In essence, numerous researchers have learned that text generators trained on unmitigated datasets (such as those containing conversations from Reddit) tend towards bigotry.
It’s pretty easy to see why: a massive percentage of human discourse on the internet is biased against minority groups.
Background: It didn’t seem like toxicity was going to be an insurmountable problem back when deep learning exploded in 2014.
We all remember that time Google’s AI mistook a turtle for a rifle, right? That’s very unlikely to happen now. Computer vision has gotten much better in the interim.
But progress has been less forthcoming in the field of NLP (natural language processing).
Simply put, the only way to stop a system such as GPT-3 from spewing out toxic language is to block it from doing so. But this solution has its own problems.
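To see why blocking is such a blunt instrument, here’s a minimal sketch of a naive keyword blocklist. The term list, function name, and logic are purely illustrative assumptions for this article, not OpenAI’s or DeepMind’s actual filtering code:

```python
# Hypothetical sketch of a naive blocklist-style toxicity intervention.
# The term list below is an illustrative assumption, not a real product's filter.
BLOCKED_TERMS = {"gay", "muslim", "immigrant"}

def is_blocked(text: str) -> bool:
    """Flag any output that mentions a blocked term, regardless of intent."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return not words.isdisjoint(BLOCKED_TERMS)

# The filter can't tell a benign statement from a hateful one:
print(is_blocked("gay people exist"))            # True  -> blocked
print(is_blocked("gay people shouldn't exist"))  # True  -> blocked
print(is_blocked("the weather is nice"))         # False -> allowed
```

Both the benign and the hateful sentence get blocked, which is exactly the coverage problem the researchers measured: the intervention silences text *about* marginalized groups along with text attacking them.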
What’s new: DeepMind, the creators of AlphaGo and a Google sister company under the Alphabet umbrella, recently conducted a study of state-of-the-art toxicity interventions for NLP agents.
The results were discouraging.
Per a preprint paper from the DeepMind research team:
We demonstrate that while basic intervention strategies can effectively optimize previously established automatic metrics on the REALTOXICITYPROMPTS dataset, this comes at the cost of reduced LM (language model) coverage for both texts about, and dialects of, marginalized groups.
Additionally, we find that human raters often disagree with high automatic toxicity scores after strong toxicity reduction interventions — highlighting further the nuances involved in careful evaluation of LM toxicity.
The researchers ran the intervention paradigms through their paces and compared their efficacy with that of human evaluators.
A group of paid study participants evaluated text generated by state-of-the-art text generators and rated its output for toxicity. When the researchers compared the humans’ assessments to the machine’s, they found a large discrepancy.
AI may have a superhuman ability to generate toxic language but, like most bigots, it has no clue what the heck it’s talking about. Intervention techniques failed to identify toxic output as accurately as humans did.
Quick take: This is a big deal. Text generators are poised to become ubiquitous in the business world. But if we can’t make them non-offensive, they can’t be deployed.
Right now, a text generator that can’t tell the difference between a phrase such as “gay people exist” and “gay people shouldn’t exist” isn’t very useful. Especially when the current solution to keeping it from generating text like the latter is to block it from using any language related to the LGBTQ+ community.
Blocking references to minorities as a method to solve toxic language is the NLP equivalent of a sign that says “for use by straight whites only.”
The scary part is that DeepMind, one of the world’s most talented AI labs, conducted this study and then forwarded the results to Jigsaw, Google’s crack problem-solving team, which has been unsuccessfully trying to solve this very problem since 2016.
The near-future doesn’t look bright for NLP.
You can read the whole paper here.