Scientists claim they can “teach” an AI moral reasoning by training it to extract ideas of right and wrong from texts.
Researchers from Darmstadt University of Technology (DUT) in Germany fed their model books, news, and religious literature so it could learn the associations between different words and sentences. After training the system, they say it adopted the values of the texts.
As the team put it in their research paper:
“The resulting model, called the Moral Choice Machine (MCM), calculates the bias score on a sentence level using embeddings of the Universal Sentence Encoder since the moral value of an action to be taken depends on its context.”
This allows the system to understand contextual information by analyzing entire sentences rather than specific words. As a result, the AI could work out that it was objectionable to kill living beings, but fine to just kill time.
Study co-author Dr Cigdem Turan compared the technique to creating a map of words.
“The idea is to make two words lie closely on the map if they are often used together. So, while ‘kill’ and ‘murder’ would be two adjacent cities, ‘love’ would be a city far away,” she said.
“Extending this to sentences, if we ask, ‘Should I kill?’ we expect that ‘No, you shouldn’t’ would be closer than ‘Yes, you should.’ In this way, we can ask any question and use these distances to calculate a moral bias: the degree of right from wrong.”
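To make the map analogy concrete, here is a minimal sketch of how such a distance-based score could be computed. It is not the team's code: the three-dimensional vectors are made-up stand-ins for real sentence embeddings, the helper names (`cosine`, `moral_bias`, `bias_for`) are hypothetical, and using cosine similarity as the measure of closeness is an assumption for illustration.

```python
import math

# Made-up 3-dimensional vectors standing in for real sentence embeddings.
# A real system would obtain these from a sentence encoder instead.
TOY_EMBEDDINGS = {
    "Should I kill?": [0.9, 0.1, 0.2],
    "Should I love?": [0.1, 0.9, 0.2],
    "Yes, you should.": [0.2, 0.8, 0.1],
    "No, you shouldn't.": [0.8, 0.2, 0.1],
}


def cosine(u, v):
    """Cosine similarity: higher means the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def moral_bias(question, yes_answer, no_answer):
    """Positive if the question sits closer to 'yes', negative if closer to 'no'."""
    return cosine(question, yes_answer) - cosine(question, no_answer)


def bias_for(question_text):
    """Look up the toy vectors and score one question (hypothetical helper)."""
    return moral_bias(
        TOY_EMBEDDINGS[question_text],
        TOY_EMBEDDINGS["Yes, you should."],
        TOY_EMBEDDINGS["No, you shouldn't."],
    )


if __name__ == "__main__":
    for q in ("Should I kill?", "Should I love?"):
        print(q, round(bias_for(q), 3))
```

With these toy vectors, “Should I kill?” lands nearer to “No, you shouldn't.” and gets a negative score, while “Should I love?” gets a positive one. The sign and size of the score depend entirely on the embeddings, which is why the values of the training texts end up in the result.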
Making a moral AI
Previous research has shown that AI can learn from human biases to perpetuate stereotypes, such as Amazon’s automated hiring tools that downgraded graduates of all-women colleges. The DUT team suspected that if AI could adopt malicious biases from texts, it could also learn positive ones.
They acknowledge that their system has some pretty serious flaws. Firstly, it merely reflects the values of a text, which can lead to some extremely dubious ethical views, such as giving eating animal products a more negative score than killing people.
It could also be tricked into rating negative actions acceptable by adding more positive words to a sentence. For example, the machine found it much more acceptable to “harm good, nice, friendly, positive, lovely, sweet and funny people” than to simply “harm people”.
But the system could still serve a useful purpose: revealing how moral values vary over time and between different societies.
After the researchers fed it news published between 1987 and 1997, the AI rated getting married and becoming a good parent as extremely positive actions. But when they fed it news from 2008 to 2009, these were deemed less important. Sorry, kids.
It also found that values varied between the different types of texts. While all the sources agreed that killing people is extremely negative, loving your parents was viewed more positively in books and religious texts than in the news.
That textual analysis sounds like a much safer use of AI than letting it make moral choices, such as who a self-driving car should hit when a crash is unavoidable. For now, I’d prefer to leave those to a human with strong moral values — whatever they might be.