Artificial intelligence offers us an opportunity to amplify service and deepen the integration of technology in everyday life many times over. But until very recently, there remained a significant barrier to how sophisticated the technology could be. Without a complete understanding of emotion in voice and how AI can capture and measure it, inanimate assistants (voice assistants, smart cars, robots and all AI with speech recognition capabilities) would continue to lack key components of a personality. This barrier makes it difficult for an AI assistant to understand and engage with a human operator the way a human assistant would.
This is starting to change. Rapid advances in technology are enabling engineers to program these voice assistants with a better understanding of the emotions in someone’s voice and the behaviors associated with those emotions. The better we understand these nuances, the more agile and emotionally intelligent our AI systems will become.
A vast array of signals
Humans are more than just “happy”, “sad” or “angry”. We are a combination of dozens of emotions across a spectrum, expressed through words, actions, and tones. It is at times difficult for a human to pick up on all of these cues in conversation, let alone a machine.
But with the right approach and a clear map of how emotions are experienced, it is possible to start teaching these machines how to recognize such signals. The different shades of human emotion can be visualized according to the following graphic:
The result is more than 50 individual emotions categorized under love, joy, surprise, anger, sadness, and fear. Many of these emotions imply specific behaviors and are highly situational, which makes them very difficult to tell apart. That’s why it’s so important for emotion AI to recognize both the emotional and the behavioral patterns when assigning an emotional state to a human operator.
Recognizing emotions in voice
However advanced the technology has become, emotion recognition is still in its early stages. Chatbots, voice assistants, and automated service interfaces frequently lack the ability to recognize when you are angry or upset, and that gap has kept AI from filling a more substantial role in things like customer service and sales.
The problem is that words, the part of the conversation that AI can quantify and evaluate, aren’t enough. It’s less about what we say and more about how we say it. Studies have shown that the tone or intonation of your voice is far more indicative of your mood and mental state than the words you say.
Emotional prosody, or the tone of voice in speech, can be conveyed in a number of ways: the volume, speed, timbre, pitch, or the pauses used in the speech. Consider how you can recognize when someone is being sarcastic. It’s not the words; it’s the elongation of certain words and the general tone of the statement. Prosody also operates at several levels of speech: individual words, phrases, and clauses, and even the non-linguistic sounds that accompany speech.
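To make the idea of prosodic cues concrete, here is a minimal sketch in Python of how three of them (volume, pauses, and pitch) might be measured from raw audio samples. It is an illustration only, not the method used by any particular product: the function names, the 40 ms frame size, the silence threshold, and the 75 to 400 Hz pitch range are all assumptions chosen for the example, and speed and timbre are left out.

```python
import math

def frame_signal(samples, frame_len):
    """Split samples into non-overlapping frames, dropping any ragged tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def rms(frame):
    """Root-mean-square amplitude, used here as a rough proxy for volume."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def estimate_pitch_hz(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Crude pitch estimate: the lag with the highest raw autocorrelation.

    The fmin/fmax search range is an assumed range for speech.
    Returns 0.0 when no lag has a positive score.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0

def prosody_features(samples, sample_rate, frame_ms=40, silence_rms=0.02):
    """Summarize volume, pauses, and pitch for a mono signal in [-1, 1].

    frame_ms and silence_rms are illustrative defaults, not tuned values.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    frames = frame_signal(samples, frame_len) if frame_len > 0 else []
    if not frames:
        return {"mean_volume": 0.0, "pause_ratio": 0.0, "mean_pitch_hz": 0.0}
    energies = [rms(f) for f in frames]
    # Frames at or above the silence threshold count as speech, the rest as pauses.
    voiced = [f for f, e in zip(frames, energies) if e >= silence_rms]
    pitches = [estimate_pitch_hz(f, sample_rate) for f in voiced]
    pitches = [p for p in pitches if p > 0]
    return {
        "mean_volume": sum(energies) / len(energies),
        "pause_ratio": 1 - len(voiced) / len(frames),
        "mean_pitch_hz": sum(pitches) / len(pitches) if pitches else 0.0,
    }
```

Measurements like these say nothing about emotion on their own. They are the kind of raw signal a system would need to track over time before it could tell, for example, that a speaker has slowed down, gone quiet, or raised their pitch.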
To better understand the data in speech that isn’t related to linguistic or semantic information, researchers have turned to behavior signal processing, a new field designed to detect information encoded in the human voice. Combining the best of AI engineering and behavioral science, this field aims to fully interpret human interactions and the baselines of communication in voice.
It works by gathering a range of behavior signals, some overt and others less so. To identify emotional states, it uses the emotions, behaviors, and perceived thoughts, ideas, and beliefs that can be inferred from speech, text, and metadata about the user. Humans are not 0s and 1s. Their emotions are encoded in dozens of diverse sources. This requires a system that can observe and evaluate data from several sources simultaneously and respond in kind.
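One simple way to picture combining several sources is a weighted average of per-source emotion scores, sometimes called late fusion. The sketch below is a hypothetical illustration, not a description of how any real behavior signal processing system works: the function name, the source names ("text", "voice"), the scores, and the weights are all made up for the example.

```python
from collections import defaultdict

def fuse_emotion_scores(source_scores, weights):
    """Combine per-source emotion scores into one weighted average per emotion.

    source_scores maps a source name to {emotion: score}.
    weights maps a source name to a non-negative weight; sources with a
    missing or non-positive weight are ignored.
    """
    totals = defaultdict(float)
    weight_sum = 0.0
    for source, scores in source_scores.items():
        weight = weights.get(source, 0.0)
        if weight <= 0:
            continue
        weight_sum += weight
        for emotion, score in scores.items():
            totals[emotion] += weight * score
    if weight_sum == 0:
        return {}
    return {emotion: total / weight_sum for emotion, total in totals.items()}

# Hypothetical sarcastic remark: the words read as joyful, the tone as angry.
scores = {
    "text": {"joy": 0.7, "anger": 0.3},
    "voice": {"joy": 0.2, "anger": 0.8},
}
# Assumed weighting that trusts tone of voice three times as much as the words.
fused = fuse_emotion_scores(scores, {"text": 1.0, "voice": 3.0})
dominant = max(fused, key=fused.get)
```

With these assumed weights, the tone of voice outweighs the literal wording, which mirrors the point above that how something is said can matter more than what is said.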
Designing better interfaces between machines and humans
Already, businesses are leveraging the insights provided by this new technology to better evaluate and utilize unstructured data in their organizations. Call recordings, chat histories, and support tickets are now providing a foundation upon which large organizations can better understand what their customers are feeling when they reach out and how those emotions ultimately influenced their decisions.
This opens a new avenue to understand the context of customer interactions. Historically, customers and prospects have been evaluated through the prism of a human agent. Whether in customer service or sales, the agent would interact with them and then make notes on how they were feeling or responding. Those notes had to be written in a structured format so they could be evaluated later.
Today’s AI systems are making it possible to reference primary data, the actual responses given by customers and prospects, to better understand what they need and why they need it. This level of insight is far more powerful than what was available in the past, and it continues to evolve.
As a result, the future is bright for AI assistants. Not only will businesses be better able to understand and respond to consumer needs; so too will the machines already installed in homes and offices around the globe. Smartphones and personal voice assistants will develop a more nuanced understanding of the context and behavior driving the response of the human operator.
The shades of emotion in the human voice are being decoded and mapped in a way that has never been done before, and it’s providing the foundation for the next generation of emotionally intelligent AI. This is the future of human and machine interaction and it is developing faster than ever before.