AI has become adept at recreating, altering, and restoring human speech. But as the replicas become indistinguishable from the real, fears about the tech are growing.
Alex Serdiuk has a unique understanding of both the opportunities and threats.
As the co-founder and CEO of AI startup Respeecher, Serdiuk has won an Emmy for creating a deepfake Richard Nixon, developed voice clones for speech disabilities, and de-aged Mark Hamill’s vocal cords for The Mandalorian.
Yet Serdiuk has also seen synthetic media at its worst. The CEO and his company are based in Ukraine, which has been the target of deepfake disinformation.
In March 2022, a manipulated video of Ukrainian President Volodymyr Zelensky circulated on social media. The clip showed a digitally rendered Zelensky telling soldiers to surrender to Russia.
The impact, however, was minimal.
“This deepfake was done so poorly, like many things that Russians do, so it wasn’t convincing,” Serdiuk tells TNW.
“And our nation is smart. We have a belief in what’s going on in our government, and if someone says that our president gave up, most of the people would check those facts — especially because the deepfake was so bad.”
The Zelensky deepfake is so bad it won't convince any Ukrainians to "lay down their arms." But enough of these could spread doubt about the authenticity of real videos in the future https://t.co/DZ7IuYsPoT pic.twitter.com/MGNxvyw8GE
— Alec Luhn (@ASLuhn) March 17, 2022
Nonetheless, the clip demonstrated synthetic media’s potential to make us question what we see — and hear.
Indeed, fake sounds can be more convincing than fake sights.
Editing reality
Serdiuk believes synthetic voices can avoid the uncanny valley more smoothly than artificial visuals.
He adds that this realism can benefit society. Respeecher, for instance, has developed voice replacement tech for people who have undergone a laryngectomy.
In trials, the system created a natural-sounding voice while preserving the user’s articulation.
Sonantic, an AI startup, produced another powerful example.
In 2021, the company recreated Val Kilmer’s voice after throat cancer treatment left the actor unable to speak clearly.
Sonantic CEO Zeena Qureshi said the project showed the altruistic potential of the approach.
“I spent nine years helping children with autism learn how to use their voice as a better instrument for communication,” she recalled in a statement.
“The project with Val demonstrated again how empowering it can be when people overcome challenges with speaking.”
However, other uses of speech synthesis have caused concern.
Voicing complaints
In 2021, a documentary about Anthony Bourdain sparked a heated debate about deepfakes.
In an interview, director Morgan Neville revealed that AI had recreated the late chef’s voice in the film. The synthetic dialogue consisted of words Bourdain had written but never said.
Critics felt the move was duplicitous and lacked consent from Bourdain — who was famously obsessed with authenticity.
Neville later said he’d received approval from Bourdain’s next of kin. But Ottavia Bourdain, the chef’s widow, disputed this claim.
“I certainly was NOT the one who said Tony would have been cool with that,” she said in a tweet.
Serdiuk says Respeecher wouldn’t permit such work.
The company’s ethics statement prohibits deceptive uses of synthetic speech. It has further pledged never to use the voice of a private person or actor without permission.
In “a handful” of cases, however, the voices of historical figures have been used to show the tech’s potential.
Respeecher is also developing two technical defenses: a synthetic speech detector and audio watermarking.
Ultimately, voice cloning will remain another tool that we can use for both good and bad. Serdiuk hopes the safeguards stop the harms from overshadowing the benefits.