Google Translate is one of the company’s most used products. It helps people translate from one language to another by typing, photographing text, or using speech-to-text technology. Now, the company is launching a new project called Translatotron, which will offer direct speech-to-speech translation without any intermediate text.
In a post on Google’s AI blog, the team behind the tool explained that instead of chaining speech-to-text, text translation, and text-to-speech to convert voice, the new system relies on a single neural network model that turns speech in one language directly into speech in another.
“Dubbed Translatotron, this system avoids dividing the task into separate stages, providing a few advantages over cascaded systems, including faster inference speed, naturally avoiding compounding errors between recognition and translation, making it straightforward to retain the voice of the original speaker after translation, and better handling of words that do not need to be translated (e.g., names and proper nouns),” the Google research team wrote in the blog post.
Translatotron can also preserve the characteristics of the speaker’s voice when translating from one language to another. This could be really useful to sound editors who dub movies and TV shows.
The researchers have admitted that translations from the new model are not yet as accurate as those from traditional cascaded systems, but they’re confident its accuracy will soon improve.
Considering this is still just a model (and there’s not even a demo available yet), chances are Google will take a while to implement the new system in consumer-grade products. I, for one, am looking forward to trying it out though.