Earlier today, Microsoft Research published a blog post promising that at Interspeech 2011, which is currently underway, the company would unveil a ‘breakthrough’ in speech recognition.
Importantly, the development does not deal with speech recognition that requires the user to ‘train’ the system, but instead involves “real-time, speaker-independent, automatic speech recognition.” In other words, true recognition of human speech.
Microsoft claims that it has managed to “dramatically improve the potential” of this sort of technology becoming commercially viable. Through the use of deep neural networks, the company has improved the accuracy of ‘on the go’ speech recognition, something close to a holy grail of technology. How the team achieved the breakthrough is exceptionally technical, and we will not summarize it here because the topic requires extensive background knowledge to follow. Microsoft’s blog post has all the information, if you’re curious.
As for the results of what Microsoft Research has built, this is the crucial revelation: “The subsequent benchmarks achieved an astonishing word-error rate of 18.5 percent, a 33-percent relative improvement compared with results obtained by a state-of-the-art conventional system.” The company claims that this has “brought fluent speech-to-speech applications much closer to reality.”
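To put the quoted numbers in perspective, here is a rough back-of-the-envelope sketch of what a "33-percent relative improvement" means. It assumes the usual definition of relative improvement, (baseline - new) / baseline; the function names are ours, and the baseline it prints is only what the two quoted figures imply, not a number taken from Microsoft's post.

```python
# Back-of-the-envelope arithmetic on the two figures quoted above.
# Assumption: "relative improvement" = (baseline - new) / baseline.
# Function names are illustrative, not from Microsoft's post.

def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which the word-error rate dropped, relative to the baseline."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, rel_improvement: float) -> float:
    """Baseline word-error rate implied by a new rate and a relative improvement."""
    return new_wer / (1.0 - rel_improvement)


new_wer = 18.5          # percent, as quoted
rel_improvement = 0.33  # 33 percent, as quoted

baseline = implied_baseline(new_wer, rel_improvement)
print(f"Implied baseline word-error rate: {baseline:.1f} percent")
print(f"Absolute reduction: {baseline - new_wer:.1f} percentage points")
```

In other words, if the quoted figures are taken at face value, the conventional system was getting roughly 28 words in 100 wrong, and the new approach gets fewer than 19 wrong.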
That said, this remains very much a research project. The company made that abundantly clear in the discussion of its progress.
This project is not simply an interesting technical problem, but something that Microsoft desperately needs solved. The company is forging ahead with what it calls Natural User Interface integration (think the Kinect, voice to text, and so forth), and so it needs a better voice solution. The company must have its eyes on its Research division, pushing it towards a commercially viable product that can be integrated across its product line.
For now, this is one step, albeit an important one.