Google Research today announced new findings on how the search giant uses large amounts of data from the Web to improve its automatic speech recognition. Google builds large language models from anonymized Google.com queries to improve Voice Search, and from Web data in general to improve YouTube speech transcription. The full results are available in a seven-page paper titled “Large Scale Language Modeling in Automatic Speech Recognition” (PDF).
The abstract should give you a better taste of what this is about:
Large language models have been proven quite beneficial for a variety of automatic speech recognition tasks in Google. We summarize results on Voice Search and a few YouTube speech transcription tasks to highlight the impact that one can expect from increasing both the amount of training data, and the size of the language model estimated from such data. Depending on the task, availability and amount of training data used, language model size and amount of work and care put into integrating them in the lattice rescoring step we observe reductions in word error rate between 6% and 10% relative, for systems on a wide range of operating points between 17% and 52% word error rate.
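The abstract reports relative reductions, not percentage points. The worked arithmetic below shows how a relative reduction maps a baseline word error rate (WER) to a new one. The pairings use the endpoints of the quoted ranges purely for illustration. They are not results reported in the paper.

```latex
\Delta_{\text{rel}} = \frac{\mathrm{WER}_{\text{base}} - \mathrm{WER}_{\text{new}}}{\mathrm{WER}_{\text{base}}}
\;\Longrightarrow\;
\mathrm{WER}_{\text{new}} = (1 - \Delta_{\text{rel}})\,\mathrm{WER}_{\text{base}}

17\% \times (1 - 0.10) = 15.3\%, \qquad 52\% \times (1 - 0.06) = 48.88\%
```

A 10% relative reduction at a 17% baseline is therefore 1.7 percentage points absolute. A 6% relative reduction at a 52% baseline is 3.12 points.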
Speech recognizers use language models to assign probabilities to words based on the words that have already been said. For example, if you say “I’m going to go walk the…”, a good language model will assign a higher probability to the word “dog” than to the word “bog.”
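A count-based trigram model is one simple way to produce such probabilities. The following is a minimal Python sketch that uses a tiny, hypothetical three-sentence corpus. It estimates P(word | previous two words) by maximum likelihood: the count of the trigram divided by the count of its two-word history. The helper names `train_trigram` and `prob` are made up for this illustration. They are not from Google's system.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word histories in whitespace-tokenized sentences."""
    trigram_counts = Counter()
    history_counts = Counter()
    for sentence in sentences:
        # Pad with sentence-start/end markers so the first words also have a history.
        tokens = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(2, len(tokens)):
            history = (tokens[i - 2], tokens[i - 1])
            trigram_counts[history + (tokens[i],)] += 1
            history_counts[history] += 1
    return trigram_counts, history_counts


def prob(word, history, trigram_counts, history_counts):
    """Maximum-likelihood P(word | history); returns 0.0 for an unseen history (no smoothing)."""
    h = tuple(w.lower() for w in history)
    if history_counts[h] == 0:
        return 0.0
    return trigram_counts[h + (word.lower(),)] / history_counts[h]


# Hypothetical toy corpus, for illustration only.
corpus = [
    "I walk the dog every morning",
    "we walk the dog after dinner",
    "they walk the bog trail in summer",
]
tri, hist = train_trigram(corpus)
p_dog = prob("dog", ("walk", "the"), tri, hist)
p_bog = prob("bog", ("walk", "the"), tri, hist)
print(p_dog, p_bog)
```

In this toy corpus the history “walk the” is followed by “dog” twice and by “bog” once. So “dog” gets 2/3 of the probability mass and “bog” gets 1/3. Real systems add smoothing or backoff so that unseen n-grams do not receive zero probability. That step is omitted here.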
Google uses an n-gram approach to language modeling (predicting the next word from the previous n-1 words). The company says this approach “scales gracefully” as it gets more data, which makes it well suited to very large training sets. As you can see in the graph above, both word error rate (a metric Google uses to measure speech recognition accuracy) and search error rate (a metric Google uses to evaluate speech recognition effectiveness for search) decrease significantly as the language model grows.
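Word error rate is conventionally defined as the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The edit distance counts substitutions, deletions and insertions. The article does not describe Google's exact scoring pipeline. The following is therefore a minimal, generic Python sketch of the standard definition. `word_error_rate` is a hypothetical helper name, and text normalization is reduced to lowercasing and whitespace splitting.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein edit distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


wer = word_error_rate("I'm going to go walk the dog", "I'm going to walk the bog")
print(f"WER: {wer:.1%}")
```

Here the hypothesis drops “go” and mishears “dog” as “bog”. One deletion plus one substitution over seven reference words gives a WER of 2/7, about 28.6%.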
This is a perfect example of how Google improves its services by collecting data on you and the broader Web. As long as it stays anonymous, everyone wins.
Image credit: Petre Birlea