
Google’s new open-source AI model understands Indic languages better


Google’s products, such as Search and Assistant, are already available in India in multiple local languages. The company now hopes a new AI model, MuRIL, could help make more of its offerings accessible to speakers of Indic languages.

At its virtual event today, the Big G unveiled a new language model called Multilingual Representations for Indian Languages (MuRIL). Google says it is the first model to support interoperation between 16 Indian languages, plus English.

Those languages are Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Sindhi, Tamil, Telugu, and Urdu, alongside English.

MuRIL is based on Google’s own BERT (Bidirectional Encoder Representations from Transformers) model, but researchers claim it handles Indian languages better than the original.


Partha Talukdar, a researcher at Google India, said the new model better understands the context of statements in local languages.

For example, the previous model read the Hindi statement “Accha hua account bandh ho gaya” (“It’s good that the account got closed”) as expressing a negative emotion. The new model correctly predicts that the statement is positive.

Users in India often type local languages on an English keyboard, as in the sentence above. To handle this, the researchers added support for detecting transliteration, where an Indian language is written in the Roman script.
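To make the transliteration problem concrete, here is a minimal Python sketch, assuming only the standard library, that reports which writing systems appear in a piece of text by reading each letter's Unicode character name. The helpers `scripts_used` and `is_latin_only` are hypothetical illustrations, not part of MuRIL, and this is not how MuRIL processes input.

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Hypothetical helper: return the scripts of the letters in `text`.

    Unicode character names start with the script, e.g.
    "DEVANAGARI LETTER KA" or "LATIN SMALL LETTER A".
    """
    scripts = set()
    for ch in text:
        # Skip spaces, digits, punctuation and combining vowel signs,
        # which str.isalpha() does not count as letters.
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts


def is_latin_only(text: str) -> bool:
    """True if every letter in `text` is written in the Latin script."""
    return scripts_used(text) == {"LATIN"}


if __name__ == "__main__":
    print(scripts_used("Accha hua account bandh ho gaya"))  # {'LATIN'}
    print(sorted(scripts_used("अच्छा हुआ account बंद हो गया")))  # ['DEVANAGARI', 'LATIN']
```

The limitation is the point: a script check flags “Accha hua account bandh ho gaya” as Latin text, exactly like an English sentence, so it cannot tell romanized Hindi from English. Telling them apart requires a model that has learned what transliterated text looks like.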

Google is releasing the model as open source so other researchers and startups can use it.

MuRIL is not yet embedded in any of Google’s products. However, based on feedback from researchers and programmers, the company aims to incorporate the model into its offerings in the future to improve accuracy.

You can learn more and check out MuRIL’s code here.

Published December 17, 2020 — 06:04 UTC