
This article was published on February 9, 2021

Microsoft says it’s developed ‘the most comprehensive spelling correction system ever made’

The Speller100 system corrects errors in over 100 languages


Image credit: MIXEvent

Microsoft has unveiled an AI system called Speller100 that corrects spelling in over 100 languages used in search queries on Bing.

“We believe Speller100 is the most comprehensive spelling correction system ever made in terms of language coverage and accuracy,” the company said in a blog post.

Microsoft says Speller100 has improved corrections in numerous low- and no-resource languages, such as Macedonian, Belarusian, and Pashto. (Image credit: Microsoft)

Bing previously provided high-quality spelling corrections for around two dozen languages. However, it didn’t have enough training data to work well on languages with little web presence and user feedback.

Speller100 overcomes these limitations by looking for similarities in large language families.

The system uses orthographic similarities in language families such as Germanic. (Image credit: Microsoft)


It also applies zero-shot learning to correct errors without needing extra language-specific labeled training data.


Microsoft said it built around a dozen language family-based models to maximize the zero-shot benefit:

“Imagine someone had taught you how to spell in English and you automatically learned to also spell in German, Dutch, Afrikaans, Scots, and Luxembourgish. That is what zero-shot learning enables, and it is a key component in Speller100 that allows us to expand to languages with very little to no data.”

The system also reduces the need for human-labeled annotations by extracting text from web pages to generate common errors.

Speller100 uses noise functions to produce typical errors of rotation, insertion, deletion, and replacement. (Image credit: Microsoft)

“This text can easily be extracted through web crawling, and there is a sufficient amount of text for the training of hundreds of languages,” Microsoft said.
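As a rough illustration of the idea, the Python sketch below applies four simple noise functions to clean words to produce (misspelled, correct) training pairs. It is not Microsoft's implementation: the function names, the lowercase Latin alphabet, and the reading of "rotation" as swapping two adjacent characters are all assumptions made for this example.

```python
import random
import string

# Assumption: a lowercase Latin alphabet. A real system would need
# per-language character sets.
ALPHABET = string.ascii_lowercase


def rotate(word, rng):
    """Swap two adjacent characters (our assumed meaning of 'rotation')."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def insert(word, rng, alphabet=ALPHABET):
    """Insert one random character at a random position."""
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice(alphabet) + word[i:]


def delete(word, rng):
    """Drop one character, leaving single-character words untouched."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]


def replace(word, rng, alphabet=ALPHABET):
    """Replace one character with a different character."""
    if not word:
        return word
    i = rng.randrange(len(word))
    choices = [c for c in alphabet if c != word[i]]
    if not choices:
        return word
    return word[:i] + rng.choice(choices) + word[i + 1:]


NOISE_FUNCTIONS = [rotate, insert, delete, replace]


def make_training_pair(word, rng):
    """Return (noisy, clean). The noisy form can occasionally equal the
    clean one, for example when two identical letters are swapped."""
    noise = rng.choice(NOISE_FUNCTIONS)
    return noise(word, rng), word


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    for clean_word in ["spelling", "language", "search"]:
        noisy, clean = make_training_pair(clean_word, rng)
        print(f"{noisy} -> {clean}")
```

Run over words pulled from crawled pages, a generator like this yields labeled examples without human annotation, which is the benefit Microsoft describes.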

In tests, Speller100 reduced the number of pages with no results by up to 30%. It also increased the number of times users clicked on spelling suggestions from single digits to 67%.

Microsoft said shipping the system to Bing is just the first step. The company plans to add the tech to “many more” of its products in the near future.
