New MIT algorithm automatically deciphers lost languages

A new AI system can automatically decipher a lost language that’s no longer understood — without knowing its relationship to other languages.

Researchers at MIT CSAIL developed the algorithm in response to the rapid disappearance of human languages. Most of the languages that have existed are no longer spoken, and at least half of those remaining are predicted to vanish in the next 100 years.

The new system could help recover them. More importantly, it could preserve our understanding of the cultures and wisdom of their speakers.

[Read: What audience intelligence data tells us about the 2020 US presidential election]

The 💜 of EU tech

The latest rumblings from the EU tech scene, a story from our wise ol' founder Boris, and some questionable AI art. It's free, every week, in your inbox. Sign up now!

The algorithm works by harnessing key principles from historical linguistics, such as the predictable ways in which languages use sound substitutions. The researchers give the example of a word with a “p” in a parent language possibly changing to a “b” in its descendent, but probably not to a “k” due to the difference in pronunciation.

These kinds of patterns are then turned into computational constraints. This allows the model to segment words from an ancient language and map them to a related language.

The algorithm can also identify different language families. For example, their method suggested that Iberian is not related to Basque, supporting recent scholarship.

The project was led by MIT Professor Regina Barzilay, who last month won a $1 million award from the world’s largest AI society for her pioneering work on drug development and breast cancer detection.

She now wants to expand the work to identifying the semantic meaning of words — even if we don’t know how to read them.

“For instance, we may identify all the references to people or locations in the document which can then be further investigated in light of the known historical evidence,” Barzilay said in a statement.

“These methods of ‘entity recognition’ are commonly used in various text processing applications today and are highly accurate, but the key research question is whether the task is feasible without any training data in the ancient language.”

You can read the full research study here.

Story by Thomas Macaulay

Managing editor

Thomas is the managing editor of TNW. He leads our coverage of European tech and oversees our talented team of writers. Away from work, he e (show all) Thomas is the managing editor of TNW. He leads our coverage of European tech and oversees our talented team of writers. Away from work, he enjoys playing chess (badly) and the guitar (even worse).

Get the TNW newsletter

Get the most important tech news in your inbox each week.

New MIT algorithm automatically deciphers lost languages

Get the TNW newsletter

Germany’s Vsquared is taking on Atomico and Balderton on their London home turf

What SpaceX’s record IPO really means for the OpenAI and Anthropic listings behind it

Discover TNW All Access

AI bubble fears are spreading, even as SpaceX readies the biggest IPO ever

Google DeepMind’s TacticAI can predict football plays 8 seconds before they happen. Palmeiras is the first to use it.