This article was published on October 26, 2017

Researchers built an AI that uses Google’s own tools to crack reCAPTCHA



Earlier this year Google boasted its security AI has smartened up so much it no longer requires you to tick a box in its reCAPTCHA system to know you’re not a bot. But it appears the company might have celebrated a bit prematurely.

A group of researchers from the University of Maryland has developed a new algorithm, called unCAPTCHA, that is capable of defeating the reCAPTCHA system with a mind-boggling success rate of 85 percent. The method achieves this by exploiting a vulnerability in the audio version of reCAPTCHA.

The solution involves using browser automation software to parse out the necessary elements and identify spoken numbers. The next step is to submit these numbers programmatically, with a view to fooling target sites into thinking the bot is a human.
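To make the "identify spoken numbers" step a little more concrete, here is a minimal sketch of how a transcript of a spoken challenge could be reduced to a string of digits. It is an illustration only, not the researchers' code: the function name `transcript_to_digits` and the homophone table are assumptions made for this example.

```python
import re

# Hypothetical vocabulary: digit words plus a few assumed homophones
# that a speech-to-text service might return instead of the digit word.
WORD_TO_DIGIT = {
    "zero": "0", "oh": "0",
    "one": "1", "won": "1",
    "two": "2", "to": "2", "too": "2",
    "three": "3",
    "four": "4", "for": "4",
    "five": "5",
    "six": "6",
    "seven": "7",
    "eight": "8", "ate": "8",
    "nine": "9",
}


def transcript_to_digits(transcript: str) -> str:
    """Return the digits found in a transcript, in spoken order."""
    # Split into lowercase words and single numerals; punctuation is dropped.
    tokens = re.findall(r"[a-z]+|\d", transcript.lower())
    digits = []
    for token in tokens:
        if token.isdigit():
            digits.append(token)
        elif token in WORD_TO_DIGIT:
            digits.append(WORD_TO_DIGIT[token])
        # Any other word is treated as noise and ignored.
    return "".join(digits)
```

Under these assumptions, a transcript such as "Four, two seven" would be reduced to the string "427", while unrecognized words are simply skipped.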

“Specifically, unCAPTCHA targets the popular site Reddit by going through the motions of creating a new user, although unCAPTCHA stops before creating the user to mitigate the impact on Reddit,” the researchers explain.


To make this happen, the AI abuses several known flaws in Google’s security system to lower reCAPTCHA’s “suspicion level significantly.”

The most impressive part is that the researchers used a series of audio transcription services to defeat the system. Curiously, these services included IBM, Google Cloud, Google Speech Recognition, Sphinx, Wit-AI, and Bing Speech Recognition. So in a way, the researchers turned Google’s own tech against itself.
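One plausible way to combine the answers of several transcription services is a simple majority vote per audio segment. The sketch below assumes the audio has already been split into one segment per digit and that each service has returned a single-digit guess (or an empty string when it heard nothing); the names `vote` and `combine` are hypothetical, and the article does not say this is exactly how unCAPTCHA merges its results.

```python
from collections import Counter
from typing import List


def vote(guesses: List[str]) -> str:
    """Return the most common non-empty guess for one audio segment."""
    answers = [g for g in guesses if g]
    if not answers:
        return ""  # no service produced an answer for this segment
    # On a tie, the guess that appeared first in the list wins.
    return Counter(answers).most_common(1)[0][0]


def combine(segments: List[List[str]]) -> str:
    """Join the winning guess of every segment into one digit string."""
    return "".join(vote(guesses) for guesses in segments)
```

The appeal of this design is that a single service mishearing a digit does not matter as long as most of the others agree.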

After disclosing this flaw to the Big G back in April, the researchers point out the company has since added some additional protections that limit unCAPTCHA’s success rate.

“For instance, Google has also improved their browser automation detection,” the documentation reads. “This may lead to Google sending odd audio segments back to the end user. Additionally, we have observed that some audio challenges include not only digits, but small snippets of spoken text.”

The researchers have since released the full proof of concept in a paper [PDF]. They have also shared the slides from the presentation they gave at the Usenix WOOT ’17 conference in Vancouver.
