This AI tool is smart enough to spot AI-generated articles and tweets

Researchers from Harvard University and the MIT-IBM Watson AI Lab have created an AI-powered tool for spotting AI-generated text.

Dubbed the Giant Language Model Test Room (GLTR), the system aims to detect whether a given piece of text was generated by a language model. The tool is available to try online.

With AI and natural language generation models already being used to produce fake news and spread misinformation, GLTR could help non-expert readers distinguish machine-generated text from human-written text.

According to results shared by the researchers, GLTR improved the human detection rate of fake text from 54 percent to 72 percent, without participants needing any prior training.

The tool relies on the statistical distribution of words in a text to identify these differences. The underlying premise is that text generated by a language model will feature a more predictable string of words than text written by a human.
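To make "predictable" concrete, the minimal Python sketch below ranks each word by how strongly a model expected it, given the word before it. GLTR itself uses a large neural language model; the toy bigram counter here is an assumed stand-in chosen only so the example runs without any downloads, and `train_bigram` and `word_ranks` are hypothetical names, not part of GLTR.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    # Toy stand-in for a language model (an assumption, not GLTR's model):
    # count how often each word follows each preceding word.
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model

def word_ranks(model, tokens):
    # For every word after the first, return its rank among the model's
    # predictions for the preceding word (1 = the single most likely word).
    ranks = []
    for prev, word in zip(tokens, tokens[1:]):
        # most_common() sorts by count; ties keep first-seen order.
        candidates = [w for w, _ in model.get(prev, Counter()).most_common()]
        if word in candidates:
            ranks.append(candidates.index(word) + 1)
        else:
            # Never seen in this context: treat the word as maximally surprising.
            ranks.append(math.inf)
    return ranks

corpus = "the cat sat on the mat the cat ate the fish".split()
model = train_bigram(corpus)
print(word_ranks(model, "the cat sat on the mat".split()))
```

Here every word except "mat" was the model's top guess, so the printed ranks are `[1, 1, 1, 1, 2]`. A text made mostly of rank-1 words is exactly the kind of "predictable string" the premise describes.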

The sentences produced by AI text generators may be grammatically correct without carrying any actual meaning, but they tend to follow predictable statistical patterns. GLTR looks for these patterns over a sixty-word window (thirty words on either side of any given word in the text) and flags the most predictable sequences of words.
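Taking the window described above at face value, the context around a given word could be sliced out as in this illustrative Python sketch. `context_window` and its `radius` parameter are hypothetical names, not GLTR's code, and the sketch simply clips the window at the start and end of the text.

```python
def context_window(words, index, radius=30):
    # Return the words up to `radius` positions before and after words[index],
    # excluding the word itself. With radius=30 this gives a sixty-word window,
    # shorter near the edges of the text.
    left = words[max(0, index - radius):index]
    right = words[index + 1:index + 1 + radius]
    return left, right

words = [f"w{i}" for i in range(100)]
left, right = context_window(words, 50)
print(len(left), len(right))  # 30 words on each side
```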

[Image: GLTR's analysis of a text passage from Jane Austen's Love and Friendship]

Each word is highlighted according to how highly the language model ranked it as a prediction, given the words that come before it. Words among the model's top 10 predictions are highlighted in green, those in the top 100 in yellow, and those in the top 1,000 in red; the least likely words are marked purple.
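The color bands amount to a simple bucketing of each word's rank. This Python sketch assumes the top-10, top-100 and top-1,000 cutoffs used by GLTR's highlighting; `rank_to_color` is a hypothetical name, and the ranks it consumes would come from a language model.

```python
def rank_to_color(rank, thresholds=(10, 100, 1000)):
    # Bucket a word's rank (1 = the model's top prediction) into a
    # GLTR-style color band. The cutoffs are parameters so they can be changed.
    green, yellow, red = thresholds
    if rank <= green:
        return "green"
    if rank <= yellow:
        return "yellow"
    if rank <= red:
        return "red"
    return "purple"

print([rank_to_color(r) for r in (1, 42, 500, 5000)])
```

The printed list is `['green', 'yellow', 'red', 'purple']`, one word from each band.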

The idea is that genuine text tends to contain a healthy mix of yellow, red and purple words. If the highlighted text is mostly green and yellow, that strongly suggests it could be machine-generated.
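The rule of thumb above can be written as a check on the share of green and yellow words. This is a minimal Python sketch under a loudly stated assumption: the 0.9 cutoff and the names `color_shares` and `looks_machine_generated` are hypothetical and do not come from GLTR.

```python
from collections import Counter

def color_shares(colors):
    # Fraction of words in each GLTR-style color band (colors must be non-empty).
    counts = Counter(colors)
    total = len(colors)
    return {band: counts[band] / total for band in ("green", "yellow", "red", "purple")}

def looks_machine_generated(colors, cutoff=0.9):
    # Hypothetical heuristic: flag text whose words are overwhelmingly green
    # or yellow (highly predictable). The 0.9 cutoff is an assumption.
    if not colors:
        return False
    counts = Counter(colors)
    return (counts["green"] + counts["yellow"]) / len(colors) >= cutoff

print(looks_machine_generated(["green"] * 9 + ["yellow"]))
print(looks_machine_generated(["green"] * 5 + ["red"] * 3 + ["purple"] * 2))
```

The first call prints `True` and the second `False`. A hard cutoff like this is only a sketch; GLTR's interface presents the colors for a human reader to interpret rather than issuing a verdict.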

For example, when I ran a variety of AI-generated texts created with Talk to Transformer, a web app built on OpenAI's GPT-2 model, through the tool, the results were invariably dominated by greens and yellows.

Initiatives like GLTR can be valuable not only for detecting fake text, but also for identifying the Twitter bots that have been used to disrupt electoral processes in the US and elsewhere.

One already popular tool is Botometer, which uses machine learning techniques to determine whether an account is operated by a human or by software. It correctly identifies a bot account about 95 percent of the time.

Although none of these methods is foolproof, they highlight the need for human-AI collaborative systems that can jointly tackle socio-technological problems.