Researchers show how data science techniques can find Twitter ‘amplification bots’


Researchers from Duo Security, an authentication services company owned by Cisco Systems, have published a blog post that explains how to methodically identify “amplification bots.” These are defined as automated Twitter accounts that exist purely to artificially amplify the reach of content through retweets and likes.

The article, “Anatomy of Twitter Bots: Amplification Bots,” was written by researchers Jordan Wright and Olabode Anise. It expands upon their talk at the 2018 Black Hat USA conference, “Don’t @ Me: Hunting Twitter Bots at Scale.”

The pair created a dataset of 576 million tweets, filtered it down to those with more than 50 retweets, and used that sample to define what they consider normal behavior. Their analysis found that half of the tweets have a 2:1 ratio of likes to retweets, and around 80 percent have more likes than retweets (a ratio greater than 1:1).
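To make that baseline concrete, here is a minimal Python sketch of the kind of summary described above. It is not the researchers' code: the function name, the `(likes, retweets)` tuple layout, and the sample numbers in the usage note are assumptions made purely for illustration.

```python
from statistics import median


def like_to_retweet_summary(tweets, min_retweets=50):
    """Summarise like-to-retweet ratios for widely retweeted tweets.

    `tweets` is an iterable of (likes, retweets) pairs. This layout is a
    hypothetical stand-in for whatever the real dataset looks like.
    """
    # Keep only tweets with more than `min_retweets` retweets, as in the article.
    # This also guarantees the divisor below is never zero.
    ratios = [likes / rts for likes, rts in tweets if rts > min_retweets]
    if not ratios:
        return None
    return {
        # Middle value of the likes-per-retweet ratio.
        "median_ratio": median(ratios),
        # Fraction of tweets with more likes than retweets.
        "share_more_likes": sum(r > 1 for r in ratios) / len(ratios),
    }
```

Calling `like_to_retweet_summary([(200, 100), (300, 100), (50, 100), (10, 5)])` drops the last tweet, which has too few retweets, and summarises the remaining three.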

A tweet that’s likely to be artificially amplified will flip that on its head and have more retweets than likes. One example highlighted in the article had 6 retweets for every one like. The pair deems a tweet to be artificially inflated if it has a retweet-to-like ratio that’s greater than five.
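The threshold is simple enough to express in a few lines. The sketch below is an illustration rather than the researchers' implementation; the function name and the handling of tweets with zero likes are assumptions.

```python
def is_amplified(retweets: int, likes: int, threshold: float = 5.0) -> bool:
    """Return True if the retweet-to-like ratio exceeds `threshold`.

    Comparing retweets against threshold * likes avoids dividing by zero.
    Assumption: a tweet with retweets but no likes counts as amplified,
    and a tweet with neither does not.
    """
    return retweets > threshold * likes


# The article's example of six retweets for every like is flagged,
# while a ratio of exactly 5:1 is not, since the ratio must be greater than five.
print(is_amplified(retweets=600, likes=100))  # True
print(is_amplified(retweets=500, likes=100))  # False
```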

The pair also argue that timing plays an important role in identifying phony accounts, with a genuine user’s tweets being in chronological order. A fake account, on the other hand, is more likely to take a more scattered approach to posting.

Using these clues, the researchers created a methodology to determine, with some degree of confidence, if an account is an amplification bot.

The first point is obvious: it retweets posts. A lot. If over 90 percent of an account’s posts are retweets, that’s a clue.

The next step is to analyse how many of these tweets are “amplified.” If at least half of them have a retweet-to-like ratio greater than 5:1, it’s a glaring clue.

The final step is to look at the timing of the tweets and count the number of “inversions,” meaning tweets that appear out of chronological order.
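Taken together, the three checks might be sketched as follows. This is a rough illustration under stated assumptions, not the researchers' code: the `Post` record and its field names are hypothetical, the inversion count is assumed to run over the creation times of the original tweets in the order the account retweeted them, and no inversion cutoff is applied because the article does not give one.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical record for one entry in an account's timeline."""
    is_retweet: bool
    retweets: int               # retweet count of the tweet
    likes: int                  # like count of the tweet
    original_created_at: float  # creation time of the original tweet


def count_inversions(values):
    """Count pairs (i, j) with i < j and values[i] > values[j].

    Quadratic for clarity; a merge-sort variant would scale better.
    """
    return sum(
        1
        for i in range(len(values))
        for j in range(i + 1, len(values))
        if values[i] > values[j]
    )


def amplification_signals(posts):
    """Compute the three clues for a timeline given in posting order."""
    if not posts:
        return None
    retweeted = [p for p in posts if p.is_retweet]
    # Clue 1: over 90 percent of the account's posts are retweets.
    mostly_retweets = len(retweeted) / len(posts) > 0.9
    # Clue 2: at least half of those retweets exceed a 5:1 retweet-to-like ratio.
    amplified = [p for p in retweeted if p.retweets > 5 * p.likes]
    mostly_amplified = bool(retweeted) and len(amplified) / len(retweeted) >= 0.5
    # Clue 3: how many retweets are out of chronological order.
    inversions = count_inversions([p.original_created_at for p in retweeted])
    return {
        "mostly_retweets": mostly_retweets,
        "mostly_amplified": mostly_amplified,
        "inversions": inversions,
    }
```

The function returns the raw signals rather than a verdict, leaving the choice of how many inversions count as suspicious to the caller.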

The pair claims to have identified over 7,000 amplification bots in just one day by using this methodology, but it’s entirely possible that this is the tip of the iceberg. In the post, Wright and Anise explain that it’s impossible to identify accounts that amplify content through likes, as there’s no official Twitter API endpoint for capturing and recording likes.

Regardless, this is a problem for both security researchers and Twitter to tackle. Amplification bots sound harmless but, as we learned in 2016, they can be used by a foreign adversary to shape public opinion. Retweets, as the authors of the blog post explain, don’t just affect how content spreads, but also its perceived credibility.

Amplification bots can also be used in influencer fraud, which is believed to cost the marketing industry $100 million annually.

Wright and Anise previously wrote about how to use similar data science techniques to identify fake followers on Twitter.
