Researchers from the University of Sheffield have developed an AI system that detects which social media users spread disinformation — before they actually share it.
The team found that Twitter users who share content from unreliable sources mostly tweet about politics or religion, while those who repost trustworthy sources tweet more about their personal lives.
“We also found that the correlation between the use of impolite language and the spread of unreliable content can be attributed to high online political hostility,” said study co-author Dr Nikos Aletras, a lecturer in Natural Language Processing at the University of Sheffield.
The team reported their findings after analyzing more than 1 million tweets from around 6,200 Twitter users.
They began by collecting posts from a list of news media accounts on Twitter, which had been classified as either trustworthy or deceptive in four categories: satire, propaganda, hoax, and clickbait.
They then used the Twitter public API to retrieve the most recent 3,200 tweets for each source, and filtered out any retweets to leave only original posts.
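That filtering step can be sketched in a few lines. The tweet schema here — a dict with a `text` field and, for retweets, a `retweeted_status` key, modeled on Twitter's v1.1 API JSON — is an assumption for illustration, not a detail taken from the study:

```python
def keep_original_posts(tweets):
    """Drop retweets from a timeline, keeping only original posts.

    Assumes each tweet is a dict with a 'text' field and, for retweets,
    a 'retweeted_status' field (modeled on Twitter's v1.1 API JSON).
    """
    return [
        t for t in tweets
        if "retweeted_status" not in t          # API-marked retweets
        and not t["text"].startswith("RT @")    # manual "RT @user" retweets
    ]
```

In practice this would run over each source's most recent 3,200 tweets, the maximum a single timeline request sequence returns.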
Next, they removed satirical sites such as The Onion, which have humorous rather than deceptive purposes, to produce a list of 251 trustworthy sources, such as the BBC and Reuters, and 159 unreliable sources, which included Infowars and Disclose.tv.
They then placed the roughly 6,200 Twitter users into two separate groups: those who have shared unreliable sources at least three times, and those who have only ever reposted stories from the trustworthy sites.
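The grouping rule amounts to a simple threshold. A sketch, where the input shape — a per-user count of shares from each source list — is an assumed intermediate representation rather than the paper's actual data structure:

```python
def split_users(share_counts, threshold=3):
    """Split users into the study's two groups.

    share_counts maps user -> (unreliable_shares, reliable_shares).
    Users with at least `threshold` shares from unreliable sources form
    one group; users who never shared an unreliable source form the
    other. Users in between (1-2 unreliable shares) match neither
    criterion and are left out.
    """
    spreaders, non_spreaders = [], []
    for user, (unreliable, _reliable) in share_counts.items():
        if unreliable >= threshold:
            spreaders.append(user)
        elif unreliable == 0:
            non_spreaders.append(user)
    return spreaders, non_spreaders
```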
Finally, the researchers used the linguistic information in the tweets to train a series of models to forecast whether a user would likely spread disinformation.
Their most effective method used a neural model called T-BERT. The team says it can predict with 79.7% accuracy whether a user will repost unreliable sources in the future:
This demonstrates that neural models can automatically unveil (non-linear) relationships between a user’s generated textual content (i.e., language use) in the data and the prevalence of that user retweeting from reliable or unreliable news sources in the future.
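The general idea — predicting a user's group from the words they use — can be illustrated with a far simpler classifier. This toy bag-of-words Naive Bayes is not the T-BERT neural model from the study, just a minimal sketch of text-based group prediction:

```python
import math
from collections import Counter, defaultdict

class TinyBayes:
    """Bag-of-words Naive Bayes with Laplace smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)   # per-class word tallies
        self.class_counts = Counter(labels)       # class frequencies (priors)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab_size = len({w for c in self.word_counts.values() for w in c})
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for label, n in self.class_counts.items():
            score = math.log(n / total)           # log prior
            denom = sum(self.word_counts[label].values()) + self.vocab_size
            for w in text.lower().split():
                # Add-one smoothing handles words unseen in training.
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best
```

Trained on toy tweets, `TinyBayes` would lean toward the "unreliable-sharer" label for political vocabulary and the "reliable-sharer" label for personal-life vocabulary; the real model learns much richer, non-linear patterns.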
The team also performed a linguistic feature analysis to detect differences in language use between the two groups.
They found that users who shared unreliable sources were more likely to use words such as “liberal,” “government,” and “media,” and often referred to Islam or politics in the Middle East. In contrast, the users who shared trustworthy sources frequently tweeted about their social interactions and emotions, and often used words like “mood,” “wanna,” and “birthday.”
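A crude stand-in for that kind of analysis is to compare relative word frequencies between the two groups. The whitespace tokenization and simple frequency difference below are simplifying assumptions, not the study's actual feature-analysis method:

```python
from collections import Counter

def top_distinctive_words(tweets_a, tweets_b, top_n=3):
    """Return words whose relative frequency in group A most exceeds
    that in group B. Tokenizes naively on whitespace; a real analysis
    would use a proper tokenizer and statistical significance tests."""
    count_a = Counter(w for t in tweets_a for w in t.lower().split())
    count_b = Counter(w for t in tweets_b for w in t.lower().split())
    total_a = sum(count_a.values()) or 1
    total_b = sum(count_b.values()) or 1
    diff = {w: count_a[w] / total_a - count_b[w] / total_b
            for w in set(count_a) | set(count_b)}
    return sorted(diff, key=diff.get, reverse=True)[:top_n]
```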
The researchers hope their findings will help social media giants combat disinformation.
“Studying and analyzing the behavior of users sharing content from unreliable news sources can help social media platforms to prevent the spread of fake news at the user level, complementing existing fact-checking methods that work on the post or the news source level,” said study co-author Yida Mu, a PhD student at the University of Sheffield.
You can read the full study in the journal PeerJ.