Researchers from the University of Sheffield have developed an AI system that detects which social media users spread disinformation — before they actually share it.
The team found that Twitter users who share content from unreliable sources mostly tweet about politics or religion, while those who repost trustworthy sources tweet more about their personal lives.
“We also found that the correlation between the use of impolite language and the spread of unreliable content can be attributed to high online political hostility,” said study co-author Dr Nikos Aletras, a lecturer in Natural Language Processing at the University of Sheffield.
The team reported their findings after analyzing more than 1 million tweets from around 6,200 Twitter users.
They began by collecting posts from a list of news media accounts on Twitter, which had been classified as either trustworthy or deceptive in four categories: satire, propaganda, hoax, and clickbait.
They then used the Twitter public API to retrieve the most recent 3,200 tweets for each source, and filtered out any retweets to leave only original posts.
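That filtering step can be sketched in a few lines. The tweet schema here — a dict with a `text` field and, for retweets, a `retweeted_status` key, modeled on Twitter's v1.1 API JSON — is an assumption for illustration, not a detail taken from the study:

```python
def keep_original_posts(tweets):
    """Drop retweets from a timeline, keeping only original posts.

    Assumes each tweet is a dict with a 'text' field and, for retweets,
    a 'retweeted_status' field (modeled on Twitter's v1.1 API JSON).
    """
    return [
        t for t in tweets
        if "retweeted_status" not in t          # API-marked retweets
        and not t["text"].startswith("RT @")    # manual "RT @user" retweets
    ]
```

In practice this would run over each source's most recent 3,200 tweets, the maximum a single timeline request sequence returns.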
Next, they removed satirical sites such as The Onion, which have humorous rather than deceptive purposes, to produce a list of 251 trustworthy sources, such as the BBC and Reuters, and 159 unreliable sources, which included Infowars and Disclose.tv.
They then placed the roughly 6,200 Twitter users into two separate groups: those who have shared unreliable sources at least three times, and those who have only ever reposted stories from the trustworthy sites.
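The grouping rule amounts to a simple threshold. A sketch, where the input shape — a per-user count of shares from each source list — is an assumed intermediate representation rather than the paper's actual data structure:

```python
def split_users(share_counts, threshold=3):
    """Split users into the study's two groups.

    share_counts maps user -> (unreliable_shares, reliable_shares).
    Users with at least `threshold` shares from unreliable sources form
    one group; users who never shared an unreliable source form the
    other. Users in between (1-2 unreliable shares) match neither
    criterion and are left out.
    """
    spreaders, non_spreaders = [], []
    for user, (unreliable, _reliable) in share_counts.items():
        if unreliable >= threshold:
            spreaders.append(user)
        elif unreliable == 0:
            non_spreaders.append(user)
    return spreaders, non_spreaders
```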
Finally, the researchers used the linguistic information in the tweets to train a series of models to forecast whether a user would likely spread disinformation.
Their most effective method used a neural model called T-BERT. The team says it can predict with 79.7% accuracy whether a user will repost unreliable sources in the future:
This demonstrates that neural models can automatically unveil (non-linear) relationships between a user’s generated textual content (i.e., language use) in the data and the prevalence of that user retweeting from reliable or unreliable news sources in the future.
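The general idea — predicting a user's group from the words they use — can be illustrated with a far simpler classifier. This toy bag-of-words Naive Bayes is not the T-BERT neural model from the study, just a minimal sketch of text-based group prediction:

```python
import math
from collections import Counter, defaultdict

class TinyBayes:
    """Bag-of-words Naive Bayes with Laplace smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)   # per-class word tallies
        self.class_counts = Counter(labels)       # class frequencies (priors)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab_size = len({w for c in self.word_counts.values() for w in c})
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for label, n in self.class_counts.items():
            score = math.log(n / total)           # log prior
            denom = sum(self.word_counts[label].values()) + self.vocab_size
            for w in text.lower().split():
                # Add-one smoothing handles words unseen in training.
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best
```

Trained on toy tweets, `TinyBayes` would lean toward the "unreliable-sharer" label for political vocabulary and the "reliable-sharer" label for personal-life vocabulary; the real model learns much richer, non-linear patterns.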
The team also performed a linguistic feature analysis to detect differences in language use between the two groups.
They found that users who shared unreliable sources were more likely to use words such as “liberal,” “government,” and “media,” and often referred to Islam or politics in the Middle East. In contrast, the users who shared trustworthy sources frequently tweeted about their social interactions and emotions, and often used words like “mood,” “wanna,” and “birthday.”
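A crude stand-in for that kind of analysis is to compare relative word frequencies between the two groups. The whitespace tokenization and simple frequency difference below are simplifying assumptions, not the study's actual feature-analysis method:

```python
from collections import Counter

def top_distinctive_words(tweets_a, tweets_b, top_n=3):
    """Return words whose relative frequency in group A most exceeds
    that in group B. Tokenizes naively on whitespace; a real analysis
    would use a proper tokenizer and statistical significance tests."""
    count_a = Counter(w for t in tweets_a for w in t.lower().split())
    count_b = Counter(w for t in tweets_b for w in t.lower().split())
    total_a = sum(count_a.values()) or 1
    total_b = sum(count_b.values()) or 1
    diff = {w: count_a[w] / total_a - count_b[w] / total_b
            for w in set(count_a) | set(count_b)}
    return sorted(diff, key=diff.get, reverse=True)[:top_n]
```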
The researchers hope their findings will help social media giants combat disinformation.
“Studying and analyzing the behavior of users sharing content from unreliable news sources can help social media platforms to prevent the spread of fake news at the user level, complementing existing fact-checking methods that work on the post or the news source level,” said study co-author Yida Mu, a PhD student at the University of Sheffield.
You can read the full study in the journal PeerJ.