Researchers now know the words that make you sound credible (or phony) on Twitter

After scanning 66 million tweets linked to almost 1,400 real-world events, researchers at Georgia Institute of Technology now believe they can identify the words and phrases that lend credibility to Twitter posts about specific events, even while those events are ongoing.

According to Georgia Tech Ph.D. candidate and research lead Tanushree Mitra:

There have been many studies about social media credibility in recent years, but very little is known about what types of words or phrases create credibility perceptions during rapidly unfolding events.

“Tweets with booster words, such as ‘undeniable,’ and positive emotion terms, such as ‘eager’ and ‘terrific,’ were viewed as highly credible,” Mitra said. “Words indicating positive sentiment but mocking the impracticality of the event, such as ‘ha,’ ‘grins’ or ‘joking,’ were seen as less credible. So were hedge words, including ‘certain level’ and ‘suspects.’”

The team looked at tweets about events in 2014 and 2015, including the news of Ebola in West Africa, the Charlie Hebdo attacks in Paris, and the death of Eric Garner in New York City. They then asked people to rate the credibility of Twitter posts on a scale from “certainly accurate” to “certainly inaccurate” before modeling the results and splitting them into 15 distinct linguistic categories, including positive and negative emotions, hedges and boosters, and anxiety.

Once the results were fed into a computer, the resulting model matched human opinion about 68 percent of the time, significantly higher than the random baseline of 25 percent.
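To make the idea of linguistic categories concrete, here is a minimal Python sketch of counting category words in a tweet. It is an illustration only, not the researchers' model: the tiny lexicons below are hypothetical, built solely from the example words quoted in this article, the category labels are our own, and the function name `category_counts` is made up for this example.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons: they contain only the example words quoted in
# the article. The study's real lexicons and its 15 categories are not shown.
LEXICONS = {
    "booster": ["undeniable"],
    "positive_emotion": ["eager", "terrific"],
    "mocking": ["ha", "grins", "joking"],
    "hedge": ["certain level", "suspects"],
}

def category_counts(tweet: str) -> Counter:
    """Count whole-word (or whole-phrase) lexicon matches per category."""
    text = tweet.lower()
    counts = Counter()
    for category, terms in LEXICONS.items():
        for term in terms:
            # \b keeps "ha" from matching inside words such as "that".
            pattern = r"\b" + re.escape(term) + r"\b"
            counts[category] += len(re.findall(pattern, text))
    return counts

example = category_counts(
    "Police suspects arson, ha, joking aside the proof is undeniable"
)
print(dict(example))
```

Counts like these could serve as input features for a classifier; how the actual study weighted or combined them is not described here.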

The team also found some surprising correlations. For example, messages with a higher number of retweets were deemed less credible, while replies and retweets with longer message lengths were found to be more credible.

“It could be that longer message lengths provide more information or reasoning, so they’re viewed as more trustworthy,” Mitra said. “On the other hand, a higher number of retweets, which was scored lower on credibility, might represent an attempt to elicit collective reasoning during times of crisis or uncertainty.”

The system isn’t perfect, but when paired with other signals, the linguistic model could one day prove a useful tool in fighting the spread of fake news.

The paper, “A Parsimonious Language Model of Social Media Credibility Across Disparate Events,” will be presented in February at the ACM Conference on Computer-Supported Cooperative Work and Social Computing.