
This article was published on May 6, 2021

Twitter will now reprimand you for nasty replies

Story by Ivan Mehta

Ivan covers Big Tech, India, policy, AI, security, platforms, and apps for TNW. That's one heck of a mixed bag. He likes to say "Bleh."

If you’re planning to reply “FUCK YOU!” or something similar to a tweet, Twitter will make you think twice — literally. Starting today, the social network is rolling out new prompts that aim to stop you from posting mean replies.

Twitter started this experiment last May with a limited set of users on iOS. Now it’s expanding to all users on Android and iOS.

The company said that this will cover potentially harmful or offensive replies — such as insults, strong language, or hateful remarks — in English for now. If the app’s algorithm detects such a reply, it’ll ask the user to reconsider sending it. You can delete the tweet or edit your response, but if you’re determined, you can still send the tweet with profanities.

Twitter’s warning prompt for replies with profanities
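Twitter hasn’t published how its classifier works, but the flow it describes — detect a potentially offensive reply, prompt the user, and let them edit, delete, or send anyway — can be sketched roughly. This is purely illustrative: the keyword check stands in for Twitter’s actual model, and every name here is made up, not Twitter’s API.

```python
# Illustrative sketch of the prompt flow described above. A trivial keyword
# check stands in for Twitter's real (unpublished) classifier; all names
# are hypothetical.

OFFENSIVE_TERMS = {"insult", "hateful"}  # placeholder word list


def looks_offensive(reply: str) -> bool:
    """Crude stand-in for the detection model: flag known bad words."""
    words = {w.strip(".,!?").lower() for w in reply.split()}
    return bool(words & OFFENSIVE_TERMS)


def submit_reply(reply: str, confirmed: bool = False) -> str:
    """Return 'prompt' to ask the user to reconsider, or 'sent'.

    Passing confirmed=True models the user choosing to send anyway,
    which the article notes is still allowed.
    """
    if looks_offensive(reply) and not confirmed:
        return "prompt"  # user may now edit, delete, or confirm
    return "sent"
```

A flagged reply would return `"prompt"` on the first attempt, and `"sent"` once the user confirms — mirroring the “you can still send the tweet” behavior described above.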

The firm admitted that in last year’s test, the algorithm failed to contextually distinguish between mean replies, sarcasm, and friendly banter. While the team has observed this behavior and made some changes, there’s still a chance the algorithm will get it wrong. In that case, you can tap the “Did we get this wrong?” link to submit your feedback.

Twitter also considers if you and the person you’re sending your reply to interact frequently, to gauge if the reply is mean or just meant as a joke.

Submitting feedback if Twitter’s mean-reply detection algorithm gets it wrong

Twitter said that this method of prompting yielded encouraging results in its tests: 34% of people decided to alter or delete their replies.

That also means 66% of people still decided to send their replies unchanged. Plus, there are ways to tweak words and fool the algorithm into thinking a reply is clean. And the feature doesn’t cover languages other than English, so multilingual users can still get away with abusive replies.

Despite all these hiccups, Twitter’s new feature is a positive step toward reducing toxicity on the platform, even if it only brings hateful comments down by a few notches.
