Twitter will now reprimand you for nasty replies

If you’re planning to reply “FUCK YOU!” or something similar to a tweet, Twitter will make you think twice, literally. Starting today, the social network is rolling out new prompts that aim to stop you from posting mean replies.

Twitter started this experiment last May with a limited set of users on iOS. Now it’s expanding to all users on Android and iOS.

The company said that this will cover potentially harmful or offensive replies — such as insults, strong language, or hateful remarks — in English for now. If the app’s algorithm detects such a reply, it’ll ask the user to reconsider sending it. You can delete the tweet or edit your response, but if you’re determined, you can still send the tweet with profanities.

Twitter’s warning prompt for replies with profanities

The firm admitted that in last year’s test, the algorithm failed to distinguish between mean replies, sarcasm, and friendly banter in context. While the team has observed this behavior and made some changes, there’s still a chance that the algorithm might get it wrong. In that case, you can tap on the “Did we get this wrong?” link to submit your feedback.

Twitter also considers whether you and the person you’re replying to interact frequently, to gauge whether the reply is mean or just meant as a joke.

Submitting feedback if Twitter’s algorithm for mean reply detection gets it wrong

Twitter said that this method of prompting yielded encouraging results in its tests as 34% of people decided to alter or delete their replies.

That also means that 66% of people still decided to send their replies unchanged. Plus, there are ways to modify words and fool the algorithm into thinking that it’s a clean reply. And it doesn’t cover languages other than English, so anyone who is multilingual can get away with abusive replies.

Despite all these hiccups, Twitter’s new feature is a positive step toward reducing toxicity on the platform if it can bring down hateful comments by even a few notches.

Story by Ivan Mehta

Ivan covers Big Tech, India, policy, AI, security, platforms, and apps for TNW. That's one heck of a mixed bag. He likes to say "Bleh."
