Twitter tests new harassment prevention feature with ‘Safety Mode’

The feature will automatically block trolls for seven days.


Twitter is experimenting with its most aggressive anti-harassment feature to date. The company will start testing “Safety Mode,” a new account setting that automatically blocks accounts using “potentially harmful language.” Twitter first previewed the feature back in February during its Analyst Day presentation, but is now starting to make it available to “a small feedback group.” It’s not clear when it might be available more widely.

When enabled, Safety Mode will proactively block accounts that are likely to be the source of harassment for a period of seven days. Twitter says the system is designed so that accounts of people you know or frequently interact with won’t be blocked, but trolls will.

“Safety Mode is a feature that temporarily blocks accounts for seven days for using potentially harmful language — such as insults or hateful remarks — or sending repetitive and uninvited replies or mentions,” Twitter writes in a blog post. “When the feature is turned on in your Settings, our systems will assess the likelihood of a negative engagement by considering both the Tweet’s content and the relationship between the Tweet author and replier.”
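Twitter hasn't published how that assessment works. As a rough illustration only, the sketch below shows one way a system could combine a reply's content with the relationship between the two accounts. Every name, word list, weight and threshold in it is a hypothetical assumption, not Twitter's actual logic; only the seven-day duration comes from the company's description.

```python
from dataclasses import dataclass

# Hypothetical values for illustration; Twitter has not disclosed its scoring.
HARMFUL_TERMS = {"idiot", "loser"}  # placeholder word list (assumption)
BLOCK_THRESHOLD = 0.5               # assumed cutoff for an autoblock
BLOCK_DAYS = 7                      # duration stated in Twitter's blog post


@dataclass
class Reply:
    text: str
    author_follows_replier: bool   # does the tweet author follow the replier?
    prior_interactions: int        # past exchanges between the two accounts
    recent_uninvited_replies: int  # recent unprompted replies from the replier


def negative_engagement_score(reply: Reply) -> float:
    """Return a score in [0, 1]; higher means more likely to be harassment."""
    words = {w.strip(".,!?").lower() for w in reply.text.split()}
    content = 0.6 if words & HARMFUL_TERMS else 0.0
    repetition = 0.4 if reply.recent_uninvited_replies >= 3 else 0.0
    score = min(1.0, content + repetition)
    # Discount accounts the author knows or frequently interacts with.
    if reply.author_follows_replier or reply.prior_interactions >= 5:
        score *= 0.2
    return score


def should_autoblock(reply: Reply) -> bool:
    """Decide whether the replier would be blocked for BLOCK_DAYS days."""
    return negative_engagement_score(reply) >= BLOCK_THRESHOLD
```

In this toy version, an insult from a stranger crosses the threshold, while the same words from an account the author follows do not, which mirrors the distinction Twitter describes between trolls and people you know.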

While Twitter has taken several steps in the past to address its long-running harassment problem, Safety Mode is notable because it takes more of the burden off the person being harassed. Instead of requiring users to manually block, mute and report problematic accounts, the feature should be able to catch the offending tweets before they are seen.

Because the feature is still in a relatively early stage, Twitter says it’s likely to make at least some mistakes. Users can manually review the tweets and accounts flagged by Safety Mode and reverse faulty autoblocks. When the seven-day period ends, users will get a notification “recapping” the actions Safety Mode took.