Personalized warnings could reduce hate speech on Twitter, researchers say

The 'more politely phrased' warnings were the most effective.

By Karissa Bell Nov. 22, 2021 6:00 am EST

A set of carefully worded warnings directed at the right accounts could help reduce the amount of hate on Twitter. That's the conclusion of new research examining whether targeted warnings could curb hate speech on the platform.

Researchers at New York University's Center for Social Media and Politics found that personalized warnings alerting Twitter users to the consequences of their behavior reduced the share of their tweets containing hateful language in the week that followed. While more study is needed, the experiment suggests that there is a "potential path forward for platforms seeking to reduce the use of hateful language by users," according to Mustafa Mikdat Yildirim, the lead author of the paper.

In the experiment, researchers identified accounts at risk of being suspended for breaking Twitter's rules against hate speech. They looked for people who had used at least one word contained in "hateful language dictionaries" over the previous week and who also followed at least one account that had recently been suspended after using such language.

From there, the researchers created test accounts with personas such as "hate speech warner" and used them to tweet warnings at these individuals. They tested several variations, but all carried roughly the same message: that using hate speech put them at risk of being suspended, and that this had already happened to someone they follow.

"The user @account you follow was suspended, and I suspect this was because of hateful language," reads one sample message shared in the paper. "If you continue to use hate speech, you might get suspended temporarily." In another variation, the account issuing the warning identified itself as a professional researcher while also letting the person know they were at risk of being suspended. "We tried to be as credible and convincing as possible," Yildirim tells Engadget.

The researchers found that the warnings were effective, at least in the short term. "Our results show that only one warning tweet sent by an account with no more than 100 followers can decrease the ratio of tweets with hateful language by up to 10%," the authors write. Interestingly, they found that messages that were "more politely phrased" led to even greater declines, of up to 20 percent. "We tried to increase the politeness of our message by basically starting our warning by saying that 'oh, we respect your right to free speech, but on the other hand keep in mind that your hate speech might harm others,'" Yildirim says.

In the paper, Yildirim and his co-authors note that their test accounts had only around 100 followers each and weren't associated with an authoritative entity. If the same type of warning were to come from Twitter itself, or from an NGO or other organization, it might be even more effective. "The thing that we learned from this experiment is that the real mechanism at play could be the fact that we actually let these people know that there's some account, or some entity, that is watching and monitoring their behavior," Yildirim says. "The fact that their use of hate speech is seen by someone else could be the most important factor that led these people to decrease their hate speech."