Perspective assigns a "toxicity score" to comments based on the perceived impact they might have on a conversation. Type the sentence "It's stupid and wrong," for example, and Perspective might rate it 89 percent toxic. However, researchers at the University of Washington's Network Security Lab found they could consistently trick the API into lowering that score by subtly modifying phrases. They added intentional misspellings ("iidiot" instead of "idiot") and inserted punctuation or spaces into words ("stu.pid" or "s c r e w"). They also discovered that a benign phrase like "It's not stupid and wrong" scored almost as high as the abusive one.
In a statement first reported by Ars Technica and confirmed to Engadget, Perspective's project manager, CJ Adams, praised the study:
It's great to see research like this. Online toxicity is a difficult problem, and Perspective was developed to support exploration of how ML can be used to help discussion. We welcome academic researchers to join our research efforts on Github and explore how we can collaborate together to identify shortcomings of existing models and find ways to improve them.
Perspective is still a very early-stage technology, and as these researchers rightly point out, it will only detect patterns that are similar to examples of toxicity it has seen before. We have more details on this challenge and others on the Conversation AI research page. The API allows users and researchers to submit corrections like these directly, which will then be used to improve the model and ensure it can understand more forms of toxic language, and evolve as new forms emerge over time.
It looks like websites such as Engadget will be waiting a while before unleashing Perspective on our comments sections.