Anthropic backtracks on policy that 'sabotaged' researchers' work

It wasn't a good look for a company that prides itself on working closely with the academic community.

Anthropic is walking back a policy that discreetly hamstrung researchers using its new Claude Fable 5 LLM to create competing AI models, the company told Wired. "We're changing Fable 5's safeguards for frontier LLM development to make them visible," the company said in a statement. "We made the wrong tradeoff and we apologize for not getting the balance right." 

When Anthropic released Claude Fable 5, a new model based on its powerful Mythos system, researchers noted something odd. They found that that Fable 5 would quietly reroute requests to a lesser model when asked to perform certain actions. Moreover, that restriction wasn't disclosed in the model's documentation. 

The new model was either refusing or degrading responses for tasks like training competing LLMs, debugging AI code and optimizing neural architecture. Researchers were bothered not only by that degradation but by Anthropic's lack of transparency about it. They were also concerned, of course, that they had burned tokens and money for a model that didn't do what they expected. 

Anthropic has painted itself as a more ethical and researcher-friendly alternative to OpenAI, so its actions with Fable 5 created a swift backlash. "Degrading performance on ML research *without telling the user* is shockingly hostile and a terrible look," said research fellow and Substack author Dean W. Ball on X.

Anthropic isn't reversing its safeguard policy on Fable 5, but rather making the restrictions visible to users. "If the company suspects a user is trying to use Claude to build a highly capable AI it will alert them that it's either refusing the request, or rerouting the user to a less capable model," Wired wrote. 

Recommended