Anthropic’s safety warnings may have just backfired — the government has pulled the plug on its most powerful AI

6 hours ago 10

The U.S. government on Friday ordered Anthropic to immediately shut off access to two of its most powerful AI models — Claude Fable 5 and Claude Mythos 5 — citing national security concerns. Anthropic announced on X that it has complied, but it made clear it thinks the government got this one wrong.

The directive, which Anthropic said it received on Friday at 5:21 pm ET, forces the company to disable both models for all users worldwide — not just the foreign nationals the government’s export control order was nominally aimed at. Access to Anthropic’s other models isn’t affected.

Why does any of this matter? Mythos is Anthropic’s most capable AI model, one the company previewed in early April and has kept tightly restricted ever since because of what Anthropic described as its exceptional ability to find security vulnerabilities in software. According to Anthropic, Mythos identified flaws in every major operating system and web browser it tested, so rather than release it broadly, the company launched a controlled program called Project Glasswing, sharing it with roughly 50 vetted organizations, including Amazon, Apple, Google, Microsoft, and CrowdStrike, to use for defensive cybersecurity work.

Fable 5, released just three days ago, was Anthropic’s answer to the obvious commercial pressure: a version of Mythos fitted with guardrails that block responses in high-risk areas like cybersecurity and biology, making it safe enough for general release, the company argued. It was immediately the most capable AI model available to the public, according to benchmark tests from Vals AI, a company that tracks AI tech performance.

The government’s directive is framed as an export control action, restricting foreign national access to the models. But in a lengthy blog post, Anthropic says its understanding is that the underlying concern is a claimed jailbreak of Fable 5. So far, the company says, the government has provided only verbal evidence of a “potential narrow, non-universal jailbreak” — one that, as Anthropic describes it, amounts to prompting the model to read a specific codebase and identify software flaws. And by the way, adds the company, it’s a “level of capability” that’s already widely available in other publicly accessible models, including OpenAI’s GPT-5.5. It’s also used routinely by cybersecurity professionals for defensive purposes, Anthropic observes.

Anthropic’s broader argument is that its strongest safeguards operate through independent classifier systems that function separately from the model itself, meaning that even if someone convinces Fable to keep talking past a refusal, the underlying protections against the most dangerous outputs remain in place. The company also notes in it post that a review of recent usage found no evidence of those safeguards being successfully bypassed to produce true, harmful content.

Clearly, none of that was enough to stop the government from acting, and Anthropic isn’t hiding its frustration. “We disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people,” the company wrote. “If this standard was applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers.”

Anthropic is widely expected to pursue an IPO this year and has staked much of its public identity on being the safety-conscious alternative to its rivals. The irony isn’t lost on observers that the very caution Anthropic displayed in restricting Mythos — which it promoted as a model so dangerous it couldn’t be released publicly — has now apparently attracted exactly the kind of government scrutiny that could disrupt its business most.

OpenAI’s Sam Altman must be enjoying this, at least. In April, he told podcaster Ashlee Vance that Anthropic’s handling of Mythos amounted to “fear-based marketing.” “It is clearly incredible marketing to say, ‘We have built a bomb. We were about to drop it on your head. We will sell you a bomb shelter for $100 million,’” Altman said. Altman, whose company is also widely expected to pursue an IPO as soon as possible, didn’t predict a government shutdown, but he identified something that has come back to bite Anthropic for now, which is that when you spend months telling the world your AI is uniquely dangerous, the world — the U.S. government included — tends to listen.

When you purchase through links in our articles, we may earn a small commission. This doesn’t affect our editorial independence.

Read Entire Article