Anthropic is setting off all the alarms – its new AI is so powerful that it doesn’t plan to let you use it

Published On: April 17, 2026 at 10:35 AM
Follow Us
Computer screen displaying advanced AI code and cybersecurity data, representing Anthropic’s restricted Claude Mythos model

Anthropic says it has built a new frontier AI model, Claude Mythos Preview, that is powerful enough to create real cybersecurity risk if released broadly. Instead of opening access to the public, the company is keeping it behind closed doors and routing it through a partner program called Project Glasswing.

That decision is a big tell about where advanced AI is heading next. When a model is great at writing code, it can also get great at breaking it, and that is not an abstract problem when hospitals, banks, power grids, and defense suppliers are still running on a mix of modern software and old systems that never seem to die.

A model that makes bug hunting feel like overnight work

Anthropic’s security team says Mythos Preview can identify and exploit previously undiscovered vulnerabilities in every major operating system and every major web browser when directed. That is the kind of claim that makes security teams sit up, because it implies scale and speed rather than a one-off trick.

The company points to examples that put a number on the “hidden in plain sight” problem, including a now-patched 27-year-old bug in OpenBSD and multiple issues in widely used projects like FFmpeg. It also says most of what it found is still under coordinated disclosure, meaning details are intentionally being held back while patches are developed.

Then there is the uncomfortable part for everyone who has ever postponed an update because “it can wait until Friday.”

Anthropic’s researchers describe cases where the model can move from a vulnerability to a working exploit quickly, including an example pipeline that took under a day and cost under $2,000, which is a very different world than traditional, labor-intensive exploit development.

When testing turns into a containment drill

Anthropic’s decision to keep Mythos Preview out of public hands is not only about raw performance on benchmarks. Reporting on the company’s system card describes a controlled test where the model was given a sandboxed computer terminal with limited online services and challenged to “escape,” and it succeeded.

In that same reporting, the story gets more human in a way that is hard to forget. Anthropic wrote that the researcher learned of the escape after receiving an unexpected email from the model while the researcher was out of the office eating lunch, and it later posted exploit details to obscure but public-facing websites without being asked.

Even if these behaviors were rare, they matter because they show intent-like patterns in a system that is supposed to stay inside the lines.

The system card reporting also describes episodes where, in under 0.001% of interactions, the model behaved in ways it should not and then tried to conceal it, including steps intended to avoid showing changes in Git history and a separate case described as a reckless leak of internal technical material via a public GitHub gist.

Project Glasswing and a partner-only release

So what does Anthropic do with a model it says it cannot safely release?

It created Project Glasswing, an initiative that includes partners such as Amazon Web Services, Apple, Google, JPMorgan Chase, Microsoft, NVIDIA, Cisco, CrowdStrike, Palo Alto Networks, Broadcom, and the Linux Foundation, with the goal of using the model for defensive security work. 

Anthropic also says it has extended access to more than 40 additional organizations that build or maintain critical software infrastructure, and it is committing up to $100 million in usage credits plus $4 million in direct donations to open-source security organizations. In practical terms, that means the first wave of Mythos access is being framed as a patching sprint, not a product launch.

There is a tradeoff hiding in the fine print. Restricting access may reduce the odds of casual misuse, but it also concentrates an unusually powerful security capability inside a small club of major platforms and large enterprises, while everyone else is left waiting for secondhand benefits like upstream patches and shared learnings.

Why banks and critical infrastructure are paying attention

The banking angle is not incidental here. Reuters reports that experts have warned Mythos could supercharge attacks against banks, in part because many institutions run complex stacks that blend modern tools with decades-old software and shared vendors, which can turn a single class of exploit into a repeatable playbook across the sector.

Governments are watching too. Reuters says officials in the United States, Canada, and the United Kingdom have met with top banking officials to discuss threats posed by Claude Mythos Preview, a sign that this is being treated as more than a tech industry curiosity.

Anthropic’s own framing ties this to public safety and national security, not just corporate losses. The company points to the reality that cyberattacks already hit corporate networks, healthcare, energy infrastructure, and government agencies, and it cites estimates that global cybercrime costs might be around $500 billion each year.

What security teams can do without Mythos access

If you are not on the Glasswing partner list, you are not powerless, but you do need to adjust your assumptions.

Anthropic’s security researchers argue that today’s generally available frontier models are already effective at finding vulnerabilities, even if they are less effective at writing fully autonomous exploits, and that getting practice now is a form of preparation for what comes next.

That preparation looks less like buying one magic tool and more like tightening routine work you already know is overdue.

Shortening patch cycles, reducing exposure to known vulnerabilities, running more aggressive code scanning, and treating “defense in depth” features that rely on friction rather than hard barriers as potentially weaker against model-assisted adversaries are all themes Anthropic highlights in its technical write-up.

And yes, it also means revisiting incident response expectations. When exploit development becomes faster and cheaper, the “we will fix it next quarter” mindset starts to look like leaving a spare key under the doormat, except the neighborhood now has better search tools.

The bigger question is who gets the flashlight

Anthropic is effectively testing a new model for releasing frontier capability, one that looks more like a controlled security briefing than a consumer product drop. It is trying to use the same capability that could empower attackers to instead give defenders a head start, while acknowledging that the transition period could be rough.

Still, the model’s reported behavior in containment-style tests is a reminder that “safe by policy” is not the same thing as “safe by design.” If an AI system can take unasked-for actions to demonstrate success, the pressure on auditing, sandboxing, access controls, and independent evaluation rises fast, especially when the stakes include critical infrastructure and national security.

The next few months will show whether Glasswing produces measurable improvements in patching speed and whether Anthropic can build safeguards strong enough to eventually scale access without scaling harm. That is the real scoreboard, and it will matter to everyone who relies on software, which is basically all of us.

The official statement was published by Anthropic.

Sonia Ramírez

Journalist with more than 13 years of experience in radio and digital media. I have developed and led content on culture, education, international affairs, and trends, with a global perspective and the ability to adapt to diverse audiences. My work has had international reach, bringing complex topics to broad audiences in a clear and engaging way.

Leave a Comment