What frontier AI actually means for enterprise security

The Computer Weekly Security Think Tank considers whether Anthropic’s Claude Mythos frontier AI model is a benefit or a barrier to achieving resilient enterprise IT security, and how security leaders need to adapt.

The "clear and present danger or hot air" framing surrounding Anthropic's Claude Mythos frontier AI model sets up the wrong argument. What Mythos represents is a tempo shift in how vulnerabilities are found, chained, and exploited.

Anthropic says Mythos can identify and exploit zero-days across major operating systems and browsers at a level serious enough that it chose not to release the model publicly, limiting access through Project Glasswing to a select group of organisations. The UK AI Security Institute (AISI) puts numbers on that. Before Mythos, no AI model had completed a 32-step simulated corporate attack chain end-to-end. Mythos Preview did so in three out of ten runs. GPT-5.5 did so in two. At Expert level, GPT-5.5 achieved a 71.4% pass rate, compared with 68.6% for Mythos. The capability gap between the two leading frontier models is narrower than the coverage implies. The governance gap is considerably wider.

Among practitioners who've tested Mythos, the reaction has been more restrained than the policy response suggests, and they’re largely right. The vulnerability pipeline was never the core problem for defenders. We were never suffering from a shortage of things to worry about. We already have more disclosures, more advisories, more proof-of-concepts, and more exposure data than most organisations can realistically operationalise.

What Mythos does is accelerate that reality. It compresses the timeline between weakness, discovery, weaponisation, and the need for defensive action. The barriers to deploying a model like this are real today. The compute requirements are substantial and the infrastructure demands are specialised. Those barriers won't remain in place for long.

There's also a surface area problem running underneath the discovery question. Vibe coding guarantees we don't hit a plateau: higher-tempo development, more dependencies, more confident shipping. Even if the defect rate per commit improves, the total attack surface grows. The volume of code being written with AI assistance means the target is expanding at the same time as the tools for finding flaws in it are improving.
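The arithmetic behind that point can be sketched with made-up numbers. The commit volumes and defect rates below are purely illustrative assumptions, not measurements from any study. They only show that a falling defect rate can still produce more defects overall when the volume of code rises faster.

```python
def expected_defects(commits: int, defects_per_thousand_commits: int) -> float:
    """Expected defects introduced = commits x defect rate per commit."""
    return commits * defects_per_thousand_commits / 1000


# Hypothetical figures, chosen only to illustrate the effect.
# Before AI-assisted development: fewer commits, higher defect rate.
before = expected_defects(commits=1_000, defects_per_thousand_commits=20)

# After: the defect rate improves by a quarter, but commit volume triples.
after = expected_defects(commits=3_000, defects_per_thousand_commits=15)

print(f"before: {before:.0f} defects, after: {after:.0f} defects")
```

With these assumed inputs the per-commit rate drops while the expected defect count more than doubles, which is the dynamic described above.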

Current models are genuinely capable at pattern bugs: injection flaws, leaked secrets, known bad dependencies, and chaining findings across systems. Where they still fall short is anywhere correctness depends on intent. Business-logic and authorisation flaws remain the category where AI models are consistently weakest. Unlike pattern bugs, they require understanding what code is supposed to do, not just what it does. That gap hasn't yet been closed. Human judgment remains irreplaceable in the research and security pipeline. At Vedere Labs we already use Claude Opus 4.6 in our research workflow and have reported several zero-days found through that process. The goal is turning faster research into better protection.

From a vendor perspective, vulnerability management and QA are converging as AI tooling improves, but the questions they answer remain distinct, and no amount of convergence changes that. QA asks whether something works. Vulnerability management asks whether it can be abused and what the blast radius looks like.

When models like Mythos become more widely available, expect a spike in disclosed vulnerabilities, then a reckoning: vendors confronting what was already there but never measured. The question is whether discovery translates into remediation or just accumulates as a bigger backlog. The pressure to ship doesn't disappear because a model found more bugs. Without hard blocks for exploitable, high-impact issues and firm deadlines for everything else, the surge risks becoming the new normal rather than a more durable correction.
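One way to picture "hard blocks plus firm deadlines" is as a release gate. The sketch below is a minimal illustration, not a description of any vendor's actual process: the `Finding` fields, the `blocks_release` function and the 30-day deadline are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical policy value: days allowed to fix anything that is not a hard block.
REMEDIATION_DEADLINE_DAYS = 30


@dataclass
class Finding:
    identifier: str
    exploitable: bool
    high_impact: bool
    found_on: date


def blocks_release(finding: Finding, today: date) -> bool:
    """Return True if this finding should stop a release from shipping."""
    # Hard block: exploitable, high-impact issues never ship.
    if finding.exploitable and finding.high_impact:
        return True
    # Everything else gets a firm deadline and blocks only once overdue.
    deadline = finding.found_on + timedelta(days=REMEDIATION_DEADLINE_DAYS)
    return today > deadline
```

The design choice is that the pressure to ship is met with a rule that cannot be argued with case by case: a small set of issues always blocks, and the rest block automatically once their deadline passes, so the backlog cannot grow silently.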

For defenders, the hard part has always been what to do with the intelligence. Where is the affected asset, is it actually exposed, how critical is it, what’s the likely path to compromise, and what can be done right now to reduce risk? Mythos makes that operational burden more urgent. The NCSC said as much when it warned of a coming vulnerability patch wave, and the AISI benchmark data gives that warning some weight.
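Those triage questions can be turned into a simple ordering of affected assets. The sketch below is one possible approach under stated assumptions: the `AffectedAsset` fields, the 1-to-5 criticality scale and the choice to rank exposure ahead of criticality are all hypothetical, and a real programme would weigh far more context.

```python
from dataclasses import dataclass


@dataclass
class AffectedAsset:
    name: str
    internet_exposed: bool  # is it actually reachable by an attacker?
    criticality: int        # hypothetical scale: 1 (low) to 5 (essential)


def prioritise(assets: list[AffectedAsset]) -> list[AffectedAsset]:
    """Order assets so exposed ones come first, then by descending criticality."""
    return sorted(
        assets,
        key=lambda a: (a.internet_exposed, a.criticality),
        reverse=True,
    )


# Hypothetical inventory for illustration only.
inventory = [
    AffectedAsset("internal-wiki", internet_exposed=False, criticality=2),
    AffectedAsset("vpn-gateway", internet_exposed=True, criticality=5),
    AffectedAsset("plant-historian", internet_exposed=False, criticality=5),
    AffectedAsset("marketing-site", internet_exposed=True, criticality=1),
]

for asset in prioritise(inventory):
    print(asset.name)
```

Even a crude ordering like this depends on knowing where the asset is and whether it is exposed, which is why inventory and exposure data have to exist before the advisories arrive.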

The faster vulnerabilities are found, the more fragile any organisation becomes if its remediation process can't keep pace. This is especially acute in operational technology (OT) and critical national infrastructure (CNI), where the systems most essential to societal function are often the least capable of aggressive patch velocity without introducing operational instability. In those environments, patching at scale can itself become a source of risk.

The focus has to shift toward operational survivability: preserving visibility, constraining attacker manoeuvre space, limiting blast radius, and maintaining continuity under stress. The organisations that can patch at pace without the wheels falling off will be the ones that have already done the foundational work on asset inventory, segmentation, and prioritisation based on actual exposure.

In the frontier AI era, operational survivability is the measure that matters. The organisations that understand that now won't be the ones scrambling up the beaches when the patch wave starts to build.
