
Guardian agents: Stopping AI from going rogue
AI systems don't share our values and can easily go rogue. But instead of trying to make AI more human, we need a new class of guardian agents to act as digital sentinels, monitoring our autonomous systems before we lose control completely
In the artificial intelligence (AI) race, businesses are handing over critical decisions to AI systems that do not, and will not, think like humans. These systems don’t understand ethics or right from wrong. They’re only focused on the end goal.
Humans, by contrast, instinctively evaluate decisions through accountability, knowledge and shared norms. When AI breaks these expectations, the reflex is to make it act “more human”. But it is exactly when we impose logic and values innate only to humans that AI can go rogue – and this is where the real danger lies.
Last month, basic security flaws in McDonald’s AI-powered hiring assistant, Olivia, left the personal data of millions of job applicants globally exposed online. There was no hacker involved. Just AI doing what it was meant to do.
Over a decade earlier, the US markets experienced the now infamous Flash Crash, when autonomous trading agents designed to respond to market conditions began reacting to each other’s moves in a rapid feedback loop. This wiped nearly 1,000 points off the Dow Jones Industrial Average within minutes.
In both cases, AI systems operated exactly as designed – until they didn’t.
Today, that risk is accelerating. AI is being deployed faster, deeper and across more core business functions than ever before. While no one fully trusts AI, most companies are relying on humans in the loop to stay in control. That may work in isolated use cases but doesn’t scale in practice. The simple truth is there aren’t enough humans to oversee everything AI is doing.
Gartner predicts that by 2027, 80% of companies lacking AI risk mitigation will face catastrophic outcomes, including litigation, leadership changes, reputational blacklisting and permanent brand damage. To avoid this happening, a different type of AI is needed to monitor behaviour, make decisions and step in when something goes rogue. This is where guardian agents come in.
What are guardian agents?
Think of them as sentinels: AI systems designed to watch over other AI to ensure trustworthy, secure interactions between autonomous systems and the real world. They’re agents first and foremost – autonomous or semi-autonomous systems that act on our behalf.
What sets guardian agents apart from traditional AI tools like ChatGPT is their focus on oversight and control. They function both as AI assistants (supporting users with tasks like content review, monitoring and analysis) and as semi- or fully autonomous agents (formulating and executing action plans, while redirecting or blocking actions that run counter to predefined goals).
As an example, guardian agents are being used to review AI-generated language translations, checking for accuracy and context before the output reaches the end-user. In this case, the guardian agent is acting as a protector, reviewing generated content and applying guardrails before the output is released.
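To make the idea concrete, here is a minimal sketch in Python of what such a protector could look like: a review step that sits between the generating model and the end-user and applies a few simple guardrails. The class name, banned-term list and thresholds are illustrative assumptions, not a description of any particular product.

```python
# A minimal sketch of a "protector" guardian agent that reviews AI-generated
# translations before they reach the end-user. The guardrails below (leaked
# placeholders, banned terms, a length sanity check) are illustrative
# assumptions, not a prescribed rule set.
from dataclasses import dataclass


@dataclass
class Verdict:
    released: bool
    reasons: list[str]


class TranslationGuardian:
    def __init__(self, banned_terms: set[str], max_length_ratio: float = 3.0):
        self.banned_terms = {t.lower() for t in banned_terms}
        self.max_length_ratio = max_length_ratio

    def review(self, source: str, translation: str) -> Verdict:
        reasons = []
        # Guardrail 1: the output should not leak template placeholders.
        if "{" in translation or "}" in translation:
            reasons.append("unresolved placeholder in output")
        # Guardrail 2: block terms the business has ruled out.
        if any(term in translation.lower() for term in self.banned_terms):
            reasons.append("banned term in output")
        # Guardrail 3: a wildly longer output often signals added content.
        if source and len(translation) / len(source) > self.max_length_ratio:
            reasons.append("output length out of expected range")
        return Verdict(released=not reasons, reasons=reasons)


# Usage: the guardian sits between the generator and the end-user.
guardian = TranslationGuardian(banned_terms={"guaranteed returns"})
verdict = guardian.review(source="Bonjour", translation="Hello {name}")
print(verdict)  # Verdict(released=False, reasons=['unresolved placeholder in output'])
```

The design point is that the reviewing agent is separate from the generating model and holds its own rules, so a failure in the generator does not silently become a failure for the end-user.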
While the benefits are promising, guardian agents aren’t a silver bullet when it comes to safeguarding AI interactions. They do play a vital role, but only as part of a broader, layered approach to trust and risk management.
Guardian agents are still an emerging concept; however, their role is becoming increasingly clear. Gartner believes guardian agent technologies will account for 10% to 15% of the agentic AI market by 2030.
How to get started
Start by building an understanding of agentic AI and how AI agents are being deployed throughout the organisation. Agentic AI introduces the ability to create autonomous solutions that use large language models (LLMs) to execute and drive processes.
To take advantage of this shift – and manage the risks – it’s important to understand emerging patterns, how they shape the way agentic systems are delivered, and how robust and reliable those systems are.
The next step is to begin experimenting with agentic platforms. Most major AI vendors are releasing platforms that support multiple models and modes of generative AI. This helps organisations fine-tune models and optimise prompts, while providing the building blocks for deploying guardian agents.
The point isn’t to get everything right immediately. It’s to start learning by doing.
Finally, understand the workflows that AI agents follow to get things done, which leads back into the world of process management. This means assessing how data moves within the organisation, what access rights apply, which rules and policies are enforced and what events trigger actions. These are the areas guardian agents will monitor to detect when something goes rogue.
Was there an event that wasn’t handled? Was there an API (application programming interface) call that shouldn’t have been made? Was there a log entry that violated certain rules? These signals will be the entry point for guardian agents.
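To illustrate, the sketch below shows the kind of signal checks a guardian agent could run over an agent’s activity trail. The event schema, API allow-list and policy rule are hypothetical; a real deployment would plug into the organisation’s own event bus, API gateway logs and policy engine.

```python
# A minimal sketch of guardian-agent signal checks over an activity trail.
# The event fields, allow-list and rules are hypothetical examples only.
from datetime import datetime, timezone

ALLOWED_APIS = {"crm.lookup", "email.send"}     # assumed allow-list
UNHANDLED_EVENT_STATES = {"pending", "failed"}  # assumed "not handled" rule


def scan_activity(events: list[dict]) -> list[str]:
    """Return human-readable alerts for anything that looks rogue."""
    alerts = []
    for event in events:
        # Signal 1: an event the workflow never handled.
        if event.get("status") in UNHANDLED_EVENT_STATES:
            alerts.append(f"unhandled event: {event['id']}")
        # Signal 2: an API call outside the approved set.
        if event.get("api") and event["api"] not in ALLOWED_APIS:
            alerts.append(f"unexpected API call: {event['api']}")
        # Signal 3: a log entry that breaks a policy rule (e.g. raw PII).
        if "ssn=" in event.get("log", "").lower():
            alerts.append(f"policy violation in log for event {event['id']}")
    return alerts


activity = [
    {"id": "evt-1", "status": "done", "api": "crm.lookup", "log": "ok"},
    {"id": "evt-2", "status": "failed", "api": "payments.transfer",
     "log": "ssn=123-45-6789"},
]
for alert in scan_activity(activity):
    print(datetime.now(timezone.utc).isoformat(), alert)
```

In practice, the hard work is deciding which events, APIs and log patterns matter for your organisation; the checks themselves are the easy part.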
Ultimately, an agentic system isn’t just a tool. It’s an active, autonomous environment that is trying to reach a goal. But those goals must be achieved on your terms.
So next time an AI takes action, determine who’s in control. If no human or guardian agent is watching, you’re not in control – the AI already is.
Daryl Plummer is a distinguished vice-president analyst and fellow at Gartner