Undo CTO: How to unlock the potential of AI coding agents by giving them runtime context

This is a guest post for the Computer Weekly Developer Network written by Mark Williamson, CTO of Undo – a company known for its Time Travel Debugging software that lets developers record, “blink back” and then replay live program execution. 

Engineering leaders know that most applications contain problems or bugs that make them unstable. It’s unfortunate, but bugs are just part of the fabric of the software that powers our modern world… so how should we move forward?

Williamson digs into the subject and writes in full as follows…

Test suites flag a multitude of issues every time they run. Some of them require simple fixes that can be picked out quickly so the engineer can solve them and move on. But a lot of tickets are never closed, because it’s impossible to reproduce the precise circumstances that caused the bug to emerge.

To avoid wasting countless cycles chasing phantom problems that may never happen again, teams get by with mitigations and workarounds that keep bugs contained and stop them from doing real damage. And that usually works fine.

Until it doesn’t.

Not exactly nine to five

Engineers are no strangers to seeing latent issues surface as failures serious enough to pull the whole team into firefighting mode at a moment’s notice. Being called into a crisis meeting at any hour of the day to troubleshoot and root cause a problem has long been a fact of life for developers.

The fact is that every defect left unresolved is a potential customer escalation in waiting. The escalation usually comes at the worst possible time, such as when there is exceptionally high demand from users, or in the middle of the night after a botched update.

It’s invariably a long, drawn-out process to find and fix the cause of the problem, creating a headache for developers. Very rare are the occasions when they can quickly one-shot the problem and get back to their day (or bed) with minimal disruption.

Adding fuel to the fire

Today, the rise of AI coding assistants has amplified the risk of latent bugs more than ever.

There’s understandably huge excitement about the impact the likes of Cursor, Codex, and Claude Code are having on developer output. These tools have slashed the time it takes developers to write (or generate) code. But as with all things in life, there’s a catch.

Writing the code was never the bottleneck. Most of a developer’s time is spent understanding code – whether their own, or something created by somebody (or now something) else. So much effort goes into figuring out what code does, identifying where it’s not working as it should and then debugging the cause of those unexpected results.

It came as a surprise to nobody that the ability to use AI to generate more code faster than ever did nothing to reduce that wider effort. In fact, it’s done the opposite – with engineers spending even more time in code comprehension and debugging.

Plausible doesn’t mean correct

A large contributor to that added work is that AI coding agents are fundamentally plausibility engines. It’s no secret that where they lack a complete picture, they fill the gaps in their knowledge with guesses based on what seems likely to come next.

They therefore often produce code that looks accurate enough to pass the test suite, but inexplicably fails in production when a very specific scenario arises. That seems inevitable when you consider that it’s impossible for anyone – human or AI – to account for every possible scenario when looking at source code alone.

For the engineers left to unpick what the code actually did when a test fails or a P1 incident arises, things can quickly spiral into a nightmare of coffee-fuelled late nights, with no end in sight.

Turning an incomplete picture into a whole

It can take weeks, months, or even longer to go back over a failed run, understand where the code went wrong, and determine how to prevent it from happening again.

That’s made more difficult by the fact that most engineers rely on logs or the source code itself to decipher program execution. These only offer a partial view of what happened at runtime. As a result, it’s not always possible to find the cause of a bug without recreating the one-in-a-million conditions that led to it – conditions that engineers often have no visibility into.

The only way to solve that is by capturing a complete record of everything a program does on execution. By giving that record to their AI agents, engineers can task them with looking back through every step to identify the exact moment the program failed, and reason about why with precision.
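
To make the idea concrete, here is a toy sketch in Python. It is not how Undo's technology works and is nowhere near production-grade recording; it simply uses the standard library's sys.settrace hook to snapshot a function's local variables at every line, then walks backwards from the failure to the last step where the state was still healthy. The names record, last_good_step and average_latency, along with the bug itself, are invented for illustration.

```python
import sys


def record(func, *args):
    """Run func(*args) and snapshot its locals before every line it executes.

    Returns (trace, error): trace is a list of (line number, locals copy),
    error is the exception raised by func, or None if it finished cleanly.
    """
    trace = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # only follow the function under investigation
        if event == "line":
            # "line" fires before the line runs, so this is the state going in.
            # A shallow copy is enough here because the values are plain numbers.
            trace.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    error = None
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    except Exception as exc:
        error = exc
    finally:
        sys.settrace(previous)
    return trace, error


def last_good_step(trace, is_good):
    """Walk backwards from the failure to the last snapshot where is_good held."""
    for lineno, local_vars in reversed(trace):
        if is_good(local_vars):
            return lineno, local_vars
    return None


# Hypothetical buggy function: a negative sample wrongly resets the counter.
def average_latency(samples):
    total = 0
    count = 0
    for value in samples:
        if value < 0:
            count = 0
            continue
        total += value
        count += 1
    return total / count


if __name__ == "__main__":
    trace, error = record(average_latency, [12, 15, -1])
    print("failure:", repr(error))
    step = last_good_step(trace, lambda v: v.get("count", 0) != 0)
    if step is not None:
        lineno, local_vars = step
        print("state was last healthy just before line", lineno, "with", local_vars)
```

Because each snapshot is taken just before its line runs, the last healthy snapshot points at the line that corrupted the state, in this case the counter reset, rather than at the division where the program finally crashed. A log that only captured the crash would show the symptom and not the cause.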

From guessing to knowing

Armed with the precise context of what happened at runtime, AI agents can automatically pinpoint the root cause of a failure based on hard evidence, not hallucinations or guesses.

As a result, engineers can realise the full impact of AI coding tools. By eliminating the long and painstaking search for clues and answers when their tests fail, engineers can resolve bugs quickly and close tickets that would otherwise hang around long past their sell-by date.

That frees them to focus on building the next generation of software systems that keep the world turning – which was of course the goal all along.