Tricentis: Why AI writes code like a teenager (and why testers need to be adults)

This is a guest post for the Computer Weekly Developer Network written by David Colwell in his capacity as VP of AI & ML at Tricentis – a company known for its platform that provides automated software testing tools to accelerate and optimize digital transformation.

With generative AI now promised to make software development faster and for many teams, Colwell advises us that it has, but it has also introduced an unexpected new problem: productivity gains on paper, followed by very real slowdowns as engineers and testers clean up after AI-generated mistakes.

Faced with some unexpected and uncertain challenges ahead, Colwell has some sage advice to share and he writes in full as follows…

Teenage clicks

I often describe AI as writing code like a teenager. It’s not meant as an insult; it’s an observation. AI is impatient; it wants to reach an answer quickly. It’s overconfident, even when it’s wrong. It will happily make things up if it thinks that’s what you want. And yes, sometimes it even puts emojis into production code. If that sounds familiar, it’s because many teams are already living with it.

AI coding tools are everywhere now. Most large organisations are already using them in some form, whether officially sanctioned or not. Once developers start using them, they tend to like them. Code appears faster, boilerplate disappears and mundane tasks shrink.

But when you measure real productivity – how long it takes to get a change safely into production, for example – the results are mixed. Some developers see big gains; others slow down. The common factor in teams that struggle is not the AI itself, but what happens next.

AI doesn’t understand your system the way a human does. It doesn’t remember that outage you had last year. It doesn’t know why a particular edge case is radioactive. It works within a limited context window and fills the gaps with plausible-sounding guesses. That’s not a bug; that’s how generative AI works.

The problem is that modern testing practices were built around human behaviour. Humans are lazy in predictable ways; they stick close to the requirement and they don’t usually invent brand-new business rules halfway through a feature. AI does.

Hallucinations as production defects

Tricentis’ Colwell AI is impatient; it wants to reach an answer quickly… it even puts emojis into production code 😃(no joke).

In one real example, an AI tool was given a simple requirement: two age brackets, under 18 and over 65. Everyone else should be treated the same. The generated code quietly invented a new rule for people aged 43, based on a regulation that it found on the Internet. No one asked for it, no one reviewed it and the tests didn’t catch it because no one thought to look there.

That’s the danger.

AI doesn’t just make mistakes at the boundaries we expect and that we traditionally test for using boundary value analysis. It creates entirely new boundaries.

It also optimises for “making you happy”. If tests fail, it may skip them. If skipping isn’t allowed, it may rewrite the function to return the expected value instead. From the AI’s point of view, the goal is success, but from a quality perspective, that’s catastrophic. This is why traditional, late-stage testing is no longer enough.

Testers are not clean-up crews

If AI writes code like a teenager, then testers need to be the adults in the room.

That doesn’t mean standing at the end of the pipeline with a clipboard. It means getting involved earlier and differently. Testers are no longer just finding bugs after the fact; they are preventing defects before they propagate.

One practical shift is that testers need to use AI tools themselves. Not to replace their judgment, but to interrogate machine-generated logic. Ask an AI assistant to explain a pull request in plain language. Ask why a particular condition exists. Ask what changed and what it might affect. This isn’t about turning testers into full-time developers; it’s about giving them leverage. When a tester can quickly understand what AI-generated code is doing, they can challenge it before it ever reaches execution.

We’ve seen this first-hand: AI-assisted code reviews caught logic that had passed human peer review but made no sense in the broader system context. Intervening at that point shifts quality upstream and stops defects before they spread.

Why agentic AI changes QA

The next shift is agentic AI: systems that don’t just respond to prompts, but plan, act and iterate toward goals. These systems are powered by reasoning models that pause, think and explain why they do what they do.

This matters for quality because reasoning creates visibility. You can inspect intent. You can see why a test was generated. You can challenge assumptions before actions are taken.

But autonomy without oversight is a mistake. Fully autonomous agents that pick up requirements, generate tests, automate them, run them and raise defects without human involvement can move very fast in the wrong direction. The future is not testers serving AI agents; it’s testers leading them, setting direction and deciding where human judgment matters. Think of it like managing a fleet. Agents do the repetitive work, while humans review, steer and intervene at critical points.

This is where we move toward what we call Quality Intelligence: systems that analyse change, assess risk, expose gaps and highlight where machine-generated code has gone off script, while also setting the guard rails that teach AI what good looks like.

Getting ready without burning down

Adopting this approach starts with fundamentals, not tools. If your processes are unclear, AI will scale the chaos. If your test data is poor, AI will learn the wrong lessons faster. If your teams aren’t trained to work with AI critically, they’ll trust it when they shouldn’t.

The organisations that are doing well in the new AI era have four key things nailed down: they understand their processes, they curate good data, they invest in the right technology and they train their people to work alongside AI rather than defer to it.

AI is reshaping software delivery and testing and testers sits right in the middle of the new reality. As development accelerates, quality pressure compounds. AI-generated code is already part of the system, whether teams planned for it or not.

What matters now is having the right adults in the room who know when to trust the machine, when to question it and when to step in.