This is a guest post for the Computer Weekly Developer Network written by Peter Schneider in his capacity as senior product manager at Qt Group.
As a company, Qt (pronounced ‘cute’) is known for its platform, which it says provides all the tools a software engineer needs to create applications and embedded devices, from planning and design through development, testing and future-proofing.
Qt Group is headquartered in Espoo, Finland, and employs almost 700 people globally.
Schneider writes in full as follows…
I don’t personally believe ‘large’ language models (LLMs) are the only way of succeeding in AI-assisted code generation.
Momentum seems to be gathering behind ‘medium-sized’ models that specialise in coding, a movement I believe will only continue as more service providers enter the scene with industry-focused LLMs.
Developers would do well to investigate these for code generation purposes.
Small is beautiful
A good example is StarCoder, a fine-tuned model that outperforms bigger, more mainstream LLMs at its specific task.
That’s because it is tailor-made for one job: coding. More and more companies in the software and IT space are looking to develop their own LLMs as we speak. Are these all as ‘good’ as what OpenAI has created? Perhaps not, but they are at least safe.

You don’t need LLMs filled to the brim with irrelevant knowledge like, “Who was the third president of the United States?” The smaller the data pool, the easier it is to keep things relevant, and the cheaper the model is to train, too.
Prompt engineering is obviously another means of fine-tuning models, but it’s not the only means of doing so.
It is, however, a tedious and time-consuming process, and I certainly don’t think every company needs its own dedicated prompt engineer.
For serious and sensitive commercial enterprise coding, broad-purpose LLMs should be treated with apprehension, because there are too many unknowns about where the code originates. If even one percent of the code is of dubious origin, and you have no means to update the software over the air, you may have to recall the product.
You don’t want to do that.
Developers and enterprises should, of course, demand more transparency from LLM providers. This is already happening: in our own conversations with Qt customers, we’re hearing more and more people say they’re not comfortable creating their products with closed-source GenAI assistants. So I do think developers and enterprises will gradually become even more sceptical of code generation than they are now. With enough pressure, LLM providers will eventually be forced to shift towards greater transparency.
Mainstream AI assistants like ChatGPT and GitHub Copilot are still fantastic for aspiring developers using them as a learning mechanism. But treat every line of generated code as if it were your own.
Do the peer reviews and ask your colleague, “Is this good code or is it bad code?”
Don’t just blindly trust it.