
In video games of the future, your AI teammates will actually listen

Ubisoft executives offer a glimpse into the engineering behind its generative AI middleware, including the use of small language models, prompt optimisation and on-device processing to bring virtual teammates to life

For a while now, gamers have shouted at their screens, barking orders or venting frustrations at virtual squadmates who could not hear them. In the rare games that did incorporate voice commands, players were forced to memorise rigid menus containing specific phrases.

But at the Nvidia GTC 2026 developer conference earlier this year, French video game giant Ubisoft offered a glimpse into a future where onscreen characters can understand what you are saying – and talk back – through Teammates, an experimental prototype that replaces traditional, pre-programmed non-playable characters (NPCs) with squadmates powered by generative artificial intelligence (GenAI).

Expanding on Ubisoft’s 2024 Neo NPC project, which was honoured under the France 2030 programme for advancing French innovation, Teammates places players in a first-person shooter alongside virtual soldiers who react to natural language, environmental context and the player’s personal slang.

Tell your virtual teammate, “Find cover behind that car and wait for my order to shoot the closest enemy”, and the character will parse the command, evaluate its surroundings, and execute the manoeuvre while acknowledging the strategy.
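
As a rough illustration – not Ubisoft’s actual schema – an order like that might decompose into a short list of structured actions, in the style of LLM function calling. Every name and field below is invented for the example:

```python
# Hypothetical sketch: how a spoken order could map to structured actions.
# The action names, fields and entity IDs are illustrative, not Ubisoft's.
from dataclasses import dataclass

@dataclass
class Action:
    name: str   # a behaviour the game engine knows how to execute
    args: dict  # parameters resolved from speech plus world state

parsed_order = [
    Action("take_cover", {"object": "car_03"}),        # "that car", resolved from context
    Action("hold_fire",  {"until": "player_order"}),   # "wait for my order"
    Action("engage",     {"target": "closest_enemy"}), # deferred until the order arrives
]
```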

According to Ubisoft, achieving this level of immersion required more than just powerful large language models (LLMs). The project team had to rethink the inference pipeline to abstract complexity and optimise for latency.

Abstracting complexity

Behind the scenes, the goal is not just to build an AI-powered game but to build a foundation that thousands of artists, writers and designers can use without needing a background in AI and machine learning.

“Most game development teams don’t have all the specialised skills required to update complex GenAI systems,” Joel Gregoire, technical director at Ubisoft Paris, explained during the GTC presentation. “The answer to that question was to build a platform to abstract the complexity and make games with GenAI features.”

Ubisoft’s solution functions as an agnostic middleware. Built around a C++ software development kit, the platform creates gameplay building blocks, such as NPC interactions, which are dynamically translated into prompts. Through custom engine plugins, this data feeds directly into Ubisoft’s proprietary Snowdrop and Anvil engines, translating raw language model outputs into engine-specific formats like facial animation data.
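
Ubisoft has not published the SDK itself, but the “building blocks translated into prompts” idea can be sketched in miniature: a designer authors an NPC interaction as plain data, and the middleware renders it into a prompt at runtime. The function, field names and sample values below are assumptions for illustration (the real SDK is C++):

```python
# Minimal sketch of the "gameplay building block -> prompt" translation.
# Structure and names are hypothetical, not Ubisoft's SDK.
def build_npc_prompt(npc: dict, world_state: dict) -> str:
    lines = [
        f"You are {npc['name']}, a squadmate. Personality: {npc['personality']}.",
        "You can only act through the functions listed below.",
        f"Visible objects: {', '.join(world_state['visible_objects'])}.",
        f"Known enemies: {world_state['enemy_count']}.",
    ]
    lines += [f"- {fn}" for fn in npc["available_functions"]]
    return "\n".join(lines)

prompt = build_npc_prompt(
    {"name": "Pablo", "personality": "calm, terse",
     "available_functions": ["take_cover(object)", "engage(target)", "hold_fire()"]},
    {"visible_objects": ["car_03", "crate_12"], "enemy_count": 2},
)
```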

“Think of it as an agnostic middleware for GenAI that we can easily plug into our in-house game engines,” said Xavier Manzanares, director of gameplay GenAI at Ubisoft. “It opens a whole lot of new opportunities for our teams.”

The problem of the awkward pause

If the promise of conversational AI in gaming is exciting, the engineering required to make it convincing is formidably complex. For all their linguistic fluency, LLMs are computationally heavy and notoriously slow.

In a normal conversation, a human responds in a fraction of a second. When Ubisoft began testing its early generative models, the characters took more than three seconds to process a player’s speech, decide on an action, generate a response, and synthesise the audio.

“Creativity starts with quality,” Maxime Sazadaly, the technical lead machine learning engineer at Ubisoft Paris, told the audience at Nvidia GTC. “But in fact, there’s something almost as important as quality, and that’s latency.”

A three-second delay in the middle of a virtual firefight could leave the player staring at a blank, unresponsive avatar. “Even if the action is the correct one, you won’t have a perception of intelligence just because it takes so long,” Sazadaly noted.

To make the characters feel alive, Ubisoft’s engineers determined that the entire loop – from a player speaking into a microphone to a character reacting – had to occur in under two seconds, which the team set out to achieve in three ways:

  1. Using faster base models: The team switched to faster, more efficient models, employing Nvidia’s Parakeet-tdt-v3 for automatic speech recognition (ASR), Gemini 2.5 Flash Lite for cloud LLM inference, and ElevenLabs Flash v2 for text-to-speech (TTS).
  2. Streaming everywhere: Instead of waiting for an entire response to generate, Ubisoft implemented partial function parsing. The moment the LLM outputs its first actionable function, the data is pushed to the game’s behaviour tree so the NPC can start moving (sketched after this list). Audio is similarly streamed and stitched together chunk by chunk.
  3. Prompt factorisation: By identifying redundant aliases and data in their massive 10,000-plus-token perception prompts, the team reduced prompt sizes by 30%, significantly lowering the time-to-first-token (TTFT). A rough illustration also follows the list.
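
To make the streaming idea concrete, here is a minimal sketch of partial function parsing: scan the model’s token stream for the first complete call such as `take_cover(car_03)` and dispatch it immediately, rather than waiting for the full response. The regex, dispatch hook and token source are illustrative stand-ins, not Ubisoft’s implementation:

```python
import re

# Scan a streamed LLM response for complete function calls and dispatch
# each one as soon as it appears, before generation has finished.
CALL = re.compile(r"(\w+)\(([^)]*)\)")

def stream_actions(token_stream, dispatch):
    buffer = ""
    for token in token_stream:             # tokens arrive as the model generates
        buffer += token
        match = CALL.search(buffer)
        if match:
            dispatch(match.group(1), match.group(2))  # push to the behaviour tree now
            buffer = buffer[match.end():]             # keep scanning for later calls

# The NPC starts moving after the fourth token, not after all eight.
tokens = ["take_", "cover(", "car_03", ") ", "then ", "engage(", "closest_enemy", ")"]
stream_actions(tokens, lambda name, args: print(f"behaviour tree <- {name}({args})"))
```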
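Prompt factorisation can be illustrated the same way. If many world entities share an identical description in the perception prompt, emitting that description once and grouping the entities under it trims redundant tokens. The data and output format here are invented for the example:

```python
# Rough sketch of prompt factorisation: group entities that share a
# description so the description is emitted once, not once per entity.
entities = {
    "enemy_01": "hostile, rifle, behind low wall",
    "enemy_02": "hostile, rifle, behind low wall",  # duplicate description
    "enemy_03": "hostile, rifle, behind low wall",  # duplicate description
    "car_03":   "sedan, usable as cover",
}

def factorise(entities: dict) -> str:
    groups: dict[str, list[str]] = {}
    for name, desc in entities.items():
        groups.setdefault(desc, []).append(name)
    return "\n".join(f"{', '.join(names)}: {desc}" for desc, names in groups.items())

print(factorise(entities))
# enemy_01, enemy_02, enemy_03: hostile, rifle, behind low wall
# car_03: sedan, usable as cover
```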

As a result of these optimisations, the team reduced the response time to just 1.5 seconds.

Gregoire’s team also built an application programming interface (API) gateway that lets developers run inference either in the cloud – accessing third-party or Ubisoft-hosted models via Kubernetes and Nvidia graphics processing unit (GPU) operators – or entirely on-device, which enables offline play and lowers operational costs.
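
The gateway pattern itself is straightforward to sketch: a single inference call site, with the backend chosen by deployment configuration. Ubisoft’s endpoints and hosting details are not public, so the URL handling and the local fallback below are assumptions:

```python
# Hypothetical gateway sketch: route inference to the cloud or on-device.
import json
import os
import urllib.request

CLOUD_URL = os.environ.get("INFERENCE_GATEWAY_URL", "")  # e.g. a Kubernetes ingress

def infer(prompt: str, on_device: bool = False) -> str:
    if on_device or not CLOUD_URL:
        return local_model(prompt)  # offline play, no per-call cloud cost
    req = urllib.request.Request(
        CLOUD_URL,
        data=json.dumps({"prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]

def local_model(prompt: str) -> str:
    # Stand-in for an on-device SLM call (see the sketch further down).
    return "ack: " + prompt[:40]
```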

Using Nvidia In-Game Inferencing and Cuda-in-Graphics integrations, the team successfully deployed Teammates locally on high-end consumer GPUs: the Nvidia RTX 4090 and RTX 5090. To stay within a typical AAA game’s rendering budget, they also used highly optimised small language models (SLMs), specifically the four-billion-parameter Qwen3-4B-Instruct-2507 model quantised to INT4 for speed and FP8 for quality, and the KaniTTS-400m model for local voice generation.
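
Ubisoft’s on-device stack runs through Nvidia In-Game Inferencing, whose C++ API is not shown here. Purely as a stand-in, the general shape of local, quantised 4B-class inference can be shown with the open-source llama-cpp-python bindings; the model filename and settings below are assumptions:

```python
# Stand-in only: NOT Ubisoft's inference stack. A quantised 4B-class SLM
# running locally via llama-cpp-python, for illustration.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-4b-instruct-q4_k_m.gguf",  # assumed GGUF build, ~INT4-class
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=8192,       # room for a large perception prompt
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a squadmate. Reply with one function call."},
        {"role": "user", "content": "Find cover behind that car."},
    ],
    max_tokens=32,
)
print(out["choices"][0]["message"]["content"])
```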

“Current high-end hardware and optimised inference stacks now allow multi-model GenAI pipelines to run alongside game workloads,” said Gregoire. “Moving the inference on-device is the next logical step to make NPC interaction scalable.”

The prototype also includes the Jaspar AI personal assistant that helps players navigate game menus, adjusts the interface for accessibility needs, like colour blindness, and offers tactical advice. Furthermore, the game’s AI constantly analyses the player’s behaviour, awarding dynamic achievements based on their playing style and providing a personalised debriefing at the end of each mission.

Whether Ubisoft’s project will progress further to entirely replace the hand-crafted, cinematic moments that have defined blockbuster games remains to be seen. For now, the industry’s largest players view AI as a tool to bolster the gaming experience.

“Creativity remains deeply human,” said Yves Guillemot, the co-founder and chief executive of Ubisoft. “AI provides tools that help bring creative visions to life in new ways; it can be a powerful enabler to create even more meaningful and immersive experiences for players.”
