D-ID agentic video platform creates ‘new generation of digital humans’

The market for agentic AI services is as broad as real life, or so it seems.

With the realms of language, voice and image all fair game (and taste and other sensory perception probably next), we learn this week that D-ID, a company that specialises in enterprise-grade AI avatar solutions, has announced the launch of its Agentic Videos service.

The company promises that this new capability “transforms traditional video” into interactive, conversational experiences driven by real-time AI agents.

Digital humans

The launch builds on D-ID’s V4 Expressive Visual Agents, a technology designed to create a “new generation of digital humans” capable of low-latency, natural conversation. 

With Agentic Videos, D-ID extends these real-time capabilities into video itself, shifting content from a linear, one-way format into a more interactive, two-way experience, with knowledge, memory, a digital human interface, and agentic capabilities.

The company says that although video has become the content format of choice, it has remained one-directional in an increasingly interactive digital world. In answer, Agentic Videos combines video with real-time, interactive AI.

Defeating attention deficits

Viewers can explore content more deeply, ask questions and personalise their experience, while creators gain insight into how audiences engage, where attention holds and what resonates.

According to Gil Perry, co-founder and CEO of D-ID, each Agentic Video includes a visual AI agent (a human-like avatar that is able to conduct a natural conversation with the viewer) that understands the video’s script and context, enabling real-time interaction throughout the experience.

Users can ask questions, request clarification, or dive deeper into specific topics via voice or chat. The agent is integrated as an additional layer within the video experience and remains available both during playback and after the video ends, so the experience continues beyond the video itself.

“Video has always been a one-way medium,” said Perry. “With our V4 agents, we’ve brought digital humans to scale, enabling natural, real-time interaction. With Agentic Videos, we’re bringing that capability directly into content, so instead of just watching, you’re interacting. This opens up a new, much more effective way for organisations to train their employees, communicate with their clients, and market their products more efficiently.”

If agentic workflows are the future of digital engagement, this shift may mark a change in how information is consumed.

Bridging the awareness gap

Perry says that in advertising and product marketing, agentic videos “bridge the gap between awareness and conversion” by transforming a static commercial into a personalised consultation. Potential customers no longer just watch a product demo; they can ask about specific features, compare pricing, or request a tailored use case in real time.

Similarly, for internal corporate applications like learning & development, these videos function as 24/7 subject matter experts. Instead of a passive onboarding session where information is easily forgotten, employees can probe for deeper context on company policy or technical training, ensuring higher knowledge retention and a more intuitive, self-paced learning journey. 

Real-time AI visual agents  

Agentic Videos are powered by real-time AI visual agents that understand both the content and the intent behind user queries. Responses are grounded in the original video script, with the ability to incorporate additional knowledge sources, ensuring answers remain accurate, contextual, and aligned with the creator’s message.

Built on D-ID’s V4 architecture, Agentic Videos offers sub-second response latency and advanced expression control, enabling interactions that feel immediate and natural.

The experience is fully integrated into the video player, allowing users to access the AI agent at any point via a dedicated interaction icon. At the end of the video, the agent appears automatically to continue the conversation, extending engagement without interrupting the viewing experience.