Speaking in volumes: UiPath talks up Google Gemini models for voice agents

UiPath used its annual developer/practitioner event this week in Las Vegas to detail the launch of the UiPath Conversational Agent.

The technology’s voice interaction is enabled by Google’s Gemini models.

Users can now build agentic automation into business processes seamlessly, without the need for complex coding or manual effort.

Why are voice agents useful?

Text-based interactions with AI agents work well for tasks that demand precision, such as analysing complex datasets, refining documents, or drafting formal communications – but can be challenging when applied to spontaneous conversations or real-world, dynamic interactions. 

“Voice interactions provide contextual cues, subtle communication nuances and collaborative problem-solving capabilities that text-based exchanges cannot deliver. AI agents can [use] voice communication to enhance their effectiveness in handling unpredictable and open-ended tasks. As AI agents become more prevalent, voice interaction will emerge as a natural communication method,” notes UiPath.

Enterprise-grade developer execution 

Using Google Cloud’s Vertex AI platform, UiPath enables teams to trigger, build and manage automation through natural language speech – with the same contextual awareness and enterprise-grade execution as a developer writing code or a process owner building automated workflows.

Feeling emotion-aware dialogues 

The integration also enables advanced features such as affective (emotion-aware) dialogue and proactive audio (where the model can decide to ignore or respond to certain inputs), elevating the capabilities of AI agents to interact with participants in natural voice.

“Voice is the most natural way we communicate, and now it can be the most natural way to automate,” said Graham Sheldon, chief product officer at UiPath. “By bringing Google Cloud’s Vertex AI and Gemini models into the UiPath Platform, customers can trigger and orchestrate automations through real-time speech – making agentic AI more intuitive, more accessible and more impactful in the flow of everyday work.”

The conversational agent has high automatic speech recognition (ASR) accuracy rates, multilingual support, reliable function calling for appropriate tool selection and low latencies for real-time processing.

“The first wave of generative AI focused on individual productivity; the next is about transforming core business processes,” said Michael Gerstenhaber, VP of product management for Vertex AI, Google Cloud. “On Google Cloud’s Vertex AI, partners like UiPath are at the forefront of this shift, using our Gemini models to build agents that translate human language directly into complex, automated workflows.”

UiPath is available on Google Cloud Marketplace.

UiPath also recently expanded its partnership with Google Cloud to help users facilitate their automation journeys through Google Workspace business collaboration offerings.