Smart speakers: How to give apps a voice

Experts discuss how to go about creating compelling voice-enabled user interfaces for smart speakers

Research from Gartner shows that about a fifth of people in mature markets are voicing a question to their smart speaker at least once a week.

However, the same study reveals that 10% admit they have stopped using the devices. “The drop-off rate is quite high,” says Annette Jump, research vice-president at Gartner. “This is very similar to wearables. Consumers try a device, and either it does not do enough for them or it does not recognise their voice. Or the device can only link to a Google email account, or the user does not have a smart home.”

While being able to speak to a device offers a lot of convenience, there are personal boundaries in what people will ask, she notes. This is one of the challenges businesses need to take into account when assessing how to go about developing a voice assistant app.

“Using voice control on a smartphone is location-specific, while at home people will ask the device to play music, control smart home devices and ask about the weather,” says Jump.

Amazon Echo smart speakers use Amazon’s built-in voice assistant, Alexa, to connect to apps called “skills” that add third-party functionality.
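
At its simplest, a skill is a handler that maps a recognised intent to a spoken reply. The sketch below shows that shape using the Alexa Skills Kit SDK for Python; the WeatherIntent name and the canned reply are hypothetical placeholders, not a real published skill.

```python
# Minimal sketch of an Alexa custom skill handler (ASK SDK for Python).
# "WeatherIntent" and the reply text are hypothetical, for illustration.
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name

class WeatherIntentHandler(AbstractRequestHandler):
    def can_handle(self, handler_input):
        # Fire only when the user's utterance resolved to WeatherIntent
        return is_intent_name("WeatherIntent")(handler_input)

    def handle(self, handler_input):
        speech = "It is sunny in London today."
        # .ask() keeps the session open for a follow-up question
        return handler_input.response_builder.speak(speech).ask(
            "Anything else?").response

sb = SkillBuilder()
sb.add_request_handler(WeatherIntentHandler())
handler = sb.lambda_handler()  # entry point when hosted on AWS Lambda
```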

But Jump asks: “While Alexa has 10,000 skills, how many are actually useful?”

According to analyst Rethink, voice functionality is like Apple’s App Store: it acts as a route to market for services, and those services will pay a percentage of the revenue brought to market via voice.

Rethink says Amazon wants as many Alexa services (skills) as possible to build momentum.

“Later, we believe it will be in a position to charge services for a service delivery skill where it is proactive – for instance, offering a cheap taxi ride from Uber to take you to work because it senses it is raining, and asking for some of the Uber payment,” says the analyst.

The voice user interface is a growing area of interest, according to Accenture. “We are seeing an explosion in interest in voice assistants due to the launch of Amazon Alexa and Google Home. The technology is now ready for mass adoption,” says Emma Kendrew, artificial intelligence (AI) lead for Accenture Technology.

She says voice assistants provide a much more natural experience for the user. Accenture has found that talking to a machine by voice comes more naturally to digital natives, and it is they who tend to drive demand for new apps and smart speaker devices.

This is encouraging businesses to experiment with voice, says Kendrew. “Customer-facing industries are leading on this. Financial services are also experimenting with specific use cases, such as using a voice assistant as a mortgage advisor,” she says.

“Organisations are interested in how to transform customer experience and are looking at where voice assistants should and shouldn’t be used.”

Getting started with voice

Kendrew says it is very important to design voice assistants with the user in mind. As these new technologies become available, organisations need to think about the right way to use them to achieve their objectives.

But where should they start? Bill Kingston heads up Elixirr Creative, an agency that has built a number of voice-based apps. “With Alexa,” he says, “you can use an existing framework to turn an RSS feed into a skill, or take the custom skill route. The question is, is it worth developing a custom skill?”

In Kingston’s experience, it is usually better to start with an existing feed and see how well it works, then decide whether to invest in developing a custom app.
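
As a rough illustration of the low-effort starting point Kingston describes, the sketch below turns the latest item of an existing RSS feed into a spoken response. The feed URL is a hypothetical placeholder, and the feed is assumed to carry a title and summary per item.

```python
# Sketch: reading the newest item of an existing RSS feed aloud,
# rather than building a custom skill from scratch.
# The feed URL is a hypothetical placeholder.
import feedparser

FEED_URL = "https://example.com/news/rss"  # hypothetical feed

def latest_item_speech():
    feed = feedparser.parse(FEED_URL)
    if not feed.entries:
        return "Sorry, there is no news right now."
    entry = feed.entries[0]
    # Speak the headline plus the first sentence of the summary
    first_sentence = entry.summary.split(". ")[0]
    return f"Here is the latest: {entry.title}. {first_sentence}."
```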

However, businesses can find that retrofitting an existing app with voice assistance may not work particularly well. Such an Alexa skill can be a bit one-way, according to Kingston, which might not fit the natural feel everyone is striving for.

Enterprise software company Unit 4 has developed an intelligent assistant called Wanda, which understands natural language queries. It has also built a proof-of-concept voice assistant using Microsoft Cortana.

Claus Jepen, chief architect at Unit 4, has overseen the project. “Old rules-based interactive voice response applications are not beneficial to anyone,” he says. “We paid a lot of attention to ensure Wanda understands complex sentences.”

He says one of the greatest technical challenges is the ability to infer meaning in a conversation – a human trait that is very hard to program.

As an example of a piece of dialogue that the voice assistant should be able to handle in a human way, Jepen says: “If you ask, ‘Please give me revenue results’, a human would automatically assume that means the latest results. But often a voice assistant will only work programmatically and will need additional contextual information. For instance, Alexa would probably ask the user to clarify what they wanted, asking, ‘What results would you like?’.”
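
A hedged sketch of how an assistant might default the missing qualifier rather than bounce the question back. The function, slot names and stub back-end are illustrative, not Alexa’s or Unit 4’s actual code.

```python
# Sketch: defaulting an unfilled "period" slot to the latest results
# instead of asking the user to clarify. All names are hypothetical.
def get_revenue(period):
    # Stub standing in for a query against a real finance back-end
    return "4.2 million euros"

def handle_revenue_request(slots):
    # "Please give me revenue results" arrives with no period slot;
    # a human would assume "latest", so make that the default
    period = slots.get("period") or "latest"
    return f"Revenue for the {period} period is {get_revenue(period)}."

print(handle_revenue_request({}))  # -> defaults to the latest results
```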

Since it is pretty easy to develop a simple Alexa skill based on an existing app, Jepen says a lot of people are getting involved, but they are not sure what to do with it. “Unless you put a lot of effort into designing the dialogue and handling context without having to get the user to restart the conversation, you end up with nothing,” he warns.
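
One common way to handle context of the kind Jepen describes is to carry state between turns in the session, so a follow-up question resolves without the user starting over. A minimal sketch, with hypothetical slot and session keys:

```python
# Sketch: keeping conversational context in session state, mirroring
# what Alexa session attributes or a Dialogflow context would hold.
def handle_turn(slots, session):
    # Remember the last department mentioned across turns
    if "department" in slots:
        session["department"] = slots["department"]
    department = session.get("department")
    if department is None:
        return "Which department do you mean?"
    period = slots.get("period", "latest")
    return f"{department} revenue, {period} period."

session = {}
print(handle_turn({"department": "sales"}, session))  # first turn
print(handle_turn({"period": "Q3"}, session))  # "and for Q3?" still resolves
```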

A good voice interaction requires specialist skills, says Jepen. “You need conversational designers, just as you have user interface designers. But we don’t know how well the voice assistant works until we get real users.”

While testing its Wanda assistant, Unit 4 collects conversation snippets that fail to provide the human tester with an appropriate conversational response. “When the Wanda assistant can’t figure out what the user wants, we pull out the dialogue and upload it back into the training,” says Jepen.

In effect, Unit 4 teaches Wanda to respond correctly – machine learning in practice.
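
The loop Jepen describes might look something like the sketch below: failed snippets are logged, labelled by a human, and exported back into training. The log format and labelling step are assumptions for illustration, not Unit 4’s actual pipeline.

```python
# Sketch of a retraining feedback loop: log exchanges the assistant
# could not resolve, then feed them back as training examples.
# The log path and record format are hypothetical.
import json

FAILED_LOG = "failed_dialogues.jsonl"

def on_unrecognised(utterance, context):
    # Pull out the dialogue the assistant could not figure out
    record = {"utterance": utterance, "context": context}
    with open(FAILED_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")

def export_training_examples():
    # After a human labels each snippet with the intended intent,
    # the records go back into the model's training set
    with open(FAILED_LOG) as f:
        return [json.loads(line) for line in f]
```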

Read more about smart speakers in business

  • Voice recognition has become a new frontier for customer relationship management. What does the new voice channel mean for understanding consumers and how will voice data integrate with existing customer intelligence?
  • The final day of Re:Invent 2017 sees Amazon set out plans to help enterprises automate problematic workplace tasks using its voice assistant, Alexa.

Testing is a lengthy process, according to Elixirr Creative’s Kingston, and even getting voice analytics back takes time.

“People won’t necessarily interact with an Alexa skill in the same way. After you have tested with six or seven users on the same theme, you get a good idea of what works,” he says. For instance, voice analytics can be used to identify where people pause in their interaction with Alexa, which can help developers work out how to make the interaction more like a conversation.
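
Pause-spotting of this kind can be as simple as scanning per-utterance timestamps for long gaps. The event format below is a hypothetical simplification of what voice analytics returns:

```python
# Sketch: finding long pauses in a voice session from per-utterance
# timestamps. The (seconds, utterance) tuple format is illustrative.
def long_pauses(events, threshold_s=2.0):
    """events: list of (timestamp_in_seconds, utterance) tuples."""
    pauses = []
    for (t1, prev), (t2, nxt) in zip(events, events[1:]):
        gap = t2 - t1
        if gap > threshold_s:
            # A long gap suggests the user was unsure what to say next
            pauses.append((prev, nxt, gap))
    return pauses

session = [(0.0, "open my briefing"), (1.2, "next"), (7.8, "stop")]
print(long_pauses(session))  # -> the 6.6s hesitation before "stop"
```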

So creating a seamless conversation flow with a voice assistant is not easy, especially as the assistant must attempt to do something useful.

“Voice experiences will consume a significant amount of our time completing everyday duties in the coming years due to their time-saving benefit,” says Lee Mallon, managing director of Made for Voice, a company specialising in voice assistant software for the enterprise.

“These experiences must respect the user’s attention by solving their problems as quickly and in the fewest words possible. In one of our voice experiences, we found a 20-30% engagement drop-off by adding just two additional words to a three-word sentence response.”
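
A discipline like Mallon’s can even be enforced mechanically: the sketch below flags any response that creeps past its word budget. The budget value is illustrative, echoing his three-word example.

```python
# Sketch: guarding a response word budget so a three-word reply
# cannot quietly grow to five. The budget value is illustrative.
def check_brevity(response, max_words=3):
    n = len(response.split())
    if n > max_words:
        raise ValueError(
            f"Response is {n} words, budget is {max_words}: {response!r}")
    return response

check_brevity("Taxi booked now")  # within budget
# check_brevity("Your taxi is booked now")  # would raise: two words over
```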

Going forward with voice

Neither Google nor Amazon has stopped at voice assistants. Amazon now offers a screen on its Echo Show device.

“What is starting is a combination of voice interaction with screen interaction – blending two channels into one seamless flow,” says Accenture’s Kendrew.

Combining a visual user interface with voice opens up new opportunities for customer interaction, such as providing an intelligent kiosk at an airport or shopping centre.

From the conversations Computer Weekly has had, one thing is clear: there are two main voice platforms and they are very different.

“The user journey is very different on the Amazon Alexa platform in comparison with the Google Assistant platform, with Alexa requiring users to ‘subscribe’ to skills that interest them, like our own daily briefing skill,” says Rob Fricker, product manager at Time Out. “Once subscribed, you ask Alexa, ‘What’s my daily briefing?’, and it replies with Time Out’s top three things to do in the city today.”
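
A daily briefing of this kind is typically served as a feed that Alexa polls and reads aloud. The sketch below follows the field names of Amazon’s flash briefing feed format as far as we understand it, with placeholder content rather than Time Out’s:

```python
# Sketch: serving a flash-briefing-style JSON feed of the kind a
# daily briefing skill consumes. Field names follow Amazon's flash
# briefing format as we understand it; the content is placeholder.
import json
from datetime import datetime, timezone

def daily_briefing(top_three):
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.0Z")
    return json.dumps([
        {
            "uid": f"urn:briefing:{i}",  # unique id per item
            "updateDate": now,
            "titleText": "Top things to do today",
            "mainText": item,  # the sentence Alexa reads aloud
        }
        for i, item in enumerate(top_three)
    ], indent=2)

print(daily_briefing([
    "See the new gallery exhibition.",
    "Try the pop-up food market.",
    "Catch the open-air film screening.",
]))
```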

Gartner’s Jump believes Google Assistant has the more complete voice interface when compared with Alexa. Being linked to a Google account, Google Assistant gives users access to their calendar, so it can tell them more and prompt them to do things. But as with Alexa, she says, there are lots of times it says, ‘I don’t know’.

Elixirr Creative’s Kingston agrees. “Google Assistant has so much data, it is probably a step ahead,” he says.

There is not going to be one overall winner, and as Unit 4 has shown, in business there may be an opportunity to make use of Cortana on Windows 10.

The experts Computer Weekly has spoken to recommend that businesses look at voice user interfaces. These are not quick wins or just another channel to market. A truly compelling voice interface needs time and effort to develop and test, and teams will need dialogue experts, just as they now have user interface experts.  

Time Out speaks

Time Out recently launched a conversational app on the Google Assistant, designed to provide a personal touch to keep the conversation going. Rob Fricker, product manager at Time Out, explains how the app was developed.

“One of the most interesting things about the process was the conversation design – it’s something we were all completely new to,” he says.

“We built the app on Google’s Dialogflow, which uses natural language processing to understand voice input. We spent a lot of time thinking about the different paths conversations might take and how to address questions the assistant might not know.

“This included watching people in conversation with each other when one was acting as a computer – it’s funny how polite people are when they think they’re talking to a computer. We also measured how people interacted through various iterations, which was crucial in determining the kind of questions people might ask the app.

“User testing throughout the development cycle also helped set expectations of the functionality the app should be able to perform, and we were able to prioritise the features users would benefit from most, based on this feedback.”
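
For readers wondering what sits behind a Dialogflow app like this, the sketch below shows a minimal fulfilment webhook in Python, following the JSON shape of Dialogflow’s V2 webhook format. The intent name and replies are hypothetical, not Time Out’s actual code.

```python
# Sketch: a minimal Dialogflow fulfilment webhook. Follows the
# Dialogflow V2 webhook JSON shape; the "things_to_do" intent name
# and the replies are hypothetical.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def webhook():
    req = request.get_json(force=True)
    intent = req["queryResult"]["intent"]["displayName"]
    if intent == "things_to_do":  # hypothetical intent name
        reply = "Tonight: a pop-up food market and an open-air film."
    else:
        # Cover the paths the assistant does not know, gracefully
        reply = "I'm not sure yet, but you can ask me for things to do."
    return jsonify({"fulfillmentText": reply})

if __name__ == "__main__":
    app.run(port=8080)
```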

 
