When it comes to technology, size does matter: we like our devices small and portable. By 2007, according to Gartner Group, more than 60% of 15- to 50-year-olds in the EU and US will be carrying a wireless computing and communications device for at least six hours a day.
But as mobile devices get smaller, the usability problems associated with them are growing. As users squint at ever tinier screens and fumble with miniature keypads, manufacturers are looking at an alternative: using automatic speech interfaces that let you carry out transactions and get information just using your voice.
Many IT industry players believe that speech interfaces will succeed where WAP failed to deliver truly mobile applications - and in doing so could remove part of the rationale for high-speed mobile services.
"While the world waits for 3G services, we can still build functional applications using speech technology," says Hitesh Seth, chief technology evangelist at software development firm Silverline Technologies.
Look - no hands
Speech interfaces have two main benefits. They offer potential for genuinely hands-free, eyes-free mobile applications, allowing devices to be used safely while driving, walking - or even rollerblading.
But even if you're sitting down using a landline, there are compelling commercial reasons why speech technology makes sense. With businesses under pressure to improve service while cutting costs, automatic speech recognition could enable call centres to deal with many more customers without increasing their human headcount.
"Companies, to remain competitive, need to interact with customers and supply easy retrieval of information 24 hours a day, seven days a week," argues Mark Blowers, senior analyst with Butler Group.
"The proliferation of mobile phones has created a new impetus for using a speech interface to interact with contact centres, or access information held on the Internet."
Butler Group says that implementing voice technology can reduce contact centre costs by as much as 25%, and that using speech technology to answer calls can be up to 90% cheaper than having them answered by a human agent.
This kind of mobile speech application is a major growth area; according to IDC, the worldwide market for telephony speech processing software increased 65.5% in 2000, from $264.4m (£185.4m) to $436.4m (£306.1m), and will grow to $3.5bn (£2.5bn) by 2005.
Access the Net with your voice
Two years ago, Microsoft announced that speech interfaces would be a key part of Microsoft .net, its vision for Internet services that "give consumers and businesses the Web the way they want it - any time, any place and on any device". Last August, Oracle announced that it was building voice support into its applications and into Oracle9i Application Server Wireless.
There's a growing number of automatic speech recognition applications in general use. They include the speech-activated Odeon cinema booking line; the Orange Wildfire voicemail and personal organiser service; interfaces to phone banking systems offered by, for example, Lloyds and Abbey National; and Charles Schwab's VoiceBroker system which allows users to get stock quotes over the phone automatically by speaking the stock name.
Speech interfaces are also improving productivity in internal telephony systems. London law firm Edwards Duthie Solicitors has increased the productivity of its telephone operators by 40% using a voice-activated directory system that lets both internal and external callers be connected automatically to the individual or department they want by speaking the appropriate name.
Voice portals to Web services, such as Tellme, Audiopoint, Quack.com, and Bevocal, are another growth area. According to research firm Kelsey Group, 45 million US mobile users will regularly use voice portals to access information by 2005.
What it takes to be a good listener
A speech interface usually involves two separate technologies: speech synthesis and speech recognition. Of the two, speech recognition has been the real technical challenge.
Even after four decades of research in industry and academia, and vast leaps in processor and memory power, we're still a long way from systems that can reliably understand any sentence, uttered by any speaker.
Most public speech recognition systems can still only deal with single words or short phrases, and a limited range of input; the Odeon booking line, for example, only has to be able to distinguish between different town names.
But despite these compromises, the combination of better speech algorithms and more powerful processors means that the technology is now good enough to be usable. "More and more organisations are gaining confidence that the technology actually works," says Nick Applegarth, managing director for EMEA with speech technology company Nuance. "The issue now is, how can I deploy this in such a way that my customers will keep wanting to come back?"
VoiceXML is one of the developments that could help companies deploy effective speech applications. As the name suggests, it's a mark-up language that's part of the emerging XML family, but specifically designed for voice applications, and supporting concepts such as playing a recorded voice tag, waiting for voice input, defining a vocabulary and so on.
The big advantage of VoiceXML will be in lowering the cost of entry into voice-enabled applications through a standard mark-up approach. "It brings the Web and the voice world together from a development perspective," explains Silverline's Seth. "You don't need to understand Interactive Voice Response technology; you can use the same skill set for voice and Web applications."
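To give a flavour of the approach, here is a minimal, hypothetical VoiceXML 1.0 dialogue in the spirit of the town-name prompt described above. It is not taken from any of the systems mentioned; the form name, field name and town list are invented for illustration:

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- A single form: ask the caller for a town, then confirm it -->
  <form id="booking">
    <field name="town">
      <!-- The voice gateway speaks this prompt using text-to-speech
           or a pre-recorded audio file -->
      <prompt>Which town would you like to book in?</prompt>
      <!-- An inline grammar restricting recognition to a short
           list of town names (illustrative vocabulary only) -->
      <grammar>
        [ london manchester leeds ]
      </grammar>
      <filled>
        <prompt>Looking up cinemas in <value expr="town"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

A VoiceXML gateway interprets this page much as a browser interprets HTML: it renders the prompts as speech, listens for caller input, and matches what it hears against the declared grammar, which is why the same Web-style development skills carry over.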
Even voice specialists like Applegarth admit that speech interfaces haven't always been popular with users. "A lot of people you speak to can point to a bad experience with them," he says.
But like them or loathe them, we may have to get used to them: in a world where services are increasingly delivered over a telephone line, the economics of speech interfaces mean that they're probably here to stay.
This was first published in February 2002