Brian Clegg discovers the bugbears that are standing in the way of the widespread acceptance of interactive voice response
Dial any company these days and you are likely to be informed that "for your convenience" you are going to be asked to work through half a dozen tedious touch-tone menus. This is Interactive Voice Response (IVR), the call centre manager's friend.
IVR sits uncomfortably in the gap between telephony and computing. Originally, telephony manufacturers claimed it for their own, resulting in proprietary systems that could not interface with the rest of the IT infrastructure, but increasingly telephony suppliers are using off-the-shelf computing hardware.
The intention was mainly to save money, but standard hardware has the additional benefit that software becomes cheaper and easier to develop. Even so, poor programming is often evident in IVR systems.
Errors are sufficiently common that many customers are put off (see box). Good IVR menu structures are not easy to design. Experts recommend having no more than three levels, each with three choices, which in turn have three options - 27 options in all. In a recent study of over 1,000 North American IVR systems, the average number of options was 70.
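The arithmetic behind the guideline is easy to check. The sketch below assumes, purely for illustration, a menu tree in which every level offers the same number of choices; the function names are invented for this example. It shows the recommended 27 options and how deep such a tree would have to go to hold the study's average of 70.

```python
def leaf_options(choices_per_level, levels):
    """Number of end options in a menu tree with uniform branching."""
    return choices_per_level ** levels


def levels_needed(total_options, choices_per_level):
    """Smallest depth whose uniform tree can hold total_options end options."""
    levels = 0
    capacity = 1
    while capacity < total_options:
        capacity *= choices_per_level
        levels += 1
    return levels


recommended = leaf_options(3, 3)      # the experts' 3 x 3 x 3 guideline
depth_for_70 = levels_needed(70, 3)   # depth needed for the study's average
print(recommended, depth_for_70)      # prints: 27 4
```

In other words, a system with 70 options either goes at least one level deeper than recommended or reads out longer lists at each level.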
Another problem is that IVR has conflicting aims. Customers want access to the appropriate information or person as quickly as possible. However, from the manager's viewpoint, IVR is a way to extract from customers information that they did not plan to give. This can be used cynically to build in delays, the theory being that customers will not notice a minute's wait so much if they are busy listening to prompts and pressing buttons.
Negative customer connotations are so strong that, despite the fact that IVR can provide totally automated transactions, some companies will not use it. At Standard Life Bank, for instance, managing director Jim Spowart is insistent that the first point of contact should be a human being.
Another issue is that IVR depends on touch-tones, and is therefore not universally usable.
A modern phone button produces two frequencies in a quaint analogue matrix - one of the tones varies as you move side to side on the telephone keypad, the other as you move up and down. Customers with traditional pulse phones (about 20% in the English-speaking world and 80% plus in other areas) cannot use IVR and 30% or 40% would rather hang up than battle with it.
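The two-frequency matrix can be written down in a few lines. The sketch below uses the standard touch-tone (DTMF) frequencies for a 12-key keypad; the function names, the 8kHz sample rate and the 100ms duration are illustrative choices, not taken from any particular IVR product.

```python
import math

ROW_HZ = (697, 770, 852, 941)    # changes as you move up and down the keypad
COL_HZ = (1209, 1336, 1477)      # changes as you move side to side
KEYPAD = ("123", "456", "789", "*0#")


def dtmf_pair(key):
    """Return the (row, column) frequencies in Hz for one keypad button."""
    if len(key) != 1:
        raise ValueError("expected a single key")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError("not a keypad button: " + repr(key))


def tone_samples(key, duration_ms=100, rate=8000):
    """Mix the two sine waves for a key into a list of samples in [-1, 1]."""
    low, high = dtmf_pair(key)
    count = rate * duration_ms // 1000
    return [
        0.5 * math.sin(2 * math.pi * low * n / rate)
        + 0.5 * math.sin(2 * math.pi * high * n / rate)
        for n in range(count)
    ]


print(dtmf_pair("5"))            # prints: (770, 1336)
```

A pulse phone sends no such tones, which is why a touch-tone IVR has nothing to listen for.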
There are also some applications that are inappropriate. Imagine the tedium of a US system that asked for a state and then made the caller wait to hear "for Wisconsin, press 50". For many applications, speech seems to be a better bet to control the system.
Although basic speech IVR - where customer response is limited to a number or "yes" and "no" answers - does bring the technology to pulse phones, it can be even more tedious for the caller than conventional IVR.
Realistically, the solution of choice is natural language IVR. Here the system copes with a wide range of words, often in a continuous flow. If this can be achieved, there are benefits both for the customer (getting a quicker, more natural interaction) and for the company (reducing call times).
Being able to use natural language IVR depends on reliably interpreting speech in a wide range of accents, without training the system on individual voices, across a telephone line - a significant technical challenge. However, the system is helped by the small vocabulary required at any particular stage.
If all you need is a spoken digit (for instance, when entering a credit card number), accuracy can be pushed to over 99%. This may sound impressive, but a credit card's 16 digits are each liable to error. In combination, accuracy drops to 85%. This is not a bad figure - about the same as the success rate for a customer entering a number on a keypad - but is well short of perfection.
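The drop from 99% to 85% follows from multiplying the per-digit accuracy by itself 16 times. The sketch below assumes that recognition errors on each digit are independent, which is a simplification; the function name is invented for this example.

```python
def sequence_accuracy(per_digit, digits):
    """Chance that every digit is recognised, assuming independent errors."""
    return per_digit ** digits


card = sequence_accuracy(0.99, 16)
print(f"{card:.1%}")   # prints: 85.1%
```

On the same assumption, roughly one caller in seven would have to repeat at least part of the number.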
Getting a natural language system in place and debugged is a complex undertaking. Even with an existing IVR menu structure, there will be changes.
There are many ways of saying the same thing. Where a keypad system dealt with one for "yes" or two for "no", a natural language system may face 100 ways of answering in the affirmative or the negative. Rather than programming for every eventuality, a natural language system typically has tokens meaning "yes" and "no". Whatever is said is translated first into the appropriate token before entering the menu system.
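The token idea can be sketched in a few lines. Everything here is illustrative: the phrase lists, the token names and the menu actions are invented for the example, and a real system would take its input from a speech recogniser and hold a far larger set of phrases.

```python
# Hypothetical phrase lists; a production system would hold many more.
AFFIRMATIVE = {"yes", "yeah", "yep", "sure", "ok", "okay", "correct", "that's right"}
NEGATIVE = {"no", "nope", "nah", "no thanks", "incorrect", "that's wrong"}


def to_token(utterance):
    """Translate whatever was said into a YES, NO or UNKNOWN token."""
    text = utterance.lower().strip(" .,!?")
    if text in AFFIRMATIVE:
        return "YES"
    if text in NEGATIVE:
        return "NO"
    return "UNKNOWN"


def confirm_step(utterance):
    """A menu step that only ever sees tokens, never the raw wording."""
    token = to_token(utterance)
    if token == "YES":
        return "proceed"
    if token == "NO":
        return "go_back"
    return "reprompt"


print(confirm_step("Yep!"), confirm_step("No thanks."), confirm_step("maybe"))
# prints: proceed go_back reprompt
```

Because the menu logic deals only in tokens, new ways of saying "yes" can be added to the phrase list without touching the menu structure.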
However good the interpretation, the designers first have to match the customer requirement correctly. The most effective method so far devised to test and refine a natural language IVR is the Unisys Wizard of Oz technique. Without something of this kind, the result is inevitably a disappointment.
An associated technology beginning to influence IVR is voice XML. IVR systems often duplicate information and processes available on a company's Web site. Voice XML is an application of XML, the general-purpose markup language designed to describe structured data rather than, like HTML, only the presentation of text and graphics.
Voice XML (still more in development than commercial use - see www.voicexml.org) uses voice commands to navigate Web information, then reads back the results, enabling a single, appropriately designed site to support both Web browsers and voice.
IVR has potential benefits for customers and for businesses. However, until now, poor implementation has produced significant customer resistance. Natural language and voice XML may be the technologies IVR needs to fight back.
Common IVR menu failures
The Unisys Wizard of Oz approach to IVR tuning
Unisys has an impressive technique for tuning natural language IVR. Traditionally, systems are tested by company volunteers but often they know too much about the requirements, or their voices do not correspond to the customer cross-section. In the Wizard of Oz approach, the system is tested directly on the customer base but via an intermediary. Like the wizard in the film, a human operator is hidden away in the machine, pulling the levers without the user's knowledge.
Customers ring up and appear to be dealing with a voice response system. Each time they speak to the system, the IVR program makes a stab at interpreting their speech, but the actual response is clicked on manually by a human operator. This means that the software learns where it is making mistakes and structures can be redesigned where they do not match customers' requirements.
The customer always gets the right result rather than being exposed to teething troubles.
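One way to picture what such a trial yields is a log that records, for each caller turn, what the software guessed and what the hidden operator actually selected. The sketch below is not Unisys's implementation: the log format, the stage names and the sample entries are all invented for illustration. It simply shows how the disagreements could be counted per menu stage to see where tuning is needed.

```python
from collections import Counter


def disagreement_rates(turns):
    """Share of turns at each stage where the software's guess was overruled.

    Each turn is a (stage, machine_guess, operator_choice) tuple.
    """
    totals = Counter()
    misses = Counter()
    for stage, machine_guess, operator_choice in turns:
        totals[stage] += 1
        if machine_guess != operator_choice:
            misses[stage] += 1
    return {stage: misses[stage] / totals[stage] for stage in totals}


# Invented sample log, for illustration only.
sample_log = [
    ("confirm", "YES", "YES"),
    ("confirm", "NO", "YES"),
    ("account_type", "SAVINGS", "SAVINGS"),
]

rates = disagreement_rates(sample_log)
print(rates)   # prints: {'confirm': 0.5, 'account_type': 0.0}
```

Stages with a high disagreement rate are the ones where either the recognition or the menu wording is failing to match what customers say.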