
Speech recognition enters the mainstream

Dennis Gaughan
Wednesday 18 July 2001 11:49
Speech recognition reduces the burden on your call centre, but it is not yet ready for your sales force, according to Dennis Gaughan of AMR Research

Speech recognition technology has been around for 30 years, but improved technology and adoption by application vendors have recently brought speech from the lab to the masses. Having found a home with the largest CRM vendors - Siebel, PeopleSoft, and Nortel Networks - it is already paying dividends for call centres, and the technology shows promise for the sales force.

Hype and potentiality
The major Customer Relationship Management (CRM) vendors have created a great deal of buzz about the new speech-enabled interfaces for their products' Sales Force Automation (SFA) modules. However, while the technology shows a lot of potential, the products are only just being released. Since even the best new products require time to work out the kinks, AMR Research suggests waiting at least six months before deploying these new interfaces. Still, there have been many successful implementations of speech technology that augment call centres. We recommend working with your CRM vendors and their partners on a call centre suite first.

The wireless community has driven much of the interest in speech technology in recent years. But early implementations of wireless, many built around the Wireless Application Protocol (WAP), have failed to live up to expectations. Slow data rates and small screen sizes contribute to the shortcomings, but the main problem is that the wireless industry has promised more features than it can deliver. In order to augment their products, wireless carriers are touting speech recognition as the next big thing. This, however, is not an either/or proposition: it's the combination of speech technology and wireless data (often called multimodal access) that will provide the usability and flexibility mobile users are looking for.

Cutting call costs
The emergence of the Internet as a commerce tool has made it easier than ever for customers to move from one supplier to another. As a result, enterprises have had to improve customer service to gain a competitive edge, if not to survive. A best-in-class call centre is a great way to deliver quality customer service, but call centre agents are expensive. Call costs range from $1 to $10, depending on the call duration and the skill level of the agents. Many service organisations have augmented their live agents with automated touch-tone menus to limit the number of calls handled by agents. This has been successful for many implementations, but it does have limitations:

Non-intuitive menus - The menus for touch-tone applications can be several layers deep and difficult to navigate

Slow performance - Having to wait through the options of a touch-tone menu can be excruciating

Limited input - The type of data that can be input is limited by the numeric keypad on telephones

As the technology has matured, speech recognition has become the latest tool for call centre managers. Because it supports natural language and gives greater control to end users, speech technology is being implemented as the next phase of call centre development. The customer satisfaction rates for a speech interface are typically very high, and the interface can help reduce the number of calls that live agents handle. In addition, a speech-recognition-based call can cost 8 to 12 times less than a live agent call.
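
To put the figures above side by side, here is a rough back-of-the-envelope sketch. It assumes that "8 to 12 times less expensive" means the live-agent cost divided by a factor of 8 to 12; the function name and default factors are illustrative only and do not come from AMR Research.

```python
def automated_cost_range(live_cost, low_factor=8.0, high_factor=12.0):
    """Return the (cheapest, dearest) cost of an automated call.

    Assumption: "N times less expensive" is read as live_cost / N.
    """
    return live_cost / high_factor, live_cost / low_factor


# Live-agent calls are quoted at $1 to $10 each.
for live_cost in (1.0, 10.0):
    cheapest, dearest = automated_cost_range(live_cost)
    print(f"live ${live_cost:.2f} -> automated ${cheapest:.2f} to ${dearest:.2f}")
```

On that reading, even the most expensive live call would cost little more than a dollar when handled by a speech interface.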

The evolution of speech technology
Speech recognition technology has been around for 30 years, but only in the last 5 to 7 years has it seen significant adoption. Several factors contribute to this increased usage. For one, Automatic Speech Recognition (ASR) vendors have been able to consistently improve the accuracy of the recognition through tuning. The engines also provide much more functionality:

Speaker independence - Early versions of speech technology required that the interface be laboriously trained in order to recognise the nuances of a particular voice. Each user had to go through this tuning process, often more than once. Current versions of the speech engines do not require training, which significantly increases their usefulness and ease of use.

Multiple languages - The current technology for speech engines also supports multiple languages. Most vendors have also moved beyond specific languages and can tailor their engines to support regional dialects.

According to the popular reading of Moore's Law, the processing power of computers doubles roughly every 18 months, and this has also benefited speech technology. The increase in processing power has directly contributed to the quality of the recognition and the reduced cost of a speech implementation.
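
As a rough illustration of what an 18-month doubling period implies over the 5 to 7 years of significant adoption mentioned earlier, the sketch below simply compounds the doubling. The function name is hypothetical, and the 18-month figure is taken from the text as a rule of thumb, not a measured value.

```python
def growth_factor(months, doubling_period_months=18.0):
    """Multiplier in processing power after `months`, assuming steady doubling."""
    return 2.0 ** (months / doubling_period_months)


for years in (5, 7):
    print(f"{years} years: roughly {growth_factor(years * 12):.0f}x the processing power")
```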

New vertical markets for speech
The market for speech recognition has been concentrated in two specific verticals: financial services and transportation (airlines). Automated banking applications and flight information lines are the most common implementations of speech technology. Starting in these verticals is a plus for speech vendors: the large consumer deployments of these applications help prove the scalability of the speech technology.

Another emerging market for speech technology is telecommunications. Carriers are looking for ways to provide value-added services to their subscribers for two very good reasons: additional revenue opportunities and reduced churn. The voice portals have been well received by the carriers, which can optimise this technology to provide an Internet experience for their entire customer base.

The CRM vendors have enjoyed market penetration across several industry verticals. The integrated speech components of CRM will be an important up-selling opportunity for the major vendors in the CRM market, and the ones that integrate speech will help the speech technology vendors establish a strong presence in markets beyond their traditional realm.

The fragmented speech market
The vendor landscape in the speech market is extremely confusing. While there are many vendors that sell in this market, the few vendors that sell the speech engines provide the foundation for the overall segment. SpeechWorks and Nuance dominate, and their technology is embedded in several of the other vendors' products. In addition, Conversay, Voice Signal, Philips, and IBM all provide engines.

While the speech engine is the most important part of a speech application, it is only a piece of the puzzle. Another important piece is Text-to-Speech (TTS) engines, which synthesise the data from applications into a human voice. Most of the speech engine providers also sell TTS.

In addition to these technologies, there are voice platforms that utilise the ASR and TTS engines and provide additional tools and integration capability. Vendors such as VoiceGenie, Informio, Nortel's Periphonics group, and JustTalk are building value-added functionality to extend the core technology for specific verticals or applications.

To add to the confusion, a whole new set of vendors, dubbed Voice Portals, have used these technologies to provide Internet access to subscribers using the telephone. Companies such as Wildfire Communications, Tellme Networks, and Hey Anita have done fairly well in attracting subscribers, but their value proposition is limited to phone access. Many mobile users, on the other hand, will eventually adopt the blended voice and data access provided by mobile platform vendors.

The sheer number of vendors and products in the speech market can make choosing a partner very difficult. Industry expertise, cost, and the type of application you want to develop will all factor into your decision. The following are some general guidelines for negotiating the landscape:

Speech Engines - Ultimately, the speech engine vendors provide the most flexibility if you are looking to build a system from the ground up. The vendors are also branching out beyond the core speech technology and are constructing some pre-built dialogues.

Speech Platforms - The platform vendors' value proposition lies in taking the engines and building pre-defined speech applications that can be delivered with little customisation. Look to the platform vendors and their pre-built dialogues; they may have done much of the work in advance.

Voice Portals - The voice portals will provide the quickest path to speech-enabling your Web content, but they allow for the least amount of control or flexibility.

VXML has been widely supported by the vendors in the speech market, but it must still evolve as a standard to be utilised by Internet programmers.

As with any emerging market, standards are being developed to help with interoperability. Voice Extensible Markup Language (VXML) is the standard the market has adopted, and most speech vendors' products already support it.

The beauty of VXML is that regular Internet developers will be able to maximise speech without requiring specific skills in speech programming. However, as with other flavours of XML, VXML must continue to evolve before standard speech applications appear on the Internet.
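
To give a flavour of why VXML appeals to ordinary Internet developers, the sketch below uses a general-purpose scripting language to generate a small voice menu, much as a Web application would generate an HTML page. It is only an illustration: the element names (vxml, menu, prompt, choice) are taken from the VoiceXML specification, while the function name, the prompt wording, and the target file names balance.vxml and flights.vxml are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_menu(prompt_text, choices):
    """Build a minimal VXML menu document and return it as a string.

    `choices` is a list of (spoken phrase, target URL) pairs. The targets
    used below are placeholders, not real pages.
    """
    vxml = ET.Element("vxml", {"version": "1.0"})
    menu = ET.SubElement(vxml, "menu")
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = prompt_text
    for phrase, target in choices:
        # Each choice pairs a phrase the caller may say with the next document.
        choice = ET.SubElement(menu, "choice", {"next": target})
        choice.text = phrase
    return ET.tostring(vxml, encoding="unicode")


doc = build_menu(
    "Say balance or flight status.",
    [("balance", "balance.vxml"), ("flight status", "flights.vxml")],
)
print(doc)
```

The speech engine and voice platform take care of recognising what the caller says; the developer only describes the dialogue, which is the point the standard's supporters make.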

While support for VXML has been strong within the speech community, the technology must gain much wider adoption to live up to its potential. Unfortunately, this is going to be almost impossible. The software vendors in other categories are struggling with their own issues and versions of XML, so it will be difficult to get any attention for a standard that is not an immediate priority.

A speech interface for sales
Even though speech technology is new for the CRM vendors, there is a great deal of excitement about using a speech interface for sales. In a recent survey conducted by AMR Research, which focused on Return on Investment (ROI) for CRM implementations, end-users said they have been able to generate real ROI by implementing call centre applications or customer-facing applications. ROI and usage on the sales side, though, are typically harder to achieve.

Why is this the case? Because salespeople are notoriously difficult users, and it can be very hard to get them excited about a technology that will change the way they work. The one common element for all salespeople is a strong reliance on the telephone. By providing a familiar user interface (the phone) that can be used with minimal training, CRM vendors are hoping that speech technology will be the catalyst that drives up customer success when implementing SFAs.

For speech to be a successful addition to an SFA product, it must provide the core pieces of information that salespeople need on the road (contacts, tasks, opportunities, and calendar) and deliver it with a user experience that is intuitive and requires almost no training. If the process is too complex or time-consuming, salespeople will not use it. Thus far, the first releases of the new speech-enabled SFA applications live up to the requirements of usability and utility. However, the CRM vendors need to deliver referenceable customers to validate their investment in speech technology if they are going to have widespread success.

Recommendations
1. Look to the speech vendors directly if your opportunities for speech applications extend beyond SFA. The speech interfaces of these SFA applications are very tightly linked to the suite and are not designed to support multiple applications.

2. Look to the emerging mobile platforms for speech technology if your requirements also include wireless data. Be aware that the data comes first for these vendors.

3. Don't underestimate the importance of proper call flow design at the start of any speech project. A properly designed call flow will serve its intended purpose of reducing volume to the agents; a poorly designed call flow is a CRM nightmare.