
Speech recognition reduces the burden on your call centre, but it
is not yet ready for your sales force, according to Dennis Gaughan
of AMR Research.
Speech recognition technology has been around for 30 years, but
improved technology and adoption by application vendors have
recently brought speech from the lab to the masses. Finding a home
with the largest CRM vendors - Siebel, PeopleSoft, and Nortel
Networks - it is already paying dividends for call centres, and the
technology shows promise for the sales force.
Hype and potentiality
The major Customer Relationship
Management (CRM) vendors have created a great deal of buzz about
the new speech-enabled interfaces for their products' Sales Force
Automation (SFA) modules. However, while the technology shows a lot
of potential, the products are just being released. Since even the
best new products require time to work out the kinks, AMR Research
suggests waiting at least six months before deploying these new
interfaces. Still, there have been many successful implementations
of speech technology that augment call centres. We
recommend working with your CRM vendors and their partners on a
call centre suite first.
The wireless community has driven much of the interest in
speech technology in recent years. But early implementations of
wireless, many built around the Wireless Application Protocol
(WAP), have failed to live up to expectations. Slow data rates and
small screen sizes contribute to the shortcomings, but the main
problem is that the wireless industry has promised more features
than it can deliver. In order to augment
their products, wireless carriers are touting speech recognition as
the next big thing. This, however, is not an either/or proposition:
it's the combination of speech technology and wireless data (often
called multimodal access) that will provide the usability and
flexibility mobile users are looking for.
Cutting call costs
The emergence of the Internet as a
commerce tool has made it easier than ever for customers to move
from one supplier to another. As a result, enterprises have
had to improve customer service to gain a competitive edge, if not
to survive. A best-in-class call centre is a great way to deliver
quality customer service, but call centre agents are expensive.
Call costs range from $1 to $10, depending on the call duration and
the skill level of the agents. Many service organisations have
augmented their live agents with automated touch-tone menus
to limit the number of calls handled by agents. This has
been successful in many implementations, but it does have
limitations:
Non-intuitive menus - The menus for touch-tone applications
can be several layers deep and difficult to navigate
Slow performance - Having to wait through the options of a
touch-tone menu can be excruciating
Limited input - The type of data that can be input is
limited by the numeric keypad on telephones
As the technology has matured, speech recognition has become the
latest tool for call centre managers. Because it supports natural
language and gives greater control to end users, speech technology
is being implemented as the next phase of call centre development.
The customer satisfaction rates for a speech interface are
typically very high, and the interface can help reduce the number
of calls that live agents handle. In addition, a
speech-recognition-based call can cost 8 to 12 times less
than a live agent call.
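The arithmetic behind these figures can be sketched in a few lines of Python. This is an illustration only: the function names, the $6 live-call cost, and the 3,000 diverted calls are assumptions chosen for the example, not AMR Research data. Only the $1 to $10 range and the 8-to-12 ratio come from the text above.

```python
def speech_call_cost(live_call_cost: float, cost_ratio: float) -> float:
    """Cost of an automated call that is `cost_ratio` times cheaper than a live call."""
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return live_call_cost / cost_ratio


def monthly_savings(automated_calls: int, live_call_cost: float, cost_ratio: float) -> float:
    """Savings when `automated_calls` calls are handled by speech instead of an agent."""
    per_call_saving = live_call_cost - speech_call_cost(live_call_cost, cost_ratio)
    return automated_calls * per_call_saving


if __name__ == "__main__":
    live_cost = 6.00        # hypothetical mid-range live-agent call cost, in dollars
    diverted_calls = 3000   # hypothetical number of calls moved to speech per month
    for ratio in (8, 12):   # the 8x to 12x range quoted in the article
        saved = monthly_savings(diverted_calls, live_cost, ratio)
        print(f"ratio {ratio}: automated call ${speech_call_cost(live_cost, ratio):.2f}, "
              f"monthly savings ${saved:,.2f}")
```

Under these assumed inputs, a $6 live call at a ratio of 12 implies an automated call of $0.50, so diverting 3,000 calls would save $16,500 a month. The real figure depends on how many calls the speech interface actually completes without handing off to an agent.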
The evolution of speech technology
Speech recognition technology has been around for 30 years, but
only in the last 5 to 7 years has it seen significant adoption.
Several factors contribute to this increased usage. For one,
Automated Speech Recognition (ASR) vendors have been able to
consistently improve the accuracy of the recognition through
tuning. The engines also provide much more functionality:
Speaker independence - Early versions of speech technology
required that the interface be laboriously trained in order to
recognise the nuances of a particular voice. Each user had to go
through this tuning process, often more than once. Current versions
of the speech engines do not require training, which significantly
increases their usefulness and ease of use.
Multiple languages - The current technology for speech
engines also supports multiple languages. Most vendors have also
moved beyond specific languages and can tailor their engines to
support regional dialects.
According to Moore's Law, as it is popularly stated, the processing
power of computers doubles roughly every 18 months, and this has
also benefited speech technology. The increase in processing power
has directly contributed to the quality of the recognition and the
reduced cost of a speech implementation.
New vertical markets for speech
The market for speech recognition has been concentrated in two
specific verticals: financial services and transportation
(airlines). Automated banking applications and flight information
lines are the most common implementations of speech technology.
Starting in these verticals is a plus for speech vendors: the large
consumer deployments of these applications help prove the
scalability of the speech technology.
Another emerging market for speech technology is
telecommunications. Carriers are looking for ways to provide
value-added services to their subscribers for two very good
reasons: additional revenue opportunities and reduced churn. The
voice portals have been well received by the carriers, which can
optimise this technology to provide an Internet experience for
their entire customer base.
The CRM vendors have enjoyed market penetration across several
industry verticals. The integrated speech components of CRM will be
an important up-selling opportunity for the major vendors in the
CRM market, and the ones that integrate speech will help the speech
technology vendors establish a strong presence in markets beyond
their traditional realm.
The fragmented speech market
The vendor landscape in
the speech market is extremely confusing. While there are many
vendors that sell in this market, the few vendors that sell the
speech engines provide the foundation for the overall segment.
SpeechWorks and Nuance dominate, and their technology is embedded
in several of the other vendors' products. In addition, Conversay,
Voice Signal, Philips, and IBM all provide engines.
While the speech engine is the most important part of a speech
application, it is only a piece of the puzzle. Another important
piece is Text-to-Speech (TTS) engines, which synthesise the data
from applications into a human voice. Most of the speech engine
providers also sell TTS.
In addition to these technologies, there are voice platforms that
utilise the ASR and TTS engines and provide additional tools and
integration capability. Vendors such as VoiceGenie, Informio,
Nortel's Periphonics group, and JustTalk are building value-added
functionality to extend the core technology for specific verticals
or applications.
To add to the confusion, a whole new set of vendors, dubbed Voice
Portals, have used these technologies to provide Internet access to
subscribers using the telephone. Companies like Wildfire
Communications, Tellme Networks, and Hey Anita have done fairly
well in attracting subscribers, but their value proposition is
limited to phone access. Many mobile users, on the other hand, will
eventually adopt the blended voice and data access provided by
mobile platform vendors.
The sheer number of vendors and
products in the speech market can make choosing a partner very
difficult. Industry expertise, cost, and the type of application
you want to develop will all factor into your decision. The
following are some general guidelines for negotiating the
landscape:
Speech Engines - Ultimately, the speech engine vendors
provide the most flexibility if you are looking to build a system
from the ground up. The vendors are also branching out beyond the
core speech technology and are constructing some pre-built
dialogues.
Speech Platforms - The platform vendors' value proposition
is taking the engines and building pre-defined speech applications
that can be delivered with little customisation. Look to the
platform vendors and their pre-built dialogues; they may have done
much of the work in advance.
Voice Portals - The voice portals will provide the quickest
path to speech-enabling your Web content, but they allow for the
least amount of control or flexibility.
VXML has been widely supported by the vendors in the speech market,
but it must still evolve as a standard to be utilised by Internet
programmers.
As with any emerging market, standards are being developed to help
with interoperability. Voice Extensible Markup Language (VXML) is
the standard the market has adopted, and most speech vendors'
products already support it.
The beauty of VXML is that regular Internet developers will be able
to build speech applications without needing specialist skills in
speech programming. However, as with other flavours of XML, VXML must
continue to evolve before standard speech applications appear on
the Internet.
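To give a sense of why VXML appeals to Web developers, the sketch below uses Python's standard XML library to emit a minimal VoiceXML-style menu, much as a Web application would emit HTML. It is a simplified illustration under stated assumptions: the `build_menu` helper, the prompt wording, and the `#balance` and `#agent` targets are hypothetical, and a production document would also need the namespace and version details required by whichever VoiceXML release your platform supports.

```python
import xml.etree.ElementTree as ET


def build_menu(prompt_text, choices):
    """Return a minimal VoiceXML-style menu document as a string.

    `choices` is a list of (spoken_phrase, target) pairs; each becomes a
    <choice> element whose `next` attribute names the dialogue to jump to.
    """
    # The version value is an assumption; use the one your platform expects.
    vxml = ET.Element("vxml", {"version": "2.0"})
    menu = ET.SubElement(vxml, "menu")
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = prompt_text
    for phrase, target in choices:
        choice = ET.SubElement(menu, "choice", {"next": target})
        choice.text = phrase
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical call-centre menu: the caller says a phrase instead of pressing a key.
    document = build_menu(
        "Say account balance, or say agent.",
        [("account balance", "#balance"), ("agent", "#agent")],
    )
    print(document)
```

The point of the example is that the dialogue is ordinary markup: the speech engine and platform handle recognition, while the developer only describes prompts and choices.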
While the support for VXML has been strong within the speech
community, the technology must gain much wider adoption for it
to live up to its potential. Unfortunately, this is going to be
almost impossible. The software vendors in other categories are
struggling with their own issues and versions of XML, so it will be
difficult to get any attention for a standard that is not an
immediate priority.
A speech interface for sales
Even though speech technology is new for the CRM vendors, there is
a great deal of excitement about using a speech interface for
sales. In a recent survey conducted by AMR Research, which focused
on Return on Investment (ROI) for CRM implementations, end-users
said they have been able to generate real ROI by implementing call
centre applications or customer-facing applications. ROI and
usage in sales, though, are typically harder to achieve.
Why is this the case? Because salespeople are notoriously difficult
users, and it can be very hard to get them excited about a
technology that will change the way they work. The one common
element for all salespeople is a strong reliance on the telephone.
By providing a familiar user interface (the phone) that can be used
with minimal training, CRM vendors are hoping that speech
technology will be the catalyst that drives up customer success
when implementing SFA.
For speech to be a successful addition to an SFA product, it must
provide the core pieces of information that salespeople need on the
road (contacts, tasks, opportunities, and calendar) and deliver it
with a user experience that is intuitive and requires almost no
training. If the process is too complex or time-consuming,
salespeople will not use it. Thus far, the first releases of the
new speech-enabled SFA applications live up to the requirements of
usability and utility. However, the CRM vendors need to deliver
referenceable customers to validate their investment in speech
technology if they are going to have widespread success.
Recommendations
1. Look to the speech vendors directly if your opportunities
for speech applications extend beyond SFA. The speech interfaces of
these SFA applications are very tightly linked to the suite and are
not designed to support multiple applications.
2. Look to the emerging mobile platforms for speech
technology if your requirements also include wireless data. Be
aware that the data comes first for these vendors.
3. Don't underestimate the importance of proper call flow
design at the start of any speech project. A properly designed call
flow will serve its intended purpose of reducing volume to the
agents; a poorly designed call flow is a CRM nightmare.