The number of powerful PCs now being sold at palatable prices and the advances in digital audio technology mean that users are increasingly turning to voice recognition systems for speech-to-text dictation and PC voice control purposes. But has viable speech recognition software lived up to the hype and met the expectations of the solution provider community? Has the voice web finally come of age, and what can resellers and the like offer in the way of new services?
A debatable point
Don Edwards, managing consultant for AG Solutions, is convinced that viable speech recognition software does exist, is in use and has been for some time. Companies such as Delta Airlines (flight booking) and Sears in the USA (automatic centralised telephone call handling for the whole US retail group) have committed to the technology, along with many others that illustrate tangible business benefits. Edwards says the claim that it hasn't met with expectations is debatable. "That all depends on what the expectations were in the first place," he says. "The voice web is now emerging as a viable channel and an alternative to hand-wired or digital telephony. It can offer a combination of simultaneous channels, but still relies on some of the basic technologies, such as speech recognition and text-to-speech." Edwards claims the technology has progressed beyond the original 'Dalek in a dustbin' level. "It now offers intelligent intonation, phrasing and pronunciation, and while it can still be recognised as electronic in origin when listened to, a recent demonstration to the BBC elicited the comment that it was 'perfectly acceptable for pieces of 100-150 words'."
Duncan Ross is business manager for IBM's speech business unit, which claims to be at the forefront of this field and to have sold over one million units of its speech recognition product ViaVoice. "Our channel partners are healthily growing their businesses with ViaVoice," claims Ross, who points to a London-based reseller, The Speech Recognition Company, which has built its business in the last five years entirely on its service portfolio surrounding speech products.
Ross claims a main benefit for channel partners is the "massive scope for providing specific tailoring of the products for the clients". He says: "The client may want vocabulary enhanced on the technology to cater for an industry/profession that uses very specific terminology, for example, the medical or legal professions. The solution provider may be in a position to offer a specialist add-on application that exploits the speech recognition but which is not available in the core product we provide." He also highlights the second biggest area in the voice market: voice-enabled Web products. In this field, IBM has developed the WebSphere Voice Server, which allows developers to create voice-enabled applications that utilise a Voice over IP network infrastructure.
The sector expressing most interest here, says Ross, is that of the telecommunications industry with BT as the most notable recent customer. Amid the flurry of telecommunications companies rushing to gain 3G licences which will enable them to move data more speedily, they are now deciding how to offer the services they have committed to providing their customers.
With regard to other peripheral products that can be sold into the voice deal, Ross says that along with the obvious headsets, digital recording devices are a good bet, and that there is an increasing trend towards devices such as PDAs, Psions and Palm devices.
BT Syncordia Solutions is the e-business and communications unit for British Telecom. It acts as an outsourcing partner providing networking and application management. Part of this work involves platform hosting and development for companies such as lastminute.com.
Last year, BT Syncordia partnered with interactive voice response (IVR) software manufacturer Nuance to provide voice facilities for the lastminute.com Web site. Phil Flavin, chief technology officer for BT and VP of BT Syncordia, agrees the new breed of software is finally coming of age but, like all technologies, requires a little give and take. "It certainly suits some individuals more than others," he says, claiming that Syncordia itself has a success rate of over 85% with its own company call system (that is, over 85% of telephone calls using the system are successfully placed).
Flavin claims the very natural conversations that are the result of IVR software are becoming more commonplace now, rather than just an occasional success, with a whole raft of enterprise applications such as e-mail and diary now available to the end-user. "Consumer expectations are getting ever higher now," he says, "and voice recognition provides a much richer environment to meet this requirement. From a solution provider's point of view, there are abundant opportunities for providing a much richer customer relationship experience. Apart from the obvious 'human-like' interaction for example, there is the efficiency of having much speedier data-gathering at the start of service calls, rather than customers having to wait ages to speak with someone and then giving all their details."
Flavin remains enthusiastic about the greater efficiencies engineered by IVR and the subsequent opportunities that are emerging from it. Additionally, he highlights the emergence of voxXML as the new language of IVR for the Internet. "We are currently using embedded voice features on platforms which provide a further richness of channel for users. For example, you could say 'fishfingers' into your mobile phone while you are out to add the item to your shopping list on your home PC. This type of facility adds another content (voice) onto what is becoming a multi-channel service environment."
So what are the hurdles involved for solution providers interested in this market? Flavin is sceptical of the "many poor IVR systems that are available in the marketplace currently" and warns providers to be careful in selecting who they partner with. "There are definitely high barriers to entry in providing a top-class IVR product," he says. "Obviously the whole point of the game is to provide a very natural experience for the end-user which will be as good as speaking with a human; if this is not achieved then what is the point?"
Building robust dialogues for users and therefore making it a natural experience is generally agreed by all in the industry to be the priority, but whether this can be done as a core skill will depend on the provider. According to Alex Monaghan, text-to-speech team leader for speech technology software manufacturer Aculab, the skills required by resellers in setting up a voice portal are considerably reduced if they go with a single provider. "This will provide guaranteed integration and the one-stop support and generic programming interface that are all necessary with this type of implementation," says Monaghan, whose company claims to offer these things.
Think3 is a US-based computer-aided design (CAD) software manufacturer which uses voice recognition as part of the functioning of its latest CAD product. Vice-president of marketing Kara Kerker is adamant that speech recognition has now come into its own, following the launch of its product of the same name. "Think3 certainly has a reliability level, somewhere around 90%, which previous systems just haven't had," says Kerker. She claims the design and manufacturing industries, which make up Think3's end-users, are particularly suited to the technology because of the technical complexity of the business and because users traditionally need to sort through many layers of commands and menus to reach the commands they require.
With speech recognition, Kerker says, the user experiences a much more natural and speedy form of interaction with his/her system. "The more you speak, the better the system gets as it learns your speech patterns." These basic benefits of voice recognition software are then also part of the bonus for the solution provider/VAR, Kerker continues: "Basically, the same benefits that the end-user experiences will also be experienced by the VAR who will see a much quicker adoption of the software, this being critical to the success of many implementations."
The proof of the pudding
Think3 itself currently has one UK reseller partner, called Think3 Dimensions. Managing director Kevin Billington admits there was initial scepticism around the technology before the company took on the Think3 product six months ago. "From our point of view we had heard various horror stories such as companies implementing new speech interfaces bolted onto old software interfaces. These may not have actually been 'architected' properly to work with speech and were very clumsy to use, therefore defeating the object of the whole game which is to produce a much more natural input mechanism." Billington also points out that there were and still are many speech interfaces which simply do not work that well, or are just not accurate. "All these worries have added to the VAR's initial scepticism surrounding voice recognition software," he says. "However, we have found the proof is in the pudding now and if it speeds up the sales process for us and ultimately the end user experience, then we are happy."
Naturals for communication
There are obvious sectors which suit the technology. Commenting further on which industries are best suited to voice recognition technologies, AG Solutions' Edwards says: "There are those that need to talk to people and let people talk to them about information-based topics which follow fairly standard patterns, and where the personal touch is less important than the efficient and flexible communication of information." He points out that the technology would probably work in, but not really be suitable for, areas such as sales and funeral directors. "Sectors such as banking, transport and utilities are naturals, though."