Until recently, speech recognition software was seen as a clunky niche product, with technical and cultural problems. That is now changing fast, driven mainly by the huge take-up of mobile phones. Mobile phone manufacturers want an easy, low-overhead interface for their products, particularly for the latest generation of WAP phones. Providing commands by voice is an obvious way forward and is fuelling demand for voice-driven systems, not just for mobile phones, but for a whole range of areas where keyboard or mouse commands are not practical.
Developments are being driven by the leading suppliers of speech recognition software (see box) and by major software suppliers. Better support for voice-driven applications is one of the features Microsoft has highlighted in Windows 2000, which includes some voice-activated commands as standard as well as enhanced plug and play functionality for peripherals, to provide greater support for mobile computing and future intelligent devices.
The race is on to tie speech recognition systems more closely to well-established office applications, as well as to combat some of the issues that have so far held such systems back. At the beginning of March, IBM launched a toolkit for its ViaVoice speech recognition system to enable developers to build speech recognition into mobile and handheld devices. ViaVoice already provides extensive voice command capabilities for Windows 2000 applications, including the ability to write and send e-mail messages using Microsoft Outlook, as do most of the other main speech recognition packages.
Other areas are also being addressed. Independent speech recognition specialist, the Speech Recognition Company (SRC), has developed a system that enables staff to input data by voice directly into company databases, saving on typing mistakes and speeding up input. "The business benefit is being able to get the data in quicker," says Colin Howland, founding director of SRC. A good touch typist can input about 80 words a minute, says Howland. Properly set up, he says, speech systems can input data at almost twice that, although an average speed would be closer to 100 words a minute.
To date, one of the biggest problems with speech systems has been the amount of time they take to set up. In order to recognise words and phrases correctly, speech systems need to be tailored to a specific voice in a specific environment. A user cannot simply take the software out of the box, install it and start working. All the major suppliers have worked on reducing the set-up time by improving the general vocabulary, the ability of systems to cope with different accents and the processing of complex speech recognition algorithms. That processing is why most systems still need a fast PC with lots of memory. "The reduction in training time has been the single biggest advance by the manufacturers," comments Howland.
Another important advance has been in the area of microphone technology and noise reduction. A speech system requires a good quality audio signal, and many systems have been hampered by poor quality PC microphones, or by sound cards that sit right next to the hard disc drive and pick up static. Now, universal serial bus (USB) microphones are being sold with built-in sound cards, to get round these problems and provide a clear audio signal for the speech system.
Similarly, noise cancellation microphones and headsets have been developed, mainly for use in large call centres, but they are also proving invaluable in increasing the quality of speech recognition systems.
But there are still problems. Speaking to a machine takes getting used to. People used to dictating to a secretary don't find this a problem, but they don't like the idea of losing their human assistant, still a status symbol in many organisations. Secretaries may not appreciate being replaced by software, although many firms will transfer staff to jobs in other areas.
A further problem, according to Howland, is that some people can't think fast enough. "Some people's minds only work as fast as they can two-finger type," he claims. "They can't think at 150 words a minute."
Despite the hurdles, it seems that after years of slow progress, voice-driven applications may finally be coming in from the cold. Fairly soon, we will all be talking to our computers.
Leading speech recognition packages
| Product | Supplier | Website |
| VoiceXpress | Lernout & Hauspie | www.lhsl.com |
Case study: IBM's ViaVoice
London-based international property consultant Knight Frank has just completed a pilot study of a speech recognition system based on IBM's ViaVoice. The company's head of IT, Mark Clemence, says he will be recommending that the software be made available to those who want to use it, but there are some provisos.
"We are not saying that we will put speech recognition on every desktop," comments Clemence. "There are some situations where it is not useful." Anyone able to type fast would not benefit from the system, he says, and nor would professionals with substantial existing secretarial support. The people most likely to benefit are those who use word processing and e-mail applications, but who fall into the "poke and hope" school of typing. The system is also likely to be useful for professionals out on the road, or working out-of-hours, since it will enable them to input data directly and provide documents for clients without having to wait for them to be typed up by secretaries.
Clemence also points out that there are financial overheads. The system requires a fast PC with plenty of memory, and although the software is relatively cheap, the cost of training pushes the price up to more than £1,000 per user.