Mobile speech recognition technology currently in use generally requires that a speaker's voice be transmitted over a wireless network to a stationary computer for processing. The computer deciphers the speech and passes the result back over the wireless network to the phone.
When a mobile phone user has poor network coverage - say in a building or on the edge of a service provider's coverage area - some voice fidelity is lost. Static or intermittent breaks might be heard in a typical phone call. A computer trying to perform voice recognition processing on a garbled transmission would be more likely to fail at the task, said Steve Chambers, SpeechWorks vice-president of worldwide marketing.
The problem is compounded by the limitations of wireless phone network signal transmissions, he said. A voice call cannot use the same kinds of redundant transmissions meant to guarantee Internet packets end up where they're supposed to go.
"Recognition engines aren't going to be able to discern echoes," Chambers said. "If you were calling an airline, and said 'Boston', the phone couldn't send the signal three times. The server would hear 'Boston Boston Boston' and think it's a new word every time."
Motorola and SpeechWorks' prototype is meant to attack the signal problem at the source. The phone's processor digitally cleans up the voice signal as much as it can before transmitting it. The clarified signal can then be processed more easily by a server running SpeechWorks software.
"If you have a device with our technology talking to a server with our technology, the device sends a much cleaner signal to the server," Chambers said.
The experimental phone uses a field-force automation application allowing a sales representative to check any account's status by speaking into a handheld device from the road. It also has a travel reservation application for travellers to check a flight's status. Chambers wouldn't say what kind of phone is being used.
The technology should reach the market within the next 12 to 18 months, Chambers said.
As digital signal processors improve, more of the computation required for voice recognition can be done inside mobile phones, avoiding the signal losses that occur over the air. Digital signal processors are to voice recognition what graphics cards are to video games - the faster the processor and the more memory supplied to it, the better the output.
But without a dramatic improvement in network transmission speed, beyond what 3G wireless data networks promise, there will always be a client-server relationship for voice recognition applications on mobile devices. Too much information would have to be stored on the phone for recognition done entirely on the device to be practical, Chambers said.
"A typical database might contain 100,000 words, maybe 50 or 60Mbytes," Chambers said. "I know of some twice as large. People aren't going to wait 30 seconds or a minute to download a vocabulary database when they can just call up the agency and ask the question."
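Chambers's figures can be turned into a rough transfer-time estimate. The sketch below is illustrative only: the 2 Mbit/s link rate is an assumption, chosen because it is the peak rate commonly cited for 3G, and is not a number given by SpeechWorks or Motorola. The function names are hypothetical, and the calculation ignores protocol overhead and real-world throughput.

```python
# Back-of-the-envelope download times for a vocabulary database.
# Assumption: 1 Mbyte is treated as 8 Mbit, with no protocol overhead.

ASSUMED_LINK_MBIT_PER_S = 2.0  # assumed peak 3G rate, not from the article


def download_seconds(size_mbytes: float, link_mbit_per_s: float) -> float:
    """Seconds needed to move size_mbytes over a link of the given speed."""
    return size_mbytes * 8 / link_mbit_per_s


def required_mbit_per_s(size_mbytes: float, seconds: float) -> float:
    """Link speed needed to move size_mbytes within the given time."""
    return size_mbytes * 8 / seconds


if __name__ == "__main__":
    # 50 and 60 Mbytes are the sizes quoted; 120 stands in for "twice as large".
    for size in (50, 60, 120):
        wait = download_seconds(size, ASSUMED_LINK_MBIT_PER_S)
        print(f"{size} Mbytes at {ASSUMED_LINK_MBIT_PER_S} Mbit/s: {wait:.0f} s")

    # Speed needed to hit the 30-second and one-minute waits mentioned.
    for seconds in (30, 60):
        speed = required_mbit_per_s(60, seconds)
        print(f"60 Mbytes in {seconds} s needs {speed:.0f} Mbit/s")
```

Under that assumed rate, a 60-Mbyte database would take 240 seconds to arrive, and fitting the same download into 30 seconds would need a 16 Mbit/s link, which is consistent with Chambers's point that shipping the vocabulary to the handset is impractical.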