Microsoft has released the first public beta of Microsoft Speech Server and a beta version of its Speech Application Software Development Kit (SDK).
The software platform is designed to host voice-based services in much the same way that web servers host a company's website, and to support "multimodal" applications that take advantage of both voice and web interfaces. It is based on Salt (Speech Application Language Tags), an extension of existing markup languages including HTML and XML.
Companies needing call centres can cut costs by automating them on the server, said Xuedong Huang, general manager for Microsoft speech technologies.
Among other things, the server can interpret callers' requests and provide recorded or synthesised responses. Developers also can integrate the voice-based services with web-based applications that can continue to run on a web server as they do now. For example, a caller could ask for a stock quote verbally and have it displayed on a handheld device.
The beta version of the server can deliver voice-only services to a wired phone and multimodal services to any device with a screen that uses either a wired or an IEEE 802.11 wireless LAN connection to the server.
Other wireless technologies will be supported later, Huang said.
The software includes a speech recognition engine for handling users' speech inputs and a prompt engine to bring up prerecorded prompts from a database to play for users. It also has a text-to-speech engine that can synthesise audible prompts from a text string when a prerecorded prompt is not available. In addition, it has a Salt Interpreter and other components to support services to callers.
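The choice between a prerecorded prompt and a synthesised one can be pictured with a short sketch. This is only an illustration of the fallback described above, not Microsoft's implementation: the names Prompt, choose_prompt and recordings are hypothetical, and a plain dictionary stands in for the prompt database.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Prompt:
    source: str   # "recorded" or "synthesised"
    payload: str  # recording file name, or the text to hand to text-to-speech


def choose_prompt(text: str, recordings: Dict[str, str]) -> Prompt:
    """Prefer a prerecorded prompt; fall back to text-to-speech otherwise."""
    recording = recordings.get(text)
    if recording is not None:
        return Prompt("recorded", recording)
    # No recording is available, so the text string itself would be synthesised.
    return Prompt("synthesised", text)


# Hypothetical prompt database with a single prerecorded entry.
recordings = {"Welcome to the quote service.": "welcome.wav"}

first = choose_prompt("Welcome to the quote service.", recordings)
second = choose_prompt("The price is 27 dollars.", recordings)
```

Here the fixed greeting resolves to a recording, while the variable price announcement, which has no recording, is passed on for synthesis.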
The SDK, a set of tools and controls based on Salt, lets developers build telephony and multimodal applications.
The SDK is now in its third beta version. It is designed to make it easy for developers to incorporate speech functionality into web applications and to build speech applications using Visual Studio .NET 2003, according to Microsoft.
New features in the third beta include Pocket Internet Explorer Bits for Pocket PC access to Microsoft Speech Server applications, a simulation of the Speech Server, and preset controls for managing responses containing digits and letters, such as credit card numbers.
Voice is one user interface that could be used with any type of device, Huang said. Not everyone has a PC but most people have phones, and speech may be the best way to interact with small devices.
The Salt Forum has submitted Salt 1.0 as a specification to the World Wide Web Consortium (W3C). The group has more than 70 members, including founding members Microsoft, Cisco Systems, Intel, Philips Electronics, SpeechWorks International and Comverse.
Salt is a more lightweight extension of existing markup languages than VoiceXML, a specification used by many developers of voice-based services today, according to Mark Plakias, an analyst at Zelos Group.
Salt allows companies to draw upon a larger pool of developers than VoiceXML does, he said. VoiceXML is more familiar to developers of traditional IVR (interactive voice response) systems.
Both Plakias and Microsoft's Huang expect the two specifications to merge eventually under the W3C.
Microsoft is expected to ship the first production versions of the server and SDK in the first quarter of 2004.
Stephen Lawson writes for IDG News Service