White Paper: Environmental Audio Technology

While the ultimate goal of environmental audio (making the illusion absolutely perfect) has not yet been attained, EAX provides a remarkable improvement over existing 3D audio solutions.


3D audio today allows the game designer to place and move sources of sound at relative angles with respect to the player. This is commonly known as 3D positional audio. What is missing, however, is any sense of space and surroundings. Environmental audio extends the 3D programming interface by allowing the designer to specify, in detail, an environment in which sound-producing objects are manipulated. The sound is then modeled within the environment, producing audible reflections and reverberations. The result is a truly "live" sound experience.

The 3D environmental audio model thus produced can be experienced with simple stereo speakers or headphones. It can be rendered even more accurately using multiple speakers. This widens the "sweet spot", bringing additional realism, while still allowing designers to work within the same API.

How do we sense our audio environment?

Imagine a listener in an interesting place, sitting in a concert hall enjoying music or hiding in a dungeon corner sniping at mutants. If they were to close their eyes, they would be able to picture the shape of the room surrounding them and, in most cases, reasonably accurately place the source of any sounds being produced. How is this accomplished?

First, having two ears helps a lot because the brain can correlate what each ear hears with the other. The brain measures the differential time of arrival of any wavefront (this is called the inter-aural time delay or ITD), and from this can determine the angle of a sound from the mid-body plane, thus reducing the possible locations of a sound to a conical set of points.
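The relationship between angle and ITD can be illustrated with a minimal sketch. It assumes a simple far-field model in which the ITD is proportional to the sine of the azimuth; the ear spacing, the speed of sound and the function names are illustrative assumptions, not values taken from this paper. The second function shows why ITD alone leaves an ambiguity: a front angle and its mirror image behind the listener produce the same delay.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (dry air at about 20 degrees C)
EAR_SPACING = 0.18      # m, assumed distance between the ears


def itd_seconds(azimuth_deg):
    """Far-field ITD for a source at the given azimuth.

    Azimuth is measured clockwise from straight ahead, viewed from above
    (0 = front, 90 = directly right, 180 = directly behind).
    A positive result means the right ear hears the wavefront first.
    """
    return EAR_SPACING / SPEED_OF_SOUND * math.sin(math.radians(azimuth_deg))


def azimuths_for_itd(itd):
    """Return the front and rear azimuths (degrees) that share one ITD."""
    s = max(-1.0, min(1.0, itd * SPEED_OF_SOUND / EAR_SPACING))
    front = math.degrees(math.asin(s))
    rear = 180.0 - front
    return front, rear


if __name__ == "__main__":
    delay = itd_seconds(30.0)
    print("ITD at 30 degrees: %.1f microseconds" % (delay * 1e6))
    print("Angles sharing that ITD:", azimuths_for_itd(delay))
```

In three dimensions the two angles returned here widen into the full cone of points described above.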

[Figure: Interaural time delay]

The angular location is further defined within this cone of points by a variety of means. One primary mechanism is the audio filtering effect produced by the many indirect paths of the sound to each ear. This creates an angle dependent filter called a head related transfer function or HRTF.

[Figure: Head Related Transfer Function]

By use of ITD and HRTF, the angle of incidence of a sound can be fairly accurately estimated by the brain. But neither of these mechanisms provides a clue to distance. While it is true that the loudness of a sound relates to distance, it is certainly common within our experience to hear distant loud sounds or nearby quiet sounds. The primary distance cue the brain uses is instead the relative amount of direct sound to reverberant sound, known as the direct to reverberant ratio (DRR).
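A rough numerical sketch shows why this ratio tracks distance. It assumes an idealised room in which direct energy falls off with the square of the distance while reverberant energy is the same everywhere; the two are equal at the so-called critical distance. The 2 m critical distance and the function name are assumptions chosen for illustration.

```python
import math


def drr_db(distance_m, critical_distance_m):
    """Direct to reverberant ratio in dB under an idealised diffuse-field model.

    Direct energy falls as 1/r^2 and reverberant energy is constant, so the
    ratio is 0 dB at the critical distance and drops about 6 dB per doubling.
    """
    return 20.0 * math.log10(critical_distance_m / distance_m)


if __name__ == "__main__":
    for r in (0.5, 1.0, 2.0, 4.0, 8.0):
        print("%4.1f m  DRR = %+6.1f dB" % (r, drr_db(r, 2.0)))
```

A loud distant source and a quiet nearby one can reach the ear at the same level, but under this model their ratios still differ, which is the cue the brain is described as using.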

But the location of a handful of sounds is only a small part of our audio perception. The listener is aware of the kind of room they are in and even what walls are located nearby. A good listener would even be able to tell if a stealthy person tiptoed by, based purely on the sonic environment. How does our listener's brain determine all this? Once again by the details of reflected sound: reverberation.

Virtual reality

How can this audio experience be captured for the game player? One approach would be to put a microphone next to each sound source and then put a speaker at the same relative location and play back the recorded sound. However, because the reverberation has not been captured, this won't work very well. This approach is generally used in "stereo" recordings, and while it is satisfactory for music when the listener sits still, it fails completely in computer games, where the listener moves around in a virtual environment.

Another approach is placing tiny microphones inside the listener's ears and recording what reaches his eardrums. By playing this same sound back through headphones, we would expect the headphone listener to hear the complete experience. This is called "Binaural Recording," which works well, particularly if the listener's own ears are used for the recording process. But while this method captures the sonic environment, it fails again in the computer gaming scenario because the user's location changes interactively. This model for recording is static and non-interactive.

3D audio - the current state of the art

Current 3D audio solutions use a handful of simple techniques to locate virtual audio sources. When headphones are used, ITD and HRTFs are applied to the object's natural sound, which has been recorded in the absence of a sonic environment (an "anechoic chamber" is the ideal). The listener's ears then receive the sound appropriate to the object when it is located at the proper angle. Under many circumstances this works rather well. However, the majority of gamers don't like wearing headphones. And when stereo speakers are used with this same method, a new problem occurs: how to prevent the sound intended for the left ear from reaching the right ear and vice versa. The solution is a clever technique called "crosstalk cancellation," which works by computing the appropriate sums and differences of frequencies so that when the sound from both speakers reaches the ear, the differential delay between the ears (ITD) cancels out the unwanted audio channel. And as many demonstrations have shown, one can get quite impressive localization of virtual audio objects in this way.
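One common textbook way to state crosstalk cancellation, which is not necessarily how any particular product implements it, is as a matrix inversion at each frequency: the four speaker-to-ear paths form a 2x2 matrix, and feeding the speakers through its inverse delivers each channel to its intended ear only. The sketch below works at a single frequency. The opposite-side gain and delay are illustrative assumptions, as are all function names.

```python
import cmath


def acoustic_matrix(freq_hz, far_gain=0.7, far_delay_s=0.00025):
    """2x2 speaker-to-ear transfer matrix at one frequency.

    Rows are ears (left, right); columns are speakers (left, right).
    The same-side path is taken as 1; the opposite-side path is quieter
    and arrives later. Both figures are assumptions for illustration.
    """
    cross = far_gain * cmath.exp(-2j * cmath.pi * freq_hz * far_delay_s)
    return [[1.0, cross], [cross, 1.0]]


def invert_2x2(m):
    """Invert a 2x2 complex matrix; this is the cancellation filter."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def apply_2x2(m, left, right):
    """Multiply a 2x2 matrix by the column vector (left, right)."""
    return (m[0][0] * left + m[0][1] * right,
            m[1][0] * left + m[1][1] * right)


def ear_signals(freq_hz, left_in, right_in):
    """Pre-filter the two channels, then pass them through the room paths."""
    h = acoustic_matrix(freq_hz)
    speaker_left, speaker_right = apply_2x2(invert_2x2(h), left_in, right_in)
    return apply_2x2(h, speaker_left, speaker_right)


if __name__ == "__main__":
    # A signal meant only for the left ear should vanish at the right ear.
    at_left, at_right = ear_signals(1000.0, 1.0, 0.0)
    print(abs(at_left), abs(at_right))
```

The cancellation is exact only for the head position the matrix was built for, which is the restriction the "sweet spot" discussion later in this paper addresses.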

Environmental Audio Extensions

Microsoft has incorporated functionality into DirectX that allows game designers to incorporate 3D positional audio into their games. But the current DirectSound 3D API does not provide enough information to move beyond the existing 3D audio technology. Specifically, there is no method of specifying any sonic environment. Thus no reverberation can be applied. Distance is simulated only by loudness, and the game sound designer has a choice of recording the sounds of objects including reverberation (which won't localise well), having a "dry" sounding game or foregoing the use of 3D audio altogether. None of these choices is appealing.

Fortunately, DirectSound 3D allows enhancement of the API by the use of Property Sets, and Creative, working with Microsoft and other industry leaders, has finalised a property set which permits the addition of reverberation to the DirectSound 3D API. This property set is called the Environmental Audio Extensions (EAX). The property set is non-proprietary; indeed Creative is working with Microsoft to incorporate the Environmental Audio Extensions into a future revision of DirectSound 3D.

For the standard DirectSound methods, when DirectSound is invoked, the sound card driver is queried by the operating system and acceleration of the standard methods is arranged if supported; if not, software emulation is performed. Then, if the application is prepared to take advantage of the additional capabilities implemented in a property set, it queries DirectSound as to whether the property set is supported, and this query is passed on to the sound card driver. If the property set is supported, the query succeeds and the additional methods in support of the capabilities become available to the application. Note that the features in the property set need not be accelerated in hardware. While the Sound Blaster Live! card will support EAX with hardware acceleration, the property set can be made available via software emulation as well.
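The negotiation described above can be modeled with a toy sketch. None of the names below are DirectSound or EAX interfaces; ToyDriver, query_support, set_property and enable_reverb are hypothetical stand-ins that only mirror the order of events: query first, use the extra properties only if the query succeeds, and do not assume hardware acceleration.

```python
class ToyDriver:
    """Hypothetical stand-in for a sound card driver (not a real interface)."""

    def __init__(self, supported=(), accelerated=()):
        self.supported = set(supported)      # property sets the driver exposes
        self.accelerated = set(accelerated)  # subset handled in hardware
        self.values = {}

    def query_support(self, property_set):
        """Answer the application's 'is this property set supported?' query."""
        return property_set in self.supported

    def set_property(self, property_set, name, value):
        if property_set not in self.supported:
            raise KeyError(property_set)
        self.values[(property_set, name)] = value


def enable_reverb(driver, settings):
    """Apply reverb settings only if the property set query succeeds."""
    if not driver.query_support("EAX"):
        return "no reverb (plain 3D audio only)"
    for name, value in settings.items():
        driver.set_property("EAX", name, value)
    if "EAX" in driver.accelerated:
        return "reverb in hardware"
    return "reverb in software emulation"


if __name__ == "__main__":
    settings = {"decay_time_s": 2.5}
    print(enable_reverb(ToyDriver(), settings))
    print(enable_reverb(ToyDriver(supported=["EAX"]), settings))
    print(enable_reverb(ToyDriver(supported=["EAX"], accelerated=["EAX"]), settings))
```

The point of the design is that the application has one code path: it never needs to know which card is installed, only whether the query succeeded.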

The Environmental Audio Extensions property set exposes a number of properties which control the reverberation and audio reflections. Primary reverberation properties allow the game designer to specify the relative loudness or volume of reverberation in the environment, how long the reverberant decay of the space is and the general damping properties of the walls of the space. "Custom" properties control several more nitty-gritty details of the reverb, including the room size, the low frequency decay (which relates to the humidity) and the diffusion of the chamber (which relates to the coarse texture and geometry of the walls).

All this is beyond the level of detail the typical game designer is interested in dealing with when under pressure to ship the title. While all the parameters are necessary, they are specified in terms an audio engineer, not a dungeon master, would understand. So to help out, included in the EAX SDK is a set of presets which specify named environments. These presets can either be used as supplied or as points of departure for experimenting with new environments. When the property is set to one of these presets, the other properties acquire default values and an "instant" environment is created.
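The preset idea can be sketched as a lookup table. The three fields follow the primary properties named above (reverb volume, decay time, damping), but the preset names, the numbers and the function are hypothetical and are not taken from the EAX SDK.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReverbSettings:
    volume: float        # relative loudness of the reverberation
    decay_time_s: float  # how long the reverberant decay lasts, in seconds
    damping: float       # general damping of the walls of the space


# Illustrative values only; not the numbers shipped with any SDK.
PRESETS = {
    "small_room": ReverbSettings(volume=0.4, decay_time_s=0.5, damping=0.7),
    "cave": ReverbSettings(volume=0.8, decay_time_s=3.0, damping=0.3),
    "concert_hall": ReverbSettings(volume=0.6, decay_time_s=2.0, damping=0.5),
}


def environment(preset_name, **overrides):
    """Start from a named preset, then adjust individual properties."""
    return replace(PRESETS[preset_name], **overrides)


if __name__ == "__main__":
    print(environment("cave"))                    # preset used as supplied
    print(environment("cave", decay_time_s=4.5))  # preset as a starting point
```

Choosing a preset fills in every property at once, while the overrides cover the "point of departure" case of experimenting with a new environment.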

[Figure: Typical reverb presets]

By adding reverberation as a simple "preset" which is specified using the property sets associated with EAX, the game programmer now has a method to get live, reverberant sound as well as 3D audio. For many years we've heard claims from lots of advertisers that they have ways of enhancing the music and sound we hear. One often wonders why the producers of professional audio recordings (all the movies, TV and music we hear) don't make use of these supposedly marvelous "stereo enhancement" technologies when the recordings are produced. The answer is that recording studios have their own method of "enhancement," and that is the professional reverb.

Limitations of current 3D audio

Lots of people ask: "What is Environmental Audio?" The honest answer is that it's an open-ended project. The aim is to produce sound so real that you can't tell just by listening that you're not in the virtual environment being presented. The practical answer is that it's reverb, reflections, occlusions and a lot more.

Why do many listeners fail to hear what the technology attempts to present? Binaural recordings work well if they are made using ears similar to the listener's own. But they fail in one important aspect: the recordings cannot anticipate any motion of the listener's head. If the headphone wearer rotates his head, his brain expects all the sound sources to change their ITD according to the angle moved. This doesn't happen, the brain immediately notices, and the illusion is shattered. The situation is even worse when 3D audio is presented using two speakers and crosstalk cancellation.

When the listener rotates his head, three things happen. First, the ITD does not change as his brain expects, so the illusion of angular position collapses. Second, the ITD instead DOES give a solution for the angle of the speaker, so he now perceives the sound as coming directly from the speaker. Finally, his ear is no longer correctly located for the crosstalk cancellation to work, so it receives sound intended for both ears. In summary, everything is all screwed up.

The degree to which this occurs varies with individual listeners. Some people move their head more than others; some brains place more confidence in ITD while others trust HRTFs more. So one listener may find two-speaker 3D audio doesn't work well at all for him, while another may find it quite compelling. Recent studies indicate that the population splits about 50/50 between head-turner ITD people and HRTF people.

However, it's not good business to have a product that fails for half the population. What can be done to make 3D audio more robust?

More speakers

The fundamental problem confounding head-turners is that the ITD changes for sounds in front of the listener in exactly the opposite manner as for those behind the listener. When a sound source is in front and to the right, a clockwise head rotation (viewed from above) moves it towards the midline; if it is at the equivalent angle behind, the same rotation moves it away from the midline.
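The opposite behaviour of front and rear sources can be checked numerically with a small sketch. It assumes the same simple sine model of ITD used for illustration elsewhere, with azimuth and head rotation both measured clockwise as viewed from above; the ear spacing, the speed of sound and the function names are assumptions.

```python
import math


def itd_us(source_az_deg, head_yaw_deg=0.0, ear_spacing_m=0.18, c=343.0):
    """ITD in microseconds for a source, given how far the head has turned.

    Angles are measured clockwise from the original facing direction,
    viewed from above. Positive ITD means the right ear leads.
    """
    relative = math.radians(source_az_deg - head_yaw_deg)
    return 1e6 * ear_spacing_m / c * math.sin(relative)


def itd_change_us(source_az_deg, head_yaw_deg):
    """How much the ITD changes when the head turns by head_yaw_deg."""
    return itd_us(source_az_deg, head_yaw_deg) - itd_us(source_az_deg)


if __name__ == "__main__":
    # Two sources with the same ITD before the head moves:
    # one 45 degrees right of front, one 45 degrees right of rear.
    print("ITD front/rear at rest:", itd_us(45.0), itd_us(135.0))
    print("Change, front source:", itd_change_us(45.0, 10.0))
    print("Change, rear source: ", itd_change_us(135.0, 10.0))
```

The two changes have opposite signs, so a small head turn is enough for the brain to tell front from rear. With only front speakers, every real wavefront behaves like the front case, which is why a virtual rear source collapses for head-turners.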

The only way to solve this is to put something physically behind the listener. And that is exactly what EAX proposes to do. The API remains the same, but EAX technology, based on environmental audio research, allows the proper sound to be routed to additional speakers such as those found in the 5.1 configuration for Dolby Digital or the 7.1 configuration for MPEG-2. Even one speaker behind the head turning listener vastly improves the situation. The more speakers, the more effective the illusion when the head is turned.

But multiple speakers have an additional advantage. Consider the way crosstalk cancellation works. Essentially, the left ear and right ear signals are combined to form an interference pattern. This pattern solves a three dimensional differential equation at two points in space, providing the proper waveform at the listener's eardrums. Fortunately, the equation also forms an approximate solution in the vicinity of these two points and if the listener's head is in this region (called the "sweet spot"), crosstalk cancellation works.

By providing multiple speakers, additional independent variables are added to the differential equations for crosstalk cancellation. This allows an exact solution at more than two points in space and, more importantly, it extends the validity of the approximate solution. Consequently, as intuition might suggest, the presence of multiple speakers substantially extends the "sweet spot" in 3D audio. So not only head turners, but HRTF people benefit.

Evolution of EAX

Creative's EAX supports many speaker configurations with the same API. When multiple speakers are used, the 3D nature of the sound and the effects of the reverb and reflections that emulate the surrounding sonic environment are dramatically enhanced. While the ultimate goal of environmental audio (making the illusion absolutely perfect) has not yet been attained, EAX is a remarkable improvement over existing 3D audio solutions.

While EAX technology includes support via DirectSound Property Sets, it is important for programmers to realise that Creative is "API Neutral." This means that the goal is to use audio technology to provide a superior audible experience for everyone.

EAX on Sound Blaster Live!

The Environmental Audio Extensions will be accelerated in hardware by the EMU10K1 audio processor in Sound Blaster Live!. The acceleration is performed at the full 48 kHz sample rate and can be performed on as many as 16 discrete channels using an arbitrary number of voices. The 32-bit audio processing occurring within the EMU10K1 ensures the ultimate in fidelity throughout the processing chain.

© Creative Technology Ltd. 1998

Compiled by Arlene Martin

This was first published in October 1999


