
Localization/Spatialization

Filter-based localization applet. This applet lets you pan, fade, and simulate binaural listening (best if listened to with headphones).
Soundfile .x

Using the program Soundhack, by Tom Erbe, we can place a sound anywhere in the binaural (two-ear) space. Soundhack does this by convolving the sound with known filter functions that simulate the ITD (Interaural Time Delay) between the two ears, along with the filtering effect of the head itself. We'll talk about this below.

Close your eyes and listen to the sounds around you. How well can you tell where they’re coming from? Pretty well, hopefully! How do we do that? And how could we use a computer to simulate moving sound, so that, for example, we can make a car go screaming across a movie screen or a bass player seem to walk over our heads?

Humans have a pretty complicated system for perceptually locating sounds, involving, among other factors: the relative loudness of the sound in each ear; the time difference between the sound’s arrival in each ear; and the difference in frequency content of the sound as heard by each ear. How would a "cyclaural" (the equivalent of a "cyclops") hear? Most attempts at spatializing, or localizing, recorded sounds make use of some combination of factors involving the two ears, on either side of the head.

Simulating Sound Placement

Simulating a loudness difference is pretty simple — if someone standing to your right says your name, their voice is going to sound louder in your right ear than your left. The simplest way to simulate a volume difference is to increase the volume of the signal in one channel, while lowering it in the other — you’ve probably used the pan or balance knob on a car stereo or boombox, which does exactly this. Panning is a fast, cheap, and fairly effective means of localizing a signal, although it can often sound artificial.
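The pan knob idea described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a simple linear crossfade between the two channels; the function names and the convention that pan runs from -1 (hard left) to +1 (hard right) are our own choices for the example, and real mixers often use other gain curves.

```python
def pan_linear(sample, pan):
    """Split one mono sample into (left, right).

    pan is assumed to run from -1.0 (hard left) to +1.0 (hard right);
    0.0 puts the sound in the center. This convention is an assumption
    made for the sketch, not a standard.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1.0 and 1.0")
    right_gain = (pan + 1.0) / 2.0   # rises as we pan right
    left_gain = 1.0 - right_gain     # falls by the same amount
    return sample * left_gain, sample * right_gain


def pan_mono(samples, pan):
    """Pan a whole list of mono samples; returns a list of (left, right) pairs."""
    return [pan_linear(s, pan) for s in samples]
```

For example, pan_linear(1.0, 0.5) gives the right channel three times the level of the left one, so the sound seems to sit to the right of center.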

The Interaural Time Delay (ITD)

Simulating a time difference is a little trickier, but adds a lot to the realism of the localization. Why would a sound reach your ears at different times? After all, aren't our ears pretty close together? We’re generally not even aware that it does — snap your finger on one side of your head, and you’ll think that you hear the sound in both ears at exactly the same time.

But you don’t. Sound moves at a specific speed, about 345 meters/second, and that's not all that fast (well, compared to light, or the way one of us drives, we're not saying which). Since your fingers are closer to one ear than the other, the sound waves will arrive at your ears at different times, if only by a small fraction of a second. Since most of us have ears that are quite close together, the time difference is very slight, too small for us to consciously "perceive." But let's say, like one of our authors, your head is roughly 25 cm wide, or a quarter of a meter. It takes sound around 1/345 of a second to go one meter, which is approximately .003 seconds (3 thousandths of a second). It takes about a quarter of that time to get from one ear of our fatheaded co-author to the other, which is about .0007 seconds (.7 thousandths of a second). Yikes, that's a pretty small amount of time. Do you believe that our brains perceive that tiny interval, and use the difference to help us localize the sound? You'd better, because there's a frisbee coming at you right now and it would be nice to know which direction it's coming from (whoops, too late). In fact, you do perceive it, and the interval is even smaller because your head's even smaller than .25 meters (we just rounded it off because we've always had trouble with math). The technical name for this effect is Interaural Time Delay (ITD).
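For those who would rather let the computer do the math, here is the same arithmetic as a small Python sketch. The speed of sound and the quarter-meter head come from the text; the sample rate of 44,100 samples per second is our own assumption, added only to show how many samples of a digital recording that tiny interval covers.

```python
SPEED_OF_SOUND = 345.0   # meters per second, the figure used in the text
HEAD_WIDTH = 0.25        # meters, the generously rounded head from the text
SAMPLE_RATE = 44100      # samples per second (an assumption for this sketch)


def travel_time(distance_m, speed=SPEED_OF_SOUND):
    """Seconds it takes sound to cover distance_m meters."""
    return distance_m / speed


one_meter_seconds = travel_time(1.0)       # about 0.003 seconds
itd_seconds = travel_time(HEAD_WIDTH)      # about 0.0007 seconds
itd_samples = itd_seconds * SAMPLE_RATE    # about 32 samples
```

So, under these assumptions, the largest ear-to-ear delay is only around 32 samples long.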

To simulate ITD by computer, we simply need to add a delay to one channel of the sound. The longer the delay, the more the sound will seem to be panned to one side or the other (depending on which channel is delayed). The delays must be kept very short (as we saw above) so that, as in nature, we don’t consciously perceive them as delays, just as location cues. Our brains take over and use them to calculate the position of the sound. Wow! Who designed that?
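Adding a delay to one channel might look something like the sketch below. It assumes the sound is a plain Python list of mono samples and that the delay is given as a whole number of samples; the function name and its parameters are hypothetical, chosen just for this illustration. At 44,100 samples per second, the .0007 seconds worked out earlier is roughly 30 samples.

```python
def apply_itd(mono, delay_samples, delayed_channel="left"):
    """Return (left, right) sample lists with one channel delayed.

    The delayed channel gets delay_samples zeros in front; the other
    channel gets the same number of zeros at the end, so both channels
    come out the same length.
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must not be negative")
    if delayed_channel not in ("left", "right"):
        raise ValueError("delayed_channel must be 'left' or 'right'")
    silence = [0.0] * delay_samples
    direct = list(mono) + silence    # arrives on time
    delayed = silence + list(mono)   # arrives a little late
    if delayed_channel == "left":
        return delayed, direct
    return direct, delayed
```

Delaying the left channel, as in apply_itd(samples, 30, "left"), means the right ear gets the sound first, which is the cue for a source on the right.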


Modeling Our Ears and Our Head

That the ears perceive and respond to a difference in volume and arrival time of a sound seems pretty straightforward, if amazing. But what’s this about a difference in the frequency content of the sound? How could the position of a bird change the spectral make-up of its song? The answer: your head (it's all in your head)!

Imagine someone speaking to you from another room. What does their voice sound like? It’s probably a bit muffled, or hard to understand. That’s because the wall through which the sound is traveling, besides simply cutting down the loudness of the sound, also acts like a low-pass filter. It lets the low frequencies in the voice pass through, while attenuating or muffling the higher ones.

Your head does the same thing. When a sound comes from your right, it must first pass through, or go around, your head in order to reach your left ear. In the process, your head absorbs, or blocks, some of the high-frequency energy in the sound. This is clearly the origin of the term "blockhead"! Since the sound didn’t have to pass through your head to get to your right ear, there is a difference in the spectral makeup of the sound that each ear hears. As with ITD, this is a subtle effect, although if you’re in a quiet room and you turn your head from side to side while listening to a steady sound, you may start to perceive it.
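Here is a rough sketch of that "blockhead" effect in Python: the ear on the far side of the head gets a low-pass filtered copy of the sound, and the near ear gets the sound untouched. The filter is a very simple one-pole low-pass, and the 1500 Hz cutoff is a made-up illustrative value, not measured data about real heads; the function names are likewise our own.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate=44100):
    """Simple low-pass filter: y[n] = (1 - a) * x[n] + a * y[n - 1]."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out = []
    previous = 0.0
    for x in samples:
        previous = (1.0 - a) * x + a * previous
        out.append(previous)
    return out


def head_shadow(mono, source_side="right", cutoff_hz=1500.0, sample_rate=44100):
    """Return (left, right); the ear away from the source is muffled.

    cutoff_hz is a hypothetical value picked for the sketch.
    """
    shadowed = one_pole_lowpass(mono, cutoff_hz, sample_rate)
    if source_side == "right":
        return shadowed, list(mono)   # left ear is behind the head
    return list(mono), shadowed       # right ear is behind the head
```

A real head does not filter this neatly, of course, but even a crude filter like this one dulls the high frequencies in the far ear while leaving the low ones nearly alone.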

Modeling this by computer is easy, provided you know something about how the head filters sounds (what frequencies are attenuated, and by how much). If you’re interested in the "frequency response of the human head," there are a number of published sources available for the data, since it is used by, among other people, the government, for all sorts of things (like flight simulators, for instance). Researcher and author Durand Begault has been a pioneer in the design and implementation of what are called head-related transfer functions (HRTFs): frequency response curves for different locations.
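Once you have filter data for each ear, applying it comes down to convolving the sound with one filter per ear. The sketch below shows the idea with plain Python lists. The two impulse responses are toy values invented for this example (a source on the right: the near ear hears the sound unchanged, the far ear hears it later, quieter, and smeared); they are not measured head data, and the function names are our own.

```python
def convolve(signal, impulse_response):
    """Direct (slow but simple) convolution of two lists of samples."""
    out = [0.0] * max(len(signal) + len(impulse_response) - 1, 0)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# Hypothetical toy filters for a source on the right. NOT measured data.
TOY_FILTER_RIGHT = [1.0]                       # near ear: unchanged
TOY_FILTER_LEFT = [0.0, 0.0, 0.3, 0.2, 0.1]    # far ear: late, quiet, smeared


def binauralize(mono, filter_left, filter_right):
    """Filter one mono sound twice, once per ear; returns (left, right)."""
    left = convolve(mono, filter_left)
    right = convolve(mono, filter_right)
    # Pad the shorter channel with silence so both are the same length.
    length = max(len(left), len(right))
    left += [0.0] * (length - len(left))
    right += [0.0] * (length - len(right))
    return left, right
```

With measured filters for a given direction in place of the toy ones, the same two convolutions would give each ear its own delay and its own filtering in a single step.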

What are HRTFs? (Head-Related Transfer Functions)
Figure .x This illustration shows how the spectral content of a sound changes depending on which direction the sound is coming from. The body (head and shoulders) and the time-of-arrival difference between the left and right ears create a filtering effect.


Binaural Dummy Head recording system.

This system includes an acoustic baffle with the approximate size, shape and weight of a human head. Small microphones are mounted where our ears are located.

This recording system is designed to emulate the acoustic effects of the human head, picking up sounds just as our ears might hear them, and then to capture that information on recording media.

A number of recording equipment manufacturers make these "heads," and they often have funny names (Sven, etc.). The head in the above picture looks alarmingly like one of our authors.

Thanks to Sonic Studios for this photo.


Not so surprisingly, humans are extremely adept at locating sounds in two dimensions, or the plane. We're great at figuring out the source direction of a sound, but not the height. When that lion is coming at us, it's nice of evolution to have provided us the ability to know, quickly (and without much thought), which way to run. It's perhaps more of a surprise that we're less adept at locating sounds in "3-D," or more accurately, in the "up/down" axis. We don't really need this. Unless we're Vince Carter, we can't jump high enough for that perception to do us much good, and we don't have predators from above (like barn owls, who have little filters on their cheeks that make them extraordinarily good at sensing their sonic altitude distances — if you had to catch and eat, from the air, rapidly running field mice you would be too!). So if it's not a frisbee heading at you more or less in the 2-D "plane", but a softball headed straight down towards your head, we'd suggest a helmet.
