The Digital Representation of Sound

The world is continuous. Time marches on and on, and there are plenty of things that we could measure at any instant. For example, weather forecasters might keep an ongoing record of the temperature or the barometric pressure. If you are in the hospital, the nurses might be keeping a record of your temperature, or your heart rate (EKG), or your brain waves (EEG), or your insurance coverage. Any one of these records gives you a function f(t), where at a given time t, f(t) is the value of the particular statistic that interests you. These sorts of functions are called time series.
Now, it's true that to illustrate the idea of a graph, we could have used a lot of simpler things (like the Dow Jones average, or a rainfall chart, or an actual EKG). But you've all seen stuff like that, and also, we're really nerdy (well, one of us isn't), so we thought these would be really, like, way cool (totally). Copyright Juha Haataja/Center for Scientific Computing, Finland.

Of course, the time series that interest us are those that represent sound. In particular, what we want to do is take these time series, stick them on the computer, and start to play with them!

So, how do we do it? That's the problem that we'll start to investigate in this chapter: how can we represent sound as a finite collection of numbers that can be stored efficiently, in a finite amount of space, on your computer, and played back and manipulated at will? In short, how do we represent sound digitally?

Here's a simpler restatement of the basic problem: computers basically store a finite list of numbers (which can then be thought of as a long list of 0s and 1s). These numbers also have finite precision. A continuous function would need an infinitely long list! What is a poor electroacoustic musician to do? (Well, one thing to do would be to remember our mentions of sampling in the previous chapter.)

Somehow we have to come up with a finite list of numbers that does a good job of representing our continuous function. We do it with samples of the original function: at regularly spaced instants (set by some predetermined rate, called the sampling rate) we record the value of the function. For example, maybe we only record the temperature every 5 minutes. For sounds we need to go a lot faster, so we often use a special device, called an Analog to Digital Converter (ADC), which grabs instantaneous amplitudes at rapid, audio rates.
A continuous function is also called an analog function, and to restate the problem, we have to convert analog functions to lists of samples, or digital functions, which is the fundamental way that computers store information. In computers, think of this function not as a function of time (which it is) but as a function of position in computer memory. That is, we store these functions as lists of numbers in computer memory, and as we read through them we are basically creating a discrete function of time made up of individual amplitude values.
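To make the idea of sampling concrete, here is a minimal sketch in Python. Everything named here is an assumption for illustration: the "analog" function `tone` (a 440 Hz sine wave), the helper names `sample` and `time_of`, and the sampling rate of 8000 samples per second. It simply records the value of a function at regularly spaced instants and stores the results in a list, so that position in the list stands in for time.

```python
import math

def tone(t):
    """A stand-in for a continuous (analog) signal: a 440 Hz sine wave (assumed)."""
    return math.sin(2 * math.pi * 440.0 * t)

def sample(f, duration, sampling_rate):
    """Record f(t) at t = 0, 1/rate, 2/rate, ... for `duration` seconds."""
    count = int(round(duration * sampling_rate))
    return [f(i / sampling_rate) for i in range(count)]

def time_of(position, sampling_rate):
    """Convert a position in the list back into a time in seconds."""
    return position / sampling_rate

SAMPLING_RATE = 8000  # samples per second (an assumed value)
samples = sample(tone, duration=0.01, sampling_rate=SAMPLING_RATE)

print(len(samples))                 # 80 numbers stand in for 0.01 seconds of sound
print(time_of(40, SAMPLING_RATE))   # the sample at position 40 was taken at 0.005 seconds
```

The list `samples` is the "digital function": a finite list of numbers, read by position, that approximates the continuous `tone`.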
Here's a pictorial description of the recording and playback of sounds through an ADC/DAC.

Analog to digital (A->D) and digital to analog (D->A) conversion. In A->D, continuous functions (air pressures, sound waves, voltages) are sampled and stored as numerical values. In D->A, these numerical values (or ones that we might just make up) are interpolated by the converter to force some continuous system (such as amplifiers, speakers, and subsequently, the air and our ears) into a continuous vibration. Interpolation just means smoothly going between the discrete numerical values.

When a sound, image, or even a temperature reading is recorded digitally, we numerically represent that phenomenon by storing information about it. A digital sound recording is just a numerical representation of a sound. To convert sounds from our analog world into the digital world of the computer, we use a device called an Analog to Digital Converter (ADC). A Digital to Analog Converter (DAC) is used to convert these numbers back to sound (or to make the numbers usable by an analog device, like a loudspeaker).

An ADC takes smooth functions (of the kind found in the physical world) and returns a list of discrete values. A DAC takes a list of discrete values (like the kind found in the computer world) and returns a smooth, continuous function, or more accurately, the ability to create such a function from the computer memory or storage media.
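The two directions of conversion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `quantize` and `interpolate` are made up here, 16 bits is an assumed precision, and straight-line (linear) interpolation stands in for the smoothing that a real DAC performs with analog filtering.

```python
def quantize(value, bits=16):
    """A->D, finite precision: round a value in [-1, 1] to a whole-number level."""
    max_level = 2 ** (bits - 1) - 1          # 32767 for 16 bits
    clipped = max(-1.0, min(1.0, value))     # anything outside the range is clipped
    return int(round(clipped * max_level))

def interpolate(samples, sampling_rate, t):
    """D->A, simplified: estimate the value at time t >= 0 by drawing a
    straight line between the two stored samples on either side of t."""
    position = t * sampling_rate
    i = int(position)
    if i >= len(samples) - 1:
        return float(samples[-1])            # past the end: hold the last value
    fraction = position - i
    return samples[i] + fraction * (samples[i + 1] - samples[i])

stored = [quantize(v) for v in (0.0, 0.5, 1.0, 0.5, 0.0)]
print(stored)
print(interpolate(stored, sampling_rate=1, t=1.5))  # halfway between samples 1 and 2
```

Note that `quantize` throws information away (many nearby input values land on the same level), while `interpolate` has to invent values between samples; that asymmetry is why the choice of sampling rate and precision matters.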
Here are two different graphical representations of sound. The top is our usual time domain graph, or audiogram, of the waveform created by a 5-note whistled melody. Time is on the x-axis and amplitude is on the y-axis.

The bottom picture is the same melody, but this time we are looking at a time-frequency representation. The idea here is that if we think of the whistle as made up of contiguous small chunks of sound, then over each small time period the sound is composed of differing amounts of various frequencies. The amount of frequency y at time t is encoded by the brightness of the pixel at the coordinate (t,y). The darker the pixel, the more of that frequency at that time. For example, if you look at 0.4 seconds you see a band of white, except near 2500 Hz, showing that around that time we mainly hear a pure tone of about 2500 Hz, while at 0.8 seconds there are contributions all around from about 0 to 3000 Hz, but stronger ones at about 2500 Hz and 200 Hz.

We'll be giving a much more precise description of the frequency domain in Chapter 3, but for now it's enough for us to simply think of sounds as combinations of basic sounds which are distinguished according to their "screechiness". We then assign numbers to these basic sounds according to their screechiness. The screechier a sound, the higher the number. As we learned in Chapter 1, this number is called the frequency, and the basic sound is called a sinusoid. So, high frequency means high screechiness, and low frequency means low screechiness (like a deep bass rumble), and in-between is, well, simply in between.

Any sound is a combination of these sinusoids, in varying amounts. It's sort of like making a soup and putting in a bunch of basic spices. The flavor will depend a lot on how much of each spice you include, and you can really change things dramatically when you alter the proportions. The sinusoids are our basic sound spices!
The complete description of how much of each frequency is used is called the "spectrum" of the sound.
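The soup analogy can be tried out directly. The sketch below is a rough illustration, not the method developed in Chapter 3: the helper names `mix` and `amount_of`, the sampling rate of 8000, and the "recipe" (a strong 2500 Hz sinusoid plus a weaker 200 Hz one, loosely echoing the whistle example) are all assumptions. It builds a sound by adding sinusoids, then measures how much of a given frequency is present by correlating the samples with that frequency, which is one term of a discrete Fourier transform.

```python
import cmath
import math

SAMPLING_RATE = 8000  # samples per second (an assumed value)

def mix(recipe, duration, sampling_rate=SAMPLING_RATE):
    """Add up sinusoids given as (frequency in Hz, amplitude) pairs."""
    count = int(round(duration * sampling_rate))
    return [
        sum(amp * math.sin(2 * math.pi * freq * i / sampling_rate)
            for freq, amp in recipe)
        for i in range(count)
    ]

def amount_of(frequency, samples, sampling_rate=SAMPLING_RATE):
    """Estimate the amplitude of one frequency in the samples."""
    total = sum(
        x * cmath.exp(-2j * math.pi * frequency * i / sampling_rate)
        for i, x in enumerate(samples)
    )
    return 2 * abs(total) / len(samples)

recipe = [(200, 0.5), (2500, 1.0)]      # the "spices" and their amounts
sound = mix(recipe, duration=0.1)

# A tiny spectrum: how much of each of three frequencies is in the sound.
spectrum = {f: round(amount_of(f, sound), 3) for f in (200, 1000, 2500)}
print(spectrum)
```

The measured amounts come back close to the amounts in the recipe (about 0.5 at 200 Hz, about 1.0 at 2500 Hz, and essentially nothing at 1000 Hz). The frequencies here were chosen to fit a whole number of cycles into the 0.1 second chunk; for other frequencies this simple measurement gets blurrier.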