Go to:
Title Page               Chapter 1           Appendix A   Appendix J
Copyright                Chapter 2           Appendix B   Appendix K
Abstract                 Chapter 3           Appendix C   Appendix L
Acknowledgements         Chapter 4           Appendix D   References
Table of Contents        Chapter 5           Appendix E
List of Figure           Conclusions/        Appendix F
List of Tables           Future Directions   Appendix G
List of Audio Examples                       Appendix H
List of Programs                             Appendix I


Chapter 1

Non-mathematical, introductory analogy of wavelet theory to fugue analysis and storywriting for the computer musician



I. Overview and comparison to classical Fourier techniques

Wavelet analysis is one of many generalized time-frequency analysis methods which give a signal's frequency content at a certain point in time. Perhaps the most well-known of these methods is the Fourier family of transforms, which includes the continuous Fourier transform, which deals with continuous signals; the discrete Fourier transform, which deals with discrete (sampled) signals; and the short-time Fourier transform, which deals with signals in short, often overlapping, windowed segments.

For the computer musician, the Fourier transform is a flexible analysis/resynthesis tool which has been extensively used in such applications as phase-vocoding and spectral estimation. In a typical analysis/resynthesis, a musical signal is broken down into small "chunks" or windows. Each window is then analyzed to determine the frequency content in the window. This frequency information is altered in some way, and then the altered frequency information is resynthesized back into a new musical signal. Although Fourier methods are very powerful in practice, they also have drawbacks and limitations: a well-known fact of physics and signal processing is that there is a tradeoff between the control of time and frequency resolution. This fact is also known as the Heisenberg uncertainty principle: the finer the time resolution of the analysis, the more coarse the frequency resolution of the analysis, and vice-versa.

While the mathematics of the Fourier transform is beyond the scope of this thesis, it is important to note that the entire Fourier family of transforms, in practice, is at one extreme of this time-frequency tradeoff. Usually, the Fourier transform emphasizes a fine frequency resolution as opposed to a fine time resolution. For example, a composer is often interested in distinguishing pitch, so distinguishing between 500 Hz and 580 Hz using the Fourier transform is important. Consequently, the Heisenberg uncertainty principle states that a certain amount of information, or equivalently, a sufficient amount of time, called a window, is required for this analysis (Strang 67-68). However, if the same composer becomes interested in distinguishing between 500 Hz and 540 Hz, the window required by this second Fourier analysis will, according to the Heisenberg uncertainty principle, be longer in time than the first window, since the frequency resolution (the smallest distinguishable frequency difference) is finer in the second case. As a result of the increased frequency resolution, however, the composer is more uncertain as to where in time the distinctions between 500 Hz and 540 Hz occur than he is about where the distinctions between 500 Hz and 580 Hz occur, since the window used to analyze the 500 Hz/540 Hz case is longer in time than the window used to analyze the 500 Hz/580 Hz case. The main drawback of this compromise between time and frequency is that sounds which do not take advantage of a fine frequency localization still suffer from poor time localization, since the window has uniform frequency and time resolution throughout its duration.
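As a rough numerical illustration of the composer's example, the sketch below assumes the common rule of thumb that a window lasting T seconds resolves frequency differences of about 1/T Hz. The rule of thumb, the 44,100 Hz sampling rate, and the function name min_window are assumptions made for illustration; they are not taken from the thesis.

```python
import math

SAMPLE_RATE = 44100  # Hz; assumed sampling rate, for illustration only


def min_window(delta_f_hz, sample_rate=SAMPLE_RATE):
    """Return (seconds, samples) of the shortest window that separates two
    partials delta_f_hz apart, under the rule of thumb delta_f ~ 1 / T."""
    seconds = 1.0 / delta_f_hz
    samples = math.ceil(seconds * sample_rate)
    return seconds, samples


for low, high in [(500, 580), (500, 540)]:
    seconds, samples = min_window(high - low)
    print(f"{low} Hz vs {high} Hz: about {seconds * 1000:.1f} ms ({samples} samples)")
```

Halving the frequency difference doubles the required window length, which is the tradeoff described above: the 500 Hz/540 Hz case needs a window twice as long as the 500 Hz/580 Hz case.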

Conceptually, this tradeoff can be seen as a certain, uniform, division of the time-frequency plane:

Figure 1.1

The time-frequency tradeoff associated with Fourier analysis has several implications for composers and scientists working in electro-acoustic music. For example, one cannot theoretically construct a frequency equalizer whose parameters move infinitely fast, since the time required for such a change would be infinite. Furthermore, non-pitched sounds, such as percussive sounds or transient clicks, are hard to localize in time and subsequently alter in meaningful ways, since they tend to "smear" across the time-frequency plane (high frequencies tend to die away slowly along the time axis):


Figure 1.2

Wavelet analysis offers an alternative to Fourier analysis that remedies these problems. In essence, wavelet analysis divides the time-frequency plane in a different, non-uniform manner:


Figure 1.3

In this scheme, frequency resolution is finer than time resolution at low frequencies, while time resolution is finer than frequency resolution at higher frequencies, as seen in Figure 1.3. The factors by which the width and height of the boxes change at each horizontal and vertical "level" are 1/2 and 2, respectively. The Heisenberg uncertainty principle is still preserved since the area of the rectangles remains constant, but the shape of the boxes is different - an "octave-band" decomposition is performed in standard wavelet analysis. The bandwidths of the rectangles at a certain horizontal level are exactly one half the bandwidths of the rectangles one level higher, and thus a logarithmic frequency analysis is being done with respect to the analysis level number. This style of analysis has been linked with "constant-Q filters," which are constructed so that the bandwidth of each filter is proportional to its center frequency. The wavelet analysis scheme matches the models of many physical phenomena well, since perceptual scales, such as pitch, are often logarithmically related to frequency. Furthermore, the wavelet transform may be a more efficient representation for physical signals because 1) it can isolate transient information in fewer rectangles (wavelet coefficients) than Fourier analysis, and 2) the average magnitude associated with each rectangle (wavelet coefficient) is small relative to the average magnitude of the input.
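The constant-area property of the octave-band grid can be checked numerically. The sketch below assumes idealized octave bands below the Nyquist frequency, a 44,100 Hz sampling rate, and a numbering in which level 1 is the highest band; all three are illustrative assumptions, as is the function name octave_tiles.

```python
SAMPLE_RATE = 44100.0  # Hz; assumed for illustration


def octave_tiles(levels, sample_rate=SAMPLE_RATE):
    """List (level, low_hz, high_hz, duration_s) for an idealized
    octave-band tiling. Level 1 is the highest band."""
    tiles = []
    for level in range(1, levels + 1):
        high = sample_rate / 2 ** level
        low = high / 2
        duration = 2 ** level / sample_rate  # each box spans 2**level samples
        tiles.append((level, low, high, duration))
    return tiles


for level, low, high, duration in octave_tiles(4):
    bandwidth = high - low
    center = (low + high) / 2
    # The last two columns stay constant: box area and center/bandwidth ratio (Q).
    print(level, bandwidth, duration, bandwidth * duration, center / bandwidth)
```

Each step down in frequency halves the bandwidth and doubles the duration, so the product of the two stays fixed, and the ratio of center frequency to bandwidth is the same on every level.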

Wavelet analysis has another advantage over Fourier analysis. For both the Fourier transform and the wavelet transform, not only does each shaded rectangle give information about the frequency content in a certain time area, but there is also a different time-domain waveform associated with each one of the rectangles. These waveforms, called "basis functions" in the mathematical literature, are building blocks for the time-domain representation of the entire input signal. The darkness of each rectangle shows how much of each building block is used to build the signal at any given time. For example, a vertical line drawn through any of the time-frequency diagrams above would pass through several rectangles. A sum of the time-domain waveforms associated with each of these rectangles, each waveform being weighted according to the darkness of its shading, would give the original waveform's shape at the time associated with the drawn vertical line.

For the windowed Fourier transform, the basis functions associated with the rectangles are all segments of sinusoids. The higher the rectangle is vertically in each graph, the higher the frequency associated with the waveform. These basis functions are generally the same for each rectangle on the same vertical (frequency) level. Wavelet analysis also has basis functions associated with each rectangle which are generally the same for each rectangle on the same vertical (frequency) level - this is a property known as translation invariance. However, wavelet theory's basis functions are not necessarily sinusoidal in shape; the important defining property that they share with Fourier basis functions is that they are all scales and translates of one generalized shape; that is, the time-domain waveforms associated with the non-uniform grid of rectangles above are all horizontally and/or vertically stretched and/or compressed versions of a basic shape, called the "mother wavelet." In a standard wavelet analysis, the basis functions associated with each vertical level are two-times compressed versions of the basis functions associated with the vertical level directly below them:

Figure 1.4
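The scaling and translation of a mother wavelet can be sketched with the Haar wavelet, one of the simplest mother wavelets. The choice of Haar, the indexing convention psi_jk(t) = 2^(j/2) psi(2^j t - k), and the function names below are assumptions made for illustration and are not prescribed by the text.

```python
def haar_mother(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def haar_basis(j, k, t):
    """Compress by 2**j and shift by k: psi_jk(t) = 2**(j/2) * psi(2**j * t - k)."""
    return 2.0 ** (j / 2.0) * haar_mother(2.0 ** j * t - k)


def support(j, k):
    """Time interval on which psi_jk is non-zero."""
    return (k / 2.0 ** j, (k + 1) / 2.0 ** j)


for j in range(3):
    # Each level up halves the time support and raises the amplitude.
    print(j, support(j, 0), haar_basis(j, 0, 0.0))
```

Every basis function here is the same two-step shape; only its width, height, and position change from rectangle to rectangle.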

Unlike the sinusoidal basis functions of Fourier analysis, a mother wavelet can take several different shapes. For certain purposes, a particular wavelet shape will provide a "cleaner," or more compact, representation in the wavelet domain than other shapes. While this is certainly a strength of wavelet analysis which some researchers have tapped for feature recognition and analysis of self-similar signals, the construction of a wavelet that satisfies all of the properties discussed so far is no trivial task. In fact, much of current wavelet theory and research is involved in discovering how to construct, in a straightforward manner, sets of wavelet bases which satisfy all of the severe mathematical constraints associated with them.

The strength of wavelet analysis is its ability to convey varying bandwidths of frequency information in inversely varying time periods. As seen in the above diagram, the standard wavelet basis functions associate wider bandwidths with higher center frequencies and shorter time periods, a property which is congruent with the logarithmic perceptual representation of many physical signals. Furthermore, the "scaling" property associated with the basis functions provides a unity for the wavelet functions that is useful in pattern recognition and analysis of self-similar signals.

II. Analogy of wavelet theory to eighteenth century fugue writing

Wavelet analysis is a blend of many different scientific disciplines: functional analysis, linear algebra, and signal processing are most associated with wavelet theory. However, for the average computer-musician, the mathematical concepts involved with wavelet analysis are too abstract for day-to-day usage and implementation. Fortunately, a good analogy of wavelet theory can be made to standard eighteenth century fugue analysis, a subject with which the computer-musician is probably more familiar. Let us see how some ideas in fugue writing translate into parallel concepts in wavelet analysis.

Imitative counterpoint, of which fugal counterpoint is a subcategory, focuses on manipulations of a melodic subject, or a relatively short, recognizable horizontal musical line. As is typical in tonal polyphony, it is common for certain alterations of the subject to appear simultaneously in other voices with the original subject itself, so long as the harmonic implications of the combination are idiomatic of the eighteenth century tonal language. For example, if the subject is written in standard music notation, an "upside down" version of the subject is known as an inversion; that is, all intervallic directions have been reversed (a minor third up becomes a minor third down, a perfect fifth up becomes a perfect fifth down, etc.). A version of the subject which contains the same intervallic sequence but has been rhythmically stretched out in time is called an augmentation. Similarly, a version of the subject which contains the same intervallic sequence but has been rhythmically shortened in time is called a diminution. In general, augmentation and diminution most commonly increase and decrease the length of a subject by a factor of two, respectively. It is also common to find combinations of these techniques in a single fragment of fugal music. For example, an augmented inversion refers to a subject which has first been flipped upside down, and then rhythmically stretched out by a factor of two.

Here is a concrete example of a subject and an augmentation of a subject. The soprano introduces the subject, while the bass enters an eighth rest later with an augmented version of the subject:



J.S. Bach, Vom Himmel Hoch, Variation No. 4 (manuals only)

(Benjamin 134)

Figure 1.5

For pedagogical purposes, standard wavelet analysis is very similar to standard fugue analysis if we limit ourselves to certain types of fugues. Specifically, let us consider fugues in which augmentation occurs more frequently in the lower register than the higher register, and in which diminution occurs more frequently in the higher register than the lower register. In the fugues in which we are interested, inversion may occur in any register. For example, consider the following excerpt, in which the subject introduced by the right hand is answered with an augmented inversion in the left hand:

J.S. Bach, The Art of Fugue, Canon No. 1

(Benjamin 135)

Figure 1.6

Recall that in wavelet analysis, all basis functions associated with the rectangles of a certain analysis grid (e.g. Figure 1.3 and Figure 1.4) are dyadic (power-of-2) scales and translates of the same general shape, or mother wavelet. In fugue analysis, the analog of the mother wavelet is the subject. The subject has the same general shape everywhere, only it has been shifted in time in some cases to create polyphony with other voices. Furthermore, the subject has been altered in length by factors of 2 via diminution and augmentation; as mentioned above, in the fugues we are considering, longer (augmented) versions of the subject are found in the lower voices, while shorter (diminished) versions of the subject are found in the higher voices. This perfectly matches the "scaling" property of wavelets in the time domain described above.

This parallel extends to the divisions of wavelet analysis' time-frequency plane described above, as well. Standard musical notation is in fact a time-frequency plane in which the pitch scale is logarithmic - the frequency of A2 (110 Hz) is exactly one half the frequency of A3 (220 Hz), which is itself exactly one half of the frequency of A4 (440 Hz). Therefore, for the fugues we are considering, the frequency bandwidth of a subject in a lower register is smaller relative to the frequency bandwidth of a subject at a higher register, but the time span associated with the subject in a lower register is longer than the time span associated with the subject in a higher register. If we equate a register to an entire level of wavelet coefficients, then this is exactly the non-uniform division of the time-frequency plane that is depicted in Figure 1.3 and Figure 1.4.

In Figure 1.7, a portion of the wavelet time-frequency plane has been superimposed onto a musical score of the beginning of a fugue to highlight the similarities between fugue and wavelet analysis. The largest interval in the first measure of the subject in the left hand is a fifth, from D3 up to A3. The frequency differential of this interval is roughly proportional to the frequency bandwidth of the subject. In equal temperament, this is roughly equal to 220 Hz (A) - 147 Hz (D) = 73 Hz. The largest interval in the first measure of the subject in the right hand also covers a fifth, this time from A5 down to D5. The frequency differential of this interval is roughly proportional to the frequency bandwidth of the subject as well, except in a higher register.

In equal temperament, this is roughly equal to 880 Hz (A) - 587 Hz (D) = 293 Hz, a differential that is much larger than the differential in the lower register. The point to observe is that even though the subject has the same overall intervallic contour and shape in both high and low registers, a faster, higher-register version of the subject covers a larger range of frequencies, while a slower, lower-register version of the subject covers a smaller range of frequencies. Finally, just as in wavelet analysis, the overall sound at any given time is the sum of what is occurring in each register (in this case, each staff) at that given time:

Comparison of Wavelet Analysis to Augmentation in a Lower Register.

A portion of the standard wavelet analysis grid is overlaid on top of standard musical notation. The time-frequency windows outlined by the grid coincide with the time-frequency representation given by the notated music.

J.S. Bach, The Art of Fugue, Fugue No. 6


(music from Benjamin 136)

Figure 1.7
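The frequency differentials quoted for the two registers (a fifth spanning roughly 147 Hz to 220 Hz, and a fifth spanning roughly 587 Hz to 880 Hz) can be reproduced with the equal-temperament formula. The sketch below assumes A4 = 440 Hz and the MIDI note-numbering convention (A4 = 69); the note numbers were chosen to match the frequencies in the text and are not taken from the score.

```python
A4_HZ = 440.0   # reference pitch, matching the A4 (440 Hz) cited in the text
A4_MIDI = 69    # MIDI note number of A4 (assumed numbering convention)


def midi_to_hz(midi_note):
    """Equal-tempered frequency: each semitone multiplies frequency by 2**(1/12)."""
    return A4_HZ * 2.0 ** ((midi_note - A4_MIDI) / 12.0)


# Perfect fifths D-A in two registers, two octaves apart.
low_d, low_a = midi_to_hz(50), midi_to_hz(57)    # about 147 Hz and 220 Hz
high_d, high_a = midi_to_hz(74), midi_to_hz(81)  # about 587 Hz and 880 Hz

low_span = low_a - low_d
high_span = high_a - high_d
print(round(low_span, 1), round(high_span, 1), round(high_span / low_span, 3))
```

Because the two fifths lie two octaves apart, the same musical interval covers four times as many hertz in the higher register, which mirrors the widening boxes of the wavelet grid.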

Similarly, Figure 1.8 and Figure 1.9 comprise a non-fugal, musical analogy in which the inverse time-frequency relationship is made explicit. Note the parallel between musical dynamics and signal power for each level of wavelet coefficients in the analysis grid:


Figure 1.8



Figure 1.9

Although much of a fugue's musical material can be stated as augmentations and diminutions of a subject, there is also much music in a fugue which is not an exact augmentation or diminution of the subject. For example, in Figure 1.5, the middle voice, although beginning with a version of the subject, quickly turns accompanimental in function and ceases to imitate the subject in a direct way. Clearly, it would be difficult to express this accompanimental material as only a combination of expanded and/or shrunken versions of the subject. Similarly, any music student who has tried to analyze a piece by hierarchical and symmetrical Schenkerian analysis has undoubtedly run into problems. Therefore, if we go ahead and rigidly try to analyze a fugue with our best guess as to what the exact subject is, there are some notes which we simply cannot explain. Obviously, our analyzing subject must be "custom-fit" to the task; that is, it should try to "explain" as many notes as possible.

In wavelet analysis, there is a similar idea: a mother wavelet should be "custom-fit" to the analyzing task at hand; it should explain as many samples as possible in terms of scales and translates of the mother wavelet. In general, the fewer unexplained samples there are after the analysis, the better. In wavelet analysis, the degree to which a mother wavelet can resolve an input is related to how smooth the signal is, or in general how many non-zero derivatives the signal has. A mother wavelet which can "cleanly" analyze a smooth signal which has a high number of non-zero derivatives is said to have a high number of vanishing moments. Conversely, a mother wavelet which can "cleanly" analyze a smooth signal which has a low number of non-zero derivatives is said to have a low number of vanishing moments. In general, wavelet analysis provides a "clean" analysis for piecewise-polynomial signals. A higher number of vanishing moments means that a higher order polynomial can be exactly represented by a wavelet analysis.
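The notion of vanishing moments can be made concrete by examining the high-pass (wavelet) filter associated with a mother wavelet: if the discrete moments of orders 0 through p-1 of that filter are all zero, polynomial signals of degree below p produce zero wavelet coefficients. The sketch below counts vanishing moments for the Haar and Daubechies-4 filters. Both wavelets, and the helper names, are examples chosen here for illustration; the thesis does not single them out at this point.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT3 = math.sqrt(3.0)

# High-pass (wavelet) filters; Haar and Daubechies-4 are illustrative choices.
HAAR_G = [1.0 / SQRT2, -1.0 / SQRT2]
D4_H = [(1 + SQRT3) / (4 * SQRT2), (3 + SQRT3) / (4 * SQRT2),
        (3 - SQRT3) / (4 * SQRT2), (1 - SQRT3) / (4 * SQRT2)]
D4_G = [D4_H[3], -D4_H[2], D4_H[1], -D4_H[0]]


def filter_moment(g, order):
    """Discrete moment of the filter: sum of n**order * g[n]."""
    return sum((n ** order) * coeff for n, coeff in enumerate(g))


def vanishing_moments(g, max_order=6, tol=1e-12):
    """Count how many consecutive moments, starting at order 0, are zero."""
    count = 0
    for order in range(max_order):
        if abs(filter_moment(g, order)) > tol:
            break
        count += 1
    return count


print("Haar:", vanishing_moments(HAAR_G))
print("Daubechies-4:", vanishing_moments(D4_G))
```

Under these assumptions the Haar filter cancels only constant signals, while the longer Daubechies-4 filter also cancels linear ramps, at the cost of a longer time support.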

The analysis above, done for discrete notes in time, can be extended to discrete samples of sound in time, of which digital audio sound files are constructed. An analyzing "mother wavelet" is selected, and then a window of input samples is analyzed to determine what scales and translates of the mother wavelet contribute to the signal at any given time. A number is assigned to how much a given scale and/or translate of the "mother wavelet" contributes to the input signal at any time; this number is called the "wavelet coefficient," and is depicted in Figures 1.3, 1.4, and 1.7 by shading and cross-hatching.
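A minimal sketch of this procedure, assuming the Haar wavelet as the analyzing mother wavelet and a window whose length is a power of two, is given below. The function names and the sample values are hypothetical; the filter-bank formulation used in practice is the subject of Chapter 2.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_analyze(samples):
    """Full Haar decomposition of a list whose length is a power of two.
    Returns (approximation, details), details[0] being the finest level."""
    approx = list(samples)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details


def haar_synthesize(approx, details):
    """Invert haar_analyze, rebuilding the samples from coarse to fine."""
    signal = list(approx)
    for level in reversed(details):
        rebuilt = []
        for s, d in zip(signal, level):
            rebuilt.append((s + d) / SQRT2)
            rebuilt.append((s - d) / SQRT2)
        signal = rebuilt
    return signal


samples = [4.0, 4.0, 4.0, 4.0, 9.0, 1.0, 4.0, 4.0]  # flat signal with one transient
approx, details = haar_analyze(samples)
print(details[0])  # finest level: only the pair holding the transient is non-zero
print(haar_synthesize(approx, details))
```

Each number in details plays the role of one shaded rectangle in the analysis grid, and resynthesis from the full set of coefficients returns the original window of samples.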

III. Analogy of wavelet analysis to storywriting

The analogy of wavelet analysis to fugue analysis is not perfect or complete by any means. First, there are many fugues in which augmentation occurs in higher registers, just as there are also many fugues in which diminution occurs in lower registers. Furthermore, not all transformations of a musical subject, such as augmentation, diminution, and inversion, can be given exact counterparts in wavelet analysis. Therefore, to illustrate the final important conceptual point of wavelet theory, let us turn to another analogy.

Suppose a storywriter has two versions of a sentence which he wishes to relate to one another:

(1.1) The quick brown fox jumped over the lazy dog.

(1.2) The quick purple fox jumped over the lazy dog.

What would be the easiest way to relate the two sentences to each other? One efficient solution is to say that they are the same sentence, except for one word: "purple." That is, if one sentence is "subtracted" from the other, there is a difference of just one word. Now, suppose another comparison of two sentences is made:

(1.3) The quick brown fox jumped over the lazy dog.

(1.4) The quick green fox jumped over the lazy dog.

The "difference" between these two sentences is now the word "green." Now, suppose we construct a simple story, although somewhat trivial, with sentences (1.1), (1.2), (1.3), and (1.4) by just joining them together into a paragraph. In this sense, sentences (1.1), (1.2), (1.3), and (1.4) have become the "building blocks," or basis functions, of the story we want to write:




The quick brown fox jumped over the lazy dog. The quick green fox jumped over the lazy dog. The quick green fox jumped over the lazy dog. The quick purple fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog. The quick purple fox jumped over the lazy dog.

Figure 1.10

However, writing a story in which every sentence is almost the same is not only boring, but also inefficient. If we wanted to store the text of such a story on disk, for example, we would be wasting a lot of disk space with repeated material. Instead, we would rather store the following, more efficient data structure:

a) A "common denominator" or "overall average" of the sentences. For the example in Figure 1.10 this would be:

"The quick ____ fox jumped over the lazy dog"

b) A list of "differences," that is, what words fill in the blank in the "overall average." For the example in Figure 1.10 this would be:

brown, purple, green

c) An ordered list of how these differences occur in the story, when inserted into the "overall average" in a). For the example in Figure 1.10 this would be:

brown, green, green, purple, brown, purple

Note that we can construct the story in Figure 1.10 by using either the large, full sentences (1.1), (1.2), (1.3), and (1.4); or, we could use the more compact scheme sketched in a), b), and c), above. The interesting point of this discussion is that the second, more compact set of basis functions can be derived from the first set of basis functions (sentences (1.1), (1.2), (1.3), and (1.4)) in a prescribed way - subtraction.
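The compact scheme can be sketched as a small data structure: a shared template and an ordered list of the words that fill its blank. The function names and the "____" placeholder handling below are illustrative assumptions, and the sketch only handles sentences that differ in exactly one word position.

```python
STORY = [
    "The quick brown fox jumped over the lazy dog.",
    "The quick green fox jumped over the lazy dog.",
    "The quick green fox jumped over the lazy dog.",
    "The quick purple fox jumped over the lazy dog.",
    "The quick brown fox jumped over the lazy dog.",
    "The quick purple fox jumped over the lazy dog.",
]


def encode(sentences):
    """Split sentences into a shared template and the ordered list of words
    that fill its single blank. Assumes equal word counts and one differing slot."""
    rows = [s.split() for s in sentences]
    slots = [i for i in range(len(rows[0])) if len({r[i] for r in rows}) > 1]
    assert len(slots) == 1, "sketch handles exactly one differing word"
    slot = slots[0]
    template = rows[0][:slot] + ["____"] + rows[0][slot + 1:]
    return " ".join(template), [r[slot] for r in rows]


def decode(template, fillers):
    """Rebuild the sentences by inserting each filler into the template."""
    return [template.replace("____", word) for word in fillers]


template, fillers = encode(STORY)
print(template)
print(fillers)
print(decode(template, fillers) == STORY)
```

The template is stored once, and the story is recovered from it plus the short list of differences, rather than from six nearly identical sentences.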

In the mathematical wavelet literature, the "large" basis functions are typically called scaling functions, while the "smaller" basis functions, or the basis functions created by taking differences of the scaling functions, are typically called wavelet functions. Here is the important result: wavelets are the study of difference spaces. Specifically, wavelet basis functions categorize the differences between different scaling functions. For example, in Figure 1.10, all of the difference words are adjectives. Therefore, the wavelet basis function that categorizes all possible differences among the scaling functions (sentences (1.1), (1.2), (1.3), and (1.4)) is the adjective. However, it is important to remember that this analogy, too, is incomplete: it does not have the scaling property or the time-frequency properties that were evident in the fugue analogy.

IV. Summary of wavelet theory

The following defining properties of wavelet analysis have been presented:

-- Wavelet analysis can be an analysis-resynthesis method.

-- A wavelet representation is different from Fourier analysis. A wavelet representation is a non-uniform division of the time-frequency plane in which varying bandwidths of frequency information have inversely proportional associated time support. The standard wavelet basis associates higher bandwidths with higher center frequencies and shorter time windows.

-- In general, wavelet analysis is well suited to isolating sharp transients in a signal, a task to which Fourier analysis is less well suited.

-- Each rectangle in a standard graphical wavelet representation corresponds with a wavelet basis function and has an associated wavelet coefficient.

-- Each wavelet basis function is a translation and/or scaling of a single general shape, called the mother wavelet. The translation and scaling factors associated with these basis functions are all powers of two in the standard analysis scheme.

-- The degree to which a wavelet can cleanly analyze a given input signal is related to the number of vanishing moments it has. The higher the number of vanishing moments a wavelet has, the higher-order polynomial an analysis done with that wavelet can exactly represent.

-- Unlike Fourier analysis, there may be many different shapes for mother wavelets, and one mother wavelet is usually selected for a certain analysis because of its desired properties (e.g. vanishing moments, short time domain support, smoothness, etc.).

-- The degree of shading of each rectangle in a standard graphical wavelet representation is proportional to how much of the rectangle's corresponding basis function is present in the signal at the rectangle's associated time-frequency period. Numerical values can be given to each rectangle's degree of shading. These values are called wavelet coefficients.

-- Wavelets are a study of difference spaces. They characterize how "larger" basis functions, called scaling functions, differ from one another by categorizing the differences between the scaling functions. Each category of difference between the scaling functions is associated with a wavelet basis function. Successive levels of wavelet coefficients represent increasingly detailed approximations to the analyzed signal.

V. Previous work in wavelets as applied to audio and electro-acoustic music

Although the theory of wavelets and their implementations are still under development, many applications for wavelets in audio, music, and other disciplines have already been found. For example, even though the fugue example above was intended only as an analogy, wavelets have already been used to analyze self-similarity in pitch contour (Kussmaul "Pitch Contour"). Since wavelets are the study of difference spaces, the associated wavelet coefficients are small relative to the original signal. Many researchers have used this property to develop compression techniques for audio and video (Barnell, Kudumakis, Scholl, Sinha). Since wavelet processing can be done with a recursive set of filter banks (see Chapter 2), there is also much research on efficient algorithms for the forward and inverse wavelet transform. Theoretically, the wavelet transform is faster than the Fast Fourier transform, which requires O(n log n) time, as it can be implemented in O(n) time (Beylkin, Brown, Lang, Unser 1994, Unser 1995). For music processing, more efficient algorithms could mean a step closer to real-time processing of audio signals in software and hardware. Much work has also been done with wavelets and pitch detection (Ellis, Kadambe, Popovic, Shelby, Yup). Since wavelets can separate sharp transients from a source signal, wavelet-based pitch detection schemes are theoretically more tolerant of noise in the input signal.
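The O(n) figure can be made plausible by counting how many samples a recursive filter bank touches: each level processes half as many samples as the level before it. The sketch below assumes a full dyadic decomposition of a power-of-two length and a fixed cost per sample (that is, a fixed filter length); the function name is hypothetical.

```python
import math


def pyramid_work(n):
    """Count samples touched by a full dyadic filter-bank decomposition of
    n samples (n a power of two): n + n/2 + n/4 + ... + 2."""
    total = 0
    length = n
    while length > 1:
        total += length
        length //= 2
    return total


for n in [1024, 65536, 1048576]:
    # Second column grows like 2n; third column grows like n * log2(n).
    print(n, pyramid_work(n), round(n * math.log2(n)))
```

The total never reaches 2n, so the work grows linearly with the input length, whereas the n log2(n) column used as a stand-in for an FFT grows faster.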

Conversely, another common application of wavelets is for denoising purposes (Berger and Nichols, Berger and Coifman). Since relatively large wavelet coefficients are concentrated around important pitched and transient information of a signal, coefficients below a certain threshold can be treated as "noise" and removed from the signal's wavelet domain representation. Another interesting use of wavelets of direct importance to the electro-acoustic musician is for the extraction of frequency modulation laws (Delprat). Frequency modulation (FM), a standard audio synthesis technique pioneered by Chowning, has been used to simulate several different instruments. However, choosing the parameters for frequency modulation (carrier frequency, modulating frequency, index of modulation) that mimic a certain instrument is often difficult. Delprat's suggestion is to use a wavelet representation of a sampled sound to "extract" the frequency modulation parameters associated with the sampled sound, so that an instrument designer does not have to guess the FM parameters directly.
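The thresholding idea behind wavelet denoising can be sketched as follows, assuming a single level of Haar analysis and a "hard" threshold that zeroes small coefficients. The sample values, the threshold of 0.5, and the function names are made up for illustration and do not reproduce the methods of the cited authors.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(samples):
    """One Haar analysis level: pairwise scaled sums and differences."""
    pairs = list(zip(samples[0::2], samples[1::2]))
    return ([(a + b) / SQRT2 for a, b in pairs],
            [(a - b) / SQRT2 for a, b in pairs])


def haar_unstep(approx, detail):
    """Inverse of haar_step."""
    out = []
    for s, d in zip(approx, detail):
        out.extend([(s + d) / SQRT2, (s - d) / SQRT2])
    return out


def hard_threshold(coeffs, threshold):
    """Zero every coefficient whose magnitude is below the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


noisy = [1.0, 1.1, 0.9, 1.0, 5.0, 1.0, 1.05, 0.95]  # made-up samples
approx, detail = haar_step(noisy)
cleaned = haar_unstep(approx, hard_threshold(detail, 0.5))  # 0.5 is arbitrary
print([round(x, 3) for x in cleaned])
```

The large coefficient produced by the transient survives the threshold, so the jump from 5.0 to 1.0 is preserved, while the small fluctuations within the other pairs are averaged away.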

Wavelets have also recently been used for sound localization purposes, a common compositional technique in electro-acoustic music (Weiss 1994, Weiss et al.). Finally, wavelets are being used to do different forms of wideband additive synthesis, where the individual synthesis functions involved have a wider frequency bandwidth than just a single frequency (Arfib, "Resynthesis and Transformations of Sounds," Arfib "Musical Transformations," Guillemain, Kronland-Martinet).

In addition, several wavelet software packages have been made available on the internet which provide tools for wavelet analysis and resynthesis, including the Xwpl package from Yale University and the Wavelab package for the MATLAB environment, available from Stanford University.

