

Chapter 4

Using the exponential-decay property of wavelet coefficients to spectrally enhance low-sample-rate audio files:

A wavelet-based excitation algorithm


I. Introduction

A. Motivation

Although the high frequency range of a typical audio recording has little power compared with its lower frequencies, the presence of high frequencies is very important to the overall recorded sound in several ways. Not only is a certain "brightness" attributed to frequencies above 10 kHz, for example, but much localization and ambient information is also contained in that same region.

In any practical physical system, however, high frequencies are usually quickly attenuated as they travel from a source, due to dissipative effects such as friction and heat loss. Recorded sound, which usually passes through several physical systems (transducers) on its way from live performance or original realization to a listener's ear, suffers from these same dissipative effects. In fact, much high frequency content is still lost in many of the best audio recording and reproduction systems available today. For example, early compact disc recorders and players introduced the notion of "emphasis," which in essence is a high-frequency-band equalizer that compensates for high frequency loss on recording and playback. Earlier analog tape systems also pre-processed sound in the same way by boosting high-frequency tape content to produce a "brighter" sound on playback.

Although digital recording and processing systems try to alleviate this problem by providing a no-loss recording and playback system, the increasing need for cross-platform networking for musical purposes, along with increasingly popular Internet-based mechanisms for disseminating music, has posed a new set of problems related to low sample rates, which also affect the high-frequency content of recorded sounds.

Because CD-quality audio is expensive in terms of storage space and network-transfer time, many commonly available audio files on the Internet, for example, do not use the full, CD-quality audio sampling rate of 44.1 kHz. Instead, most files available on the network use a lower sampling rate, such as 8 kHz, 11.025 kHz, 16 kHz, or 22.05 kHz. While this dramatically cuts down on storage space and transfer time, it also dramatically reduces the frequency content, because a digital signal with a certain sampling rate, sr Hz, can only capture frequencies from zero to sr/2 Hz. In other words, a sound file with a low sampling rate can capture and reproduce a narrower range of frequencies than a sound file with a higher sampling rate.
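The sr/2 limit can be checked with a few lines of arithmetic. The following Python sketch simply tabulates the upper frequency limit for the sampling rates mentioned above; the function name `nyquist_hz` is illustrative and not taken from the thesis program.

```python
# Upper frequency limit (sr / 2) for the sampling rates mentioned in the text.
SAMPLE_RATES_HZ = [8000, 11025, 16000, 22050, 44100]


def nyquist_hz(sample_rate_hz):
    """Highest frequency a signal sampled at sample_rate_hz can represent."""
    return sample_rate_hz / 2


for sr in SAMPLE_RATES_HZ:
    limit_khz = nyquist_hz(sr) / 1000
    print(f"{sr / 1000:7.3f} kHz sampling rate -> content up to {limit_khz:.4f} kHz")
```

For the 11.025 kHz case this gives 5.5125 kHz, roughly a quarter of the 22.05 kHz available at the CD-quality rate.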

The many EQ-based mechanisms that have been introduced to accomplish high-frequency compensation/boosting are called exciters in the audio industry, and generally work by emphasizing existing but attenuated high-frequency content of a pre-recorded signal. However, if we wish to excite digital signals with sampling rates lower than 44.1 kHz, we face a more acute problem: the high frequencies for which we wish to compensate are not even present. Somehow, if we are to enhance the high frequency bands of a low-sample-rate audio file, we must actually guess the high frequency content from the existing, lower frequency content. For example, an 11.025 kHz file found on the network only has frequency content between zero and 5.5125 kHz. We wish to compensate for the frequencies from 5.5125 kHz to 20 kHz, so that we can approach a simulated CD-quality sound.

If this estimation at high frequencies is moderately successful, it could provide a means for faster transmission of audio; in essence, a new form of compression could be developed that effectively expands the frequency bandwidth of an audio file. Specifically, network distribution of audio could be accomplished with low-sample-rate files, non-critical audio storage systems could be developed that use this technique, etc.

B. Wavelet theory and high frequency compensation

The task of this chapter is to show how an analysis-synthesis wavelet technique can be used to spectrally enhance the high-frequency content of low-sample-rate audio files. A general outline of the algorithm is fairly simple. First, decompose the low-sample-rate audio file using the standard, dyadic, circular-convolution filterbank construction with a set of analysis/synthesis filters having a specific number of vanishing moments. Next, construct an empty level of wavelet coefficients (all coefficients set to zero) one level higher than the highest level of wavelet analysis of the soundfile. Then, using the exponential decay property of the wavelet coefficients in the analysis, guess the values of these new coefficients from the values of the existing wavelet coefficients. Finally, perform the inverse wavelet transform on the expanded block of wavelet coefficients. Let us take an in-depth look at this process.

1) Perform the forward wavelet transform on a window of the low-sample-rate audio file. Let there be p vanishing moments in the analysis and synthesis filters.

One of the strengths of wavelet analysis as opposed to Fourier analysis is that wavelet coefficients are more concentrated in the time-frequency plane than are Fourier coefficients. In other words, Fourier coefficients tend to "die away" in the time-frequency plane more slowly from high initial values than wavelet coefficients do. Quantitatively, Strang has shown that there is an exponential law that roughly relates the decay of the wavelet coefficients at a certain analysis level to 1) the smoothness of the analyzed function and 2) the wavelet analysis level number:


(Strang 231)
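For orientation, a decay law of this kind is commonly stated in the form below. This is a sketch and not necessarily the exact form of (4.1): the notation is assumed, with b_{jk} the wavelet coefficient at level j and translate k, C a constant that does not depend on j, and f^{(p)} the p-th derivative of the analyzed function f.

```latex
\left| b_{jk} \right| \;\le\; C \, 2^{-jp} \, \bigl\| f^{(p)} \bigr\|
```

Read this way, each step to a finer level j shrinks the bound by a factor of 2^{-p}, so a smoother signal (larger p) has coefficients that die away faster across levels.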

If we make the reasonable assumption that the signal we are analyzing has approximately the same smoothness as the analyzing wavelet, then p in (4.1) can be replaced by the number of vanishing moments in the analyzing and synthesizing wavelet. When we have completed our wavelet analysis of a window of samples from the low-sample-rate audio file, we can roughly expect to see an exponential decay of the wavelet coefficients across levels. There are exceptions, of course, but as a general rule for analyzing and synthesizing wavelets with low numbers of vanishing moments, this is a good estimate.
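The decay across levels can be observed on a toy signal. The sketch below is not the thesis program: it assumes an unnormalized Haar filter pair (one vanishing moment, p = 1) in place of the filters actually used, and analyzes one period of a sine wave as a stand-in for a smooth audio window. Levels are numbered as in the text, with higher numbers meaning finer detail.

```python
import math


def haar_forward(samples, levels):
    """Unnormalized Haar analysis (an assumed stand-in filter pair).

    Returns (approximation, details), with details ordered from the
    coarsest level to the finest level.
    """
    approx = list(samples)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Each pass halves the data: pairwise differences become details,
        # pairwise averages become the next, coarser approximation.
        details.insert(0, [(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    return approx, details


def mean_abs(values):
    return sum(abs(v) for v in values) / len(values)


# One period of a sine wave: a smooth 64-sample test window.
window = [math.sin(2 * math.pi * n / 64) for n in range(64)]
approx, details = haar_forward(window, levels=3)
level_means = [mean_abs(d) for d in details]

for depth, mean in enumerate(level_means, start=1):
    count = len(details[depth - 1])
    print(f"level {depth} ({count} coefficients): mean |coefficient| = {mean:.4f}")
```

With this filter pair the mean magnitude drops by a factor of about 2 (that is, 2^p with p = 1) from each level to the next finer one, which is the behaviour the guessing step relies on.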


Figure 4.1

2) Add a new layer of wavelet coefficients to the existing wavelet analysis, one level above the most detailed level of analysis already existing.

There are several issues raised in this step. First, the new level of coefficients can carry information spanning the same frequency bandwidth as all of the previous levels combined, and it can contain higher frequency data than the existing levels of wavelet coefficients do. Furthermore, its time resolution is twice as fine as that of the level directly before it. An inverse wavelet transform performed on the expanded window, with this added iteration, should therefore produce a soundfile that can have twice the bandwidth of the original file; we only have to guess at what the new level's coefficients' values should be. There is one complication, however: adding another level of wavelet coefficients to the existing wavelet coefficients, using the standard dyadic form of the wavelet transform, exactly doubles the amount of data contained in that window. In other words, the number of wavelet coefficients is doubled when the new, currently zero-valued wavelet coefficients are introduced. How do we reconcile this with the wavelet transform and sampling rates? The answer lies in the inverse wavelet transform's algorithm: we are doubling the amount of data, but all of the added data represents higher frequency content. Therefore, we need more "headroom" in terms of the sampling rate, which must be raised to accommodate the added high frequencies. Simply put, we double the sampling rate to allow for twice the bandwidth of the original file.
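The bookkeeping in this step can be sketched as follows. The data layout (a list of detail levels, coarsest first, with dyadic lengths) and the function name `add_empty_level` are assumptions made for illustration; they are not taken from the program in Appendix E.

```python
def add_empty_level(details, sample_rate_hz):
    """Append a zero-valued level above the finest existing level.

    details is a list of coefficient lists, coarsest level first, each
    level twice as long as the one before it. Returns the expanded list
    and the doubled sampling rate.
    """
    finest = details[-1]
    new_level = [0.0] * (2 * len(finest))
    return details + [new_level], 2 * sample_rate_hz


# Hypothetical 16-sample window after a 3-level analysis:
approx = [0.0] * 2                              # 2 approximation coefficients
details = [[0.0] * 2, [0.0] * 4, [0.0] * 8]     # 2 + 4 + 8 detail coefficients

count_before = len(approx) + sum(len(level) for level in details)
expanded, new_rate = add_empty_level(details, 11025)
count_after = len(approx) + sum(len(level) for level in expanded)

print(count_before, count_after, new_rate)
```

The new level alone holds as many coefficients as the whole window did before, so the coefficient count goes from 16 to 32 while the sampling rate goes from 11025 Hz to 22050 Hz.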

3) Guess the values of the newly allocated coefficients, based on the exponential decay law of the wavelet coefficients.

The values of the new coefficients are estimated from the existing ones. Each guessed coefficient is a weighted sum of the coefficients that fall directly below it, one from each level of analysis, with weights that decay exponentially with distance from the new level (see Figure 4.2). For example, in a five-level analysis, the guessed coefficient is a sixth-level coefficient which is equal to:


In (4.2), the guessed wavelet coefficient on level 6 is most reliant on its closest neighbor, a fifth-level coefficient, but it is also reliant on all levels of analysis in an exponentially decreasing manner. However, the guessed coefficient may rely on fewer levels of coefficients, which in practice leads to fewer aliasing effects in the output signal. In the cases where not all wavelet levels are used, the best results usually use only one level of wavelet coefficients, namely the level directly below the guessed coefficient's level, in guessing the unknown wavelet coefficient.

(4.2) can be generalized as follows, as the formula for an entire level of guessed wavelet coefficients. This formula further quantifies which wavelet coefficients on which previous levels (specifically, which translates on a certain level) are used to compute a specific, guessed wavelet coefficient:




Figure 4.2
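The guessing step can be illustrated with a short sketch. The weights used here are an assumption, not a transcription of (4.2) or (4.3): the coefficient n levels below the new level is weighted by 2^(-n*p), and "directly below" is taken to mean translate k >> n in the dyadic grid. The parameter `n_levels` plays the role of N, the number of existing levels relied upon.

```python
def guess_new_level(details, p, n_levels=1):
    """Guess a new finest level from the existing detail levels.

    details: coefficient lists, coarsest level first, dyadic lengths.
    p: number of vanishing moments assumed for the wavelet.
    n_levels: how many existing levels each guess relies on.
    The 2 ** (-n * p) weighting is an assumed form of the decay law.
    """
    n_levels = min(n_levels, len(details))
    new_length = 2 * len(details[-1])
    guessed = []
    for k in range(new_length):
        value = 0.0
        for n in range(1, n_levels + 1):
            below = details[-n]          # the level n steps under the new one
            value += 2.0 ** (-n * p) * below[k >> n]
        guessed.append(value)
    return guessed


# Two existing levels with hypothetical coefficient values.
existing = [[8.0], [4.0, -4.0]]
print(guess_new_level(existing, p=1, n_levels=1))   # relies on one level
print(guess_new_level(existing, p=1, n_levels=2))   # relies on both levels
```

With `n_levels=1` each existing finest coefficient is simply scaled by 2^(-p) and copied to the two translates above it, which corresponds to the single-level case described in the text as usually giving the best results.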

4) Perform the inverse wavelet transform on the altered window of material.

The inverse transform is performed normally, with the same analysis/synthesis wavelet filter bank. The only problem encountered in this scheme is a loss of power in the resulting output signal: by adding another level of wavelet coefficients, the conservation of power property of the wavelet transform has been violated. In audio files, this is easily remedied by normalizing the output file.
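A minimal sketch of this last step follows. It assumes an unnormalized Haar synthesis (the inverse of pairwise average/difference analysis) rather than the filters used in the thesis, and it assumes peak normalization, since the text does not say which kind of normalization was applied. All coefficient values are hypothetical.

```python
def haar_inverse(approx, details):
    """Inverse of an unnormalized Haar analysis; details coarsest first."""
    signal = list(approx)
    for level in details:
        rebuilt = []
        for average, difference in zip(signal, level):
            # Undo the pairwise average/difference step.
            rebuilt.extend((average + difference, average - difference))
        signal = rebuilt
    return signal


def normalize_peak(signal, peak=1.0):
    """Scale the signal so that its largest magnitude equals peak."""
    largest = max(abs(s) for s in signal)
    if largest == 0.0:
        return list(signal)
    return [s * peak / largest for s in signal]


# A 4-sample window analyzed to two levels, plus one guessed level.
approx = [0.5]
details = [[0.25], [0.1, -0.1], [0.05, 0.05, -0.05, -0.05]]  # last level is guessed

output = normalize_peak(haar_inverse(approx, details))
print(len(output), max(abs(s) for s in output))
```

The output has 8 samples, twice the length of the original 4-sample window, and would be played back at twice the original sampling rate.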

II. Results and discussion

A. Results

Several audio files were processed with the technique described above, using a specially designed processing program written in C++ (see Appendix E for the details of this program). All examples are listed below in Section III.

The results were good for drum sounds and high-pitched sounds, moderate for heavy percussion sounds, and relatively poor for speech sounds. This is consistent with our expectations, since 1) wavelet analysis is well suited to signals with sharp transients, which speech in general lacks; and 2) speech is mostly contained in low-frequency regions, and therefore enhancement is essentially adding high frequency material that is not present in the original signal. However, imaging was much improved in most cases, and ambiance was added in many cases as well.

In most cases, however, there was also a grainy sound to the processed output, as the filters used did not have sharp rolloffs.

Voice tended to suffer the most during this enhancement, as much aliasing can be heard.

B. Discussion

Wavelets are a good choice for the type of guesswork involved in (4.3). Since high frequency information usually has relatively little power, the exponential decay property of wavelet coefficients does well in providing a reasonable guess of the magnitude of a guessed wavelet coefficient. The algorithm is simple because it is only concerned with one large band of frequency information at a time; in a Fourier version of this algorithm, many more bands of information would have to be guessed at, and more processing time would be needed to achieve a similar result.

III. Examples

The following table lists the examples that make use of the wavelet-based excitation effect. Original 44.1 kHz, stereo, 16-bit AIFF files were downsampled to 11.025 kHz, thus throwing away much high frequency information in the process. These "low resolution" files were then spectrally enhanced twice using the program described in Appendix E, thus quadrupling the original sample rate to 44.1 kHz. A window of 32768 frames was used for each example, and 2 iterations of the wavelet transform were performed for each low resolution file. All 11.025 kHz "low resolution" files were upsampled to 44.1 kHz for their inclusion on the CD; however, their low-resolution sound is clearly audible, as no significant audible changes were made in the upsampling. Each example, therefore, consists of an original file, a downsampled, 11.025 kHz file, and a spectrally enhanced, 44.1 kHz file. The guessed wavelet coefficients only relied on one level (N = 1 in (4.3)). The 6/10 binary-coefficient wavelet analysis/synthesis filters were used (p = 3 in (4.3)). The low resolution files and the spectrally enhanced files have been normalized so that comparisons can be made between them.
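The effect of two enhancement passes on the sampling rate and on the amount of data can be tallied as below. This is plain arithmetic on the figures quoted above; it assumes only that each pass doubles both the sampling rate and the number of frames produced from a given window, and the function names are illustrative.

```python
def enhancement_passes(sample_rate_hz, window_frames, passes):
    """Track (sampling rate, frames) for one window across passes."""
    stages = [(sample_rate_hz, window_frames)]
    for _ in range(passes):
        sample_rate_hz *= 2
        window_frames *= 2
        stages.append((sample_rate_hz, window_frames))
    return stages


def window_seconds(sample_rate_hz, window_frames):
    """Duration of a window in seconds."""
    return window_frames / sample_rate_hz


stages = enhancement_passes(11025, 32768, passes=2)
for rate, frames in stages:
    seconds = window_seconds(rate, frames)
    print(f"{rate} Hz, {frames} frames, {seconds:.3f} s")
```

The duration of the window stays at about 2.97 seconds throughout; only the number of frames describing it, and hence the representable bandwidth, grows.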



CD Track #   Audio example   Artist / title of excerpt                Version
18           4.1             Peter Gabriel / In Your Eyes             original 44.1 kHz file
19           4.2                                                      normalized 11.025 kHz file
20           4.3                                                      normalized, spectrally enhanced 44.1 kHz file
21           4.4             Disney, Little Mermaid / Kiss the Girl   original 44.1 kHz file
22           4.5                                                      normalized 11.025 kHz file
23           4.6                                                      normalized, spectrally enhanced 44.1 kHz file
24           4.7             Peter Gabriel / Red Rain                 original 44.1 kHz file
25           4.8                                                      normalized 11.025 kHz file
26           4.9                                                      normalized, spectrally enhanced 44.1 kHz file
27           4.10            J.C. Risset / Inharmonique               original 44.1 kHz file
28           4.11                                                     normalized 11.025 kHz file
29           4.12                                                     normalized, spectrally enhanced 44.1 kHz file
30           4.13            Murder, Inc. / Mania                     original 44.1 kHz file
31           4.14                                                     normalized 11.025 kHz file
32           4.15                                                     normalized, spectrally enhanced 44.1 kHz file
33           4.16            Killing Joke / White Out                 original 44.1 kHz file
34           4.17                                                     normalized 11.025 kHz file
35           4.18                                                     normalized, spectrally enhanced 44.1 kHz file
36           4.19            Murder, Inc. / Mr. Whiskey's Name        original 44.1 kHz file
37           4.20                                                     normalized 11.025 kHz file
38           4.21                                                     normalized, spectrally enhanced 44.1 kHz file
39           4.22            Corey Cheng / Woods                      original 44.1 kHz file
40           4.23                                                     normalized 11.025 kHz file
41           4.24                                                     normalized, spectrally enhanced 44.1 kHz file

Table 4.1



IV. Conclusions and future directions

The results from this simple wavelet-based process are promising: high frequency information has been added to low-sample-rate soundfiles, and the enhanced files sound fairly convincing in many cases. However, the algorithm presented here is a first step which requires refinement in several ways: the power attenuation problem must be fixed, existing coefficients could also be altered to provide a better result, and different, higher-order filters could be used.

