Go to:
Title Page               Chapter 1           Appendix A   Appendix J
Copyright                Chapter 2           Appendix B   Appendix K
Abstract                 Chapter 3           Appendix C   Appendix L
Acknowledgements         Chapter 4           Appendix D   References
Table of Contents        Chapter 5           Appendix E
List of Figure           Conclusions/        Appendix F
List of Tables           Future Directions   Appendix G
List of Audio Examples                       Appendix H
List of Programs                             Appendix I


Chapter 3

Second generation wavelet shrinkage: a comparison of the lifting scheme and circular convolution in the denoising of digital audio



One of the pioneering applications of wavelets was their use in the denoising of digital images. This process can be easily extended to digital audio. This chapter compares two sets of wavelets in the denoising of audio, and gives audio examples of each.

I. Denoising audio using wavelets

The general idea behind the denoising scheme is to divide the wavelet coefficients of forward-transformed data into two categories: large-magnitude wavelet coefficients and small-magnitude coefficients. The pivotal, numerical value at which this separation is made is called the threshold value. It is assumed that a large-magnitude wavelet coefficient represents an important feature of the signal, and generally should not be altered, while a small-magnitude wavelet coefficient represents a noisy portion of the signal, which should be attenuated or eliminated from the signal before reconstruction (Strang 387).

There are two variations of denoising using wavelets: hard thresholding and soft thresholding. The hard thresholding scheme sets all wavelet coefficients below a certain magnitude to zero, leaving the wavelet coefficients larger than the threshold value unchanged. The soft thresholding scheme sets all wavelet coefficients below a certain magnitude to zero, and also subtracts the threshold value from the wavelet coefficients which are larger than the threshold value. In this manner, a smoother transition is made between zero and non-zero wavelet coefficients. Soft thresholding, which is preferable to hard thresholding, is used for the denoising procedures in this chapter.


The choice of the threshold is important in estimating which parts of a signal are noisy and which parts of a signal are important. This value may change dynamically, and algorithms that compute this value based on various input parameters have already been developed (Strang 388). Perhaps the most successful wavelet-based noise reduction schemes are the even more complex wavelet-packed based algorithms developed at Yale University (Berger and Nichols). However, in the current experiment, the threshold values for the examples below were chosen empirically, based on the best-sounding result. The threshold values were held constant for all levels of wavelet coefficients during a single denoising process. This simple denoising procedure was selected so that two sets of wavelets could be compared in a straightforward manner.

Similar procedures exist for Fourier representations of a signal: a certain number of frequency bands are chosen for the analysis, and certain bands are attenuated at certain times depending on their magnitude and immediate context. Although the procedures and algorithms are similar for Fourier and wavelet processing, wavelet-based denoising tends to preserve more higher frequencies associated with sharp transients that are usually lost in simpler equalization schemes. Since the wavelet representation of a signal isolates transient information better than the Fourier transform, musical events such as sharp attacks are better preserved in a wavelet-based denoising process.

Figure 3.1 shows the denoising process graphically for one window of samples:

Figure 3.1


II. The theory of lifting, second generation wavelets, interpolating wavelets.

Because the discrete wavelet transform deals with a finite number of input samples, a long audio signal must be processed in blocks, or windows of samples. However, because the forward and inverse wavelet transforms are usually computed with the filterbank structure outlined in Figure 2.1, each window of filtered sound data suffers from boundary artifacts, a common problem associated with digital filters operating on a finite data set. Specifically, time-domain convolution, which is often used to implement digital filters, can produce undesirable, audible artifacts (often heard as clicks) at the boundary between each window of processed sound. There have been numerous attempts at minimizing this boundary effect - symmetric extension, zero-padding, and circular convolution are perhaps the most popular solutions.

The concept of lifting was introduced by Wim Sweldens to alleviate this and other problems associated with a traditional digital filterbank scheme. In this scheme, the general layout of the wavelet analysis grid outlined in Figure 3.1 does not change; however, the basis functions associated with each wavelet coefficient are not necessarily all dilates and translates of a single, mother wavelet shape. Likewise, the frequency characteristics may differ somewhat from those of an analysis done with digital filters employing circular convolution. Nonetheless, the flexibility afforded by the lifting scheme allows the basis functions associated with wavelet coefficients near a window's boundaries to change their general shape at the boundaries. In this manner, a basis function more accommodating to a boundary can be used to minimize boundary effects. Sweldens has named the wavelets generated by the lifting scheme as second generation wavelets (Sweldens "Second generation wavelets"), and has proposed applications of such wavelets on irregularly sampled data, texture processing, and boundary-value problems.

Because windowing is a central issue in the processing of large data sets such as audio files, the task of the present chapter is to compare wavelet-based denoising of audio with two sets of forward and inverse wavelet transforms: 1) the standard, digital filter filterbank scheme, which employs circular convolution to deal with window boundaries, and 2) the interpolated, lifting scheme, which uses polynomial interpolation to accomplish the forward and inverse lifted wavelet transforms. In this manner, we hope to determine if the interpolating, lifting scheme alleviates some of the boundary artifacts created with the circular convolution scheme.

A. The lifting scheme

In the standard, dyadic multiresolution filterbank scheme, four digital filters are needed to compute the forward and inverse wavelet transforms. While the digital filters do not change during the computation of either transform, the computation of the filter coefficients themselves is not a trivial task. The mathematical constraints discussed in Chapter 2 must be observed, and a solution to the dilation equation must be found which also incorporates other restrictions, such as general shape, number of vanishing moments, and filter length.

The lifting scheme takes a different approach to the construction of these filters, and does not necessarily require the direct solution of the dilation equation. In principle, the lifting scheme builds complex filters from simple, even trivial filters that already satisfy the defining properties of a multiresolution analysis filterbank.

Specifically, the primary lifting scheme uses a set of existing primary wavelet functions and dual scaling functions to create new, more complex primary wavelet functions and dual scaling functions. Consequently, the analyzing low pass filter and the synthesizing high pass filter are altered as well. The analyzing high pass filter and the synthesizing low pass filter, along with the primary scaling functions and the dual wavelet functions, are left unaltered:



Please note the three indices involved with the filters above. Implicit in these three indices is the notion that the s filter may change for different scales and translates of a wavelet analysis. The first index refers to the level of a wavelet analysis, the second index refers to the specific translate of the level of analysis specified in the first index, and the third index references the individual filter coefficients for that particular scale and translate of the analysis.

By similar arguments, a dual lifting scheme can be constructed that uses a set of existing dual wavelet functions and primary scaling functions to create new, more complex dual wavelet functions and primary scaling functions. Consequently, in the dual lifting scheme, the synthesizing low pass filter and the analyzing high pass filter are altered as well. The synthesizing high pass filter and the analyzing low pass filter, along with the dual scaling functions and the primary wavelet functions, are left unaltered.

The strength of the lifting and dual lifting schemes is the introduction of the s filter for a primary lifting, and the filter for a dual lifting. The coefficients in both the s and the filters can be chosen freely; therefore, they can be used to improve the characteristics of an existing analysis/synthesis filterbank by introducing more vanishing moments, altering the shape of the mother wavelet, etc. Since primary lifting and dual lifting affect complementary pairs of filters, execution of the lifting scheme can be followed by a dual lifting, or vice-versa, resulting in a filterbank in which all four filters have been altered in an advantageous way.

In practice, the forward and inverse lifted wavelet transform can be implemented using techniques similar to the original filterbank scheme introduced in Figure 2.1. The forward transform algorithm computes the scaling and wavelet coefficients from the old filters first, and then applies the new filters to the already computed coefficients to update, or "lift" the old coefficients to their new values:



The inverse transform works in a similar manner. First, the "lifting" done by (3.11) is undone, and then the wavelet coefficients and the "un-lifted" scaling coefficients are processed by the old filters. The sum is then sent as the input to the next stage of the inverse transform:



B. Lifted, interpolating wavelets

The power of the lifting mechanism comes from the flexibility of the s and filters. Both filters can alter the analyzing and synthesizing wavelet and scaling functions to custom-design a set of filters to a certain task. Given a trivial set of starting filters, then, the question becomes a matter of computing the desired s and/or filter coefficients.

One interesting application of the s and filters is their use in the theory and computation of interpolating wavelets. In this mechanism, the standard filterbank scheme in Figure 2.1 is not used directly; instead, only downsampling and polynomial interpolation of samples are used. An advantage of this scheme is that polynomial interpolation using higher order polynomial approximations can lead to wavelet analyses with arbitrarily high (2n) vanishing moments.

First, one must chose the order of polynomial to use in the interpolation scheme. Let this value be d. In general, for dth order polynomial interpolation, 2d data points are needed. For example, for linear interpolation (d=1), 2 data points are needed; for quadratic interpolation (d=2), 4 data points are needed, etc. There will be 2d vanishing moments associated with the analysis.

Computation of the forward transform begins by using 2d sample points around a central axis to compute the value of a best-fit, interpolating polynomial of order d at the central axis. The 2d values used are each 2 samples away from each other in this dyadic analysis, which closely mimics the downsampling operators in the filterbank scheme of Figure 2.1. The interpolated value is then compared to the actual, existing sample value at the central axis. The difference is taken between the interpolated and actual values, and is called the wavelet coefficient at that translation (central axis) value. The scaling coefficient at that translation (central axis) is computed in a similar manner, by interpolating between existing scaling coefficients around a central axis. This procedure is repeated for the specified number of iterations. The inverse transform can be accomplished in a similar manner, by adding the difference of interpolated and actual values (wavelet coefficients) to a re-computed, interpolated value. Formal arguments have shown that such an analysis does indeed produce a multiresolution analysis satisfying all of the mathematical constraints outlined in Chapter 2, including scale and shift invariance (Sweldens and Schröder).

In fact, such an analysis has links to the lifting scheme decried above in II.A. Polynomial interpolation used in the above manner is the dual lifting of the lazy wavelet filterbank. The lazy wavelet filterbank is simply a two-channel filterbank which separates the input into even and odd samples. This conclusion is important: we can explain the above interpolation scheme using the language of lifting, developed above in II.A. The only question that remains is to find how the s and filters are computed to achieve such an interpolation-based result.

A common algorithm that is used to do polynomial interpolation and extrapolation is Neville's algorithm. The use of Neville's algorithm in the lifted, interpolated forward and inverse transform for both the forward and inverse lifting arises from the linear proportionality of the primary s filter coefficients to the filter coefficients (Sweldens, "A Construction of second generation wavelets," 18). In essence, Neville's algorithm not only chooses the s and filters for us, it also performs the filtering process in (3.9) - (3.13) without the user needing to know the s and filter coefficients exactly.

At the boundary, we can still use Neville's algorithm to evaluate the wavelet and scaling coefficients, even though interpolation is done in a slightly different way. Instead of choosing a central axis about which to choose 2d data points as interpolation input parameters, we select an off-center axis about which to choose these points, based on where the boundary occurs. For example, if we want to use linear interpolation somewhere in the middle of a data window, we simply choose one sample on either side of the desired point for which we desired an interpolated value; the boundary has no effect on this computation. However, if we wish to use linear interpolation to guess a value at the left boundary of a window, we use two points on the right of the boundary. Linear extrapolation is now used, although Neville's algorithm can be used to compute both interpolated and extrapolated values. In this manner, we can still use polynomial interpolation at the boundaries, and still achieve 2d vanishing moments at the boundaries. The advantage of using an interpolating polynomial of degree d at the boundaries is that 2d vanishing moments are still present in the analysis right up to the sample at the boundary. Because of this uniformity of analysis throughout the window, we hope that boundary artifacts are lessened by this use of interpolation and lifting.

III. Results

A. Denoising using binary wavelets

An audio file of a male speaker was corrupted (mixed) with white noise to form the test signal. As a control, a denoising program was written that made use of the 5/3 binary coefficient filterbank described in Table 2.1 and Table 2.2. The soundfile is broken down into windows of samples, and the wavelet transform is applied directly to each window of data, using circular convolution techniques to deal with the boundaries of each window. Several denoising attempts were made with varying numbers of iterations and thresholds.

The results produce a somewhat harsh sound for many of the trials, although a considerable amount of noise is removed from the test signal in many cases. An increased number of iterations seems to lower the remaining noise's ceiling frequency. This is reasonable, since an increased number of iterations provides more information on the lower frequencies of a signal, and allows denoising to occur at these lower frequencies with a greater frequency resolution. Increasing the threshold to about 200 seemed to provide the best result, while a threshold of 5000 was too high for the denoising scheme, and produced a gating effect through which only fricative speech sounds are heard.
CD Track # Audio exampleArtist window sizeiterations threshold
13.1 Martin Dupras originalnoisy sample
23.2 Martin Dupras 327682 100
33.3 Martin Dupras 327684 100
43.4 Martin Dupras 327688 100
53.5 Martin Dupras 3276812 30
63.6 Martin Dupras 3276812 70
73.7 Martin Dupras 3276812 200
83.8 Martin Dupras 3276812 5000

Table 3.1


B. Denoising using lifted, interpolating wavelets

The same test file was used as the source for tests with lifted, interpolating wavelet-based denoising. A program was written which employs the lifted, interpolating transform outlined in II.A and II.B above. Several denoising attempts were made with varying numbers of iterations, thresholds, and vanishing moments.

Varying the number of iterations and threshold parameters had a similar effect with the lifting scheme as varying the same parameters in the circular convolution scheme. Increasing the number of vanishing moments in this scheme had little effect; in fact, higher vanishing moments in audio examples 3.15 and 3.16 produced more audible boundary artifacts.

Note that the number of vanishing moments in a denoising experiment is equal to the order of the interpolating polynomial used in that particular experiment.



CD Track # Audio exampleArtist window sizeiterations vanishing index vanishing moments = (2 * vanishing index) - 1 threshold
93.9 Martin Dupras 327682 23 100
103.10 Martin Dupras 327684 23 100
113.11 Martin Dupras 327688 23 100
123.12 Martin Dupras 3276812 23 30
133.13 Martin Dupras 3276812 23 70
143.14 Martin Dupras 3276812 23 200
153.15 Martin Dupras 3276812 23 5000
163.16 Martin Dupras 3276812 35 100
173.17 Martin Dupras 3276812 47 100

Table 3.2

IV. Conclusions

The circular convolution denoising scheme is the easiest to implement and understand. It uses techniques which are used in many digital filters, although some audible clicks can be heard in the denoised output at the window boundaries. Conversely,

the lifting scheme is a more powerful and flexible technique; however, the audible output for similar input parameters is not as good in this implementation. In fact, the boundary artifacts are made more severe when a high number of vanishing moments are used.

The lifting scheme with interpolated wavelets is a powerful technique for wavelet analysis: it allows the custom design of biorthogonal, symmetric wavelets with an arbitrary (2n) number of vanishing moments that preserve these vanishing moment properties even at the boundaries of an data window. However, despite these advantages, when the interpolation at the boundaries does not produce a wavelet coefficient with zero value, the resulting boundary wavelet coefficient tends to have a very large magnitude due to the extreme sensitivity of polynomial extrapolation at the boundary of the data set.

The wavelet coefficients produced at the boundary, in this manner, tend to have little value, and therefore not only do not solve the problem of wavelet analysis at the boundary, but even tend to exaggerate the problem. In other words, when the polynomial approximation in the lifting scheme works exactly, it works well; however, when the approximation is off as is, by far, most often the case, it tends to be off by large amounts (a few orders of magnitude is an empirical best guess). Artifacts can be heard as more noticeable audible clicks at the boundary.

Fortunately, the broad-band denoising scheme mostly alters wavelet coefficients with small values, so that these large values are ignored in the denoising process. The perfect reconstruction property of the lifted wavelet transform just transforms these large-magnitude wavelet coefficients back into the original audio, so the listener is unaware of boundary effects due to extremely large wavelet coefficients. However, these large, intermediate wavelet coefficients at the boundaries can sometimes be heard as clicks when the user requests both a primary/dual wavelet with a high number of vanishing moments and a fairly high number of iterations.

Since the standard wavelet decomposition scheme was used in this noise reduction (recursive splitting of the low-pass output), it is no wonder that there are also low-frequency noise artifacts left in the denoised output. Soft thresholding only eliminates wavelet coefficients below the threshold value and adjusts others to minimize discontinuities in the final reconstruction. However, since the wavelet decomposition must stop after only a few iterations to prevent the surfacing of boundary artifacts, not all low-frequency noise can be removed. A best basis algorithm, as proposed and implemented by Coifman and Wickerhauser, or another non-standard division of the frequency space would do better in eliminating these low-frequency artifacts. Dynamic computation of the threshold value is also another possible improvement to this process.


Go to:
Title Page               Chapter 1           Appendix A   Appendix J
Copyright                Chapter 2           Appendix B   Appendix K
Abstract                 Chapter 3           Appendix C   Appendix L
Acknowledgements         Chapter 4           Appendix D   References
Table of Contents        Chapter 5           Appendix E
List of Figure           Conclusions/        Appendix F
List of Tables           Future Directions   Appendix G
List of Audio Examples                       Appendix H
List of Programs                             Appendix I