Go to: Title Page Chapter 1 Appendix A Appendix J Copyright Chapter 2 Appendix B Appendix K Abstract Chapter 3 Appendix C Appendix L Acknowledgements Chapter 4 Appendix D References Table of Contents Chapter 5 Appendix E List of Figure Conclusions/ Appendix F List of Tables Future Directions Appendix G List of Audio Examples Appendix H List of Programs Appendix I
One of the pioneering applications of wavelets
was their use in the denoising of digital images. This process
can be easily extended to digital audio. This chapter compares
two sets of wavelets in the denoising of audio, and gives audio
examples of each.
I. Denoising audio using wavelets
The general idea behind the denoising scheme is to divide the wavelet coefficients of forward-transformed data into two categories: large-magnitude wavelet coefficients and small-magnitude coefficients. The pivotal, numerical value at which this separation is made is called the threshold value. It is assumed that a large-magnitude wavelet coefficient represents an important feature of the signal, and generally should not be altered, while a small-magnitude wavelet coefficient represents a noisy portion of the signal, which should be attenuated or eliminated from the signal before reconstruction (Strang 387).
There are two variations of denoising using
wavelets: hard thresholding and soft thresholding. The hard thresholding
scheme sets all wavelet coefficients below a certain magnitude
to zero, leaving the wavelet coefficients larger than the threshold
value unchanged. The soft thresholding scheme sets all wavelet
coefficients below a certain magnitude to zero, and also subtracts
the threshold value from the wavelet coefficients which are larger
than the threshold value. In this manner, a smoother transition
is made between zero and non-zero wavelet coefficients. Soft thresholding,
which is preferable to hard thresholding, is used for the denoising
procedures in this chapter.
The choice of the threshold is important in estimating which parts of a signal are noisy and which parts of a signal are important. This value may change dynamically, and algorithms that compute this value based on various input parameters have already been developed (Strang 388). Perhaps the most successful wavelet-based noise reduction schemes are the even more complex wavelet-packed based algorithms developed at Yale University (Berger and Nichols). However, in the current experiment, the threshold values for the examples below were chosen empirically, based on the best-sounding result. The threshold values were held constant for all levels of wavelet coefficients during a single denoising process. This simple denoising procedure was selected so that two sets of wavelets could be compared in a straightforward manner.
Similar procedures exist for Fourier representations of a signal: a certain number of frequency bands are chosen for the analysis, and certain bands are attenuated at certain times depending on their magnitude and immediate context. Although the procedures and algorithms are similar for Fourier and wavelet processing, wavelet-based denoising tends to preserve more higher frequencies associated with sharp transients that are usually lost in simpler equalization schemes. Since the wavelet representation of a signal isolates transient information better than the Fourier transform, musical events such as sharp attacks are better preserved in a wavelet-based denoising process.
Figure 3.1 shows the denoising process graphically for one window of samples:
II. The theory of lifting, second generation wavelets, interpolating wavelets.
Because the discrete wavelet transform deals with a finite number of input samples, a long audio signal must be processed in blocks, or windows of samples. However, because the forward and inverse wavelet transforms are usually computed with the filterbank structure outlined in Figure 2.1, each window of filtered sound data suffers from boundary artifacts, a common problem associated with digital filters operating on a finite data set. Specifically, time-domain convolution, which is often used to implement digital filters, can produce undesirable, audible artifacts (often heard as clicks) at the boundary between each window of processed sound. There have been numerous attempts at minimizing this boundary effect - symmetric extension, zero-padding, and circular convolution are perhaps the most popular solutions.
The concept of lifting was introduced by Wim Sweldens to alleviate this and other problems associated with a traditional digital filterbank scheme. In this scheme, the general layout of the wavelet analysis grid outlined in Figure 3.1 does not change; however, the basis functions associated with each wavelet coefficient are not necessarily all dilates and translates of a single, mother wavelet shape. Likewise, the frequency characteristics may differ somewhat from those of an analysis done with digital filters employing circular convolution. Nonetheless, the flexibility afforded by the lifting scheme allows the basis functions associated with wavelet coefficients near a window's boundaries to change their general shape at the boundaries. In this manner, a basis function more accommodating to a boundary can be used to minimize boundary effects. Sweldens has named the wavelets generated by the lifting scheme as second generation wavelets (Sweldens "Second generation wavelets"), and has proposed applications of such wavelets on irregularly sampled data, texture processing, and boundary-value problems.
Because windowing is a central issue in the
processing of large data sets such as audio files, the task of
the present chapter is to compare wavelet-based denoising of audio
with two sets of forward and inverse wavelet transforms: 1) the
standard, digital filter filterbank scheme, which employs circular
convolution to deal with window boundaries, and 2) the interpolated,
lifting scheme, which uses polynomial interpolation to accomplish
the forward and inverse lifted wavelet transforms. In this manner,
we hope to determine if the interpolating, lifting scheme alleviates
some of the boundary artifacts created with the circular convolution
scheme.
A. The lifting scheme
In the standard, dyadic multiresolution filterbank scheme, four digital filters are needed to compute the forward and inverse wavelet transforms. While the digital filters do not change during the computation of either transform, the computation of the filter coefficients themselves is not a trivial task. The mathematical constraints discussed in Chapter 2 must be observed, and a solution to the dilation equation must be found which also incorporates other restrictions, such as general shape, number of vanishing moments, and filter length.
The lifting scheme takes a different approach to the construction of these filters, and does not necessarily require the direct solution of the dilation equation. In principle, the lifting scheme builds complex filters from simple, even trivial filters that already satisfy the defining properties of a multiresolution analysis filterbank.
Specifically, the primary lifting scheme
uses a set of existing primary wavelet functions and dual scaling
functions to create new, more complex primary wavelet functions
and dual scaling functions. Consequently, the analyzing low pass
filter and the synthesizing high pass filter are altered as well.
The analyzing high pass filter and the synthesizing low pass filter,
along with the primary scaling functions and the dual wavelet
functions, are left unaltered:

Please note the three indices involved with the filters above. Implicit in these three indices is the notion that the s filter may change for different scales and translates of a wavelet analysis. The first index refers to the level of a wavelet analysis, the second index refers to the specific translate of the level of analysis specified in the first index, and the third index references the individual filter coefficients for that particular scale and translate of the analysis.
By similar arguments, a dual lifting scheme can be constructed that uses a set of existing dual wavelet functions and primary scaling functions to create new, more complex dual wavelet functions and primary scaling functions. Consequently, in the dual lifting scheme, the synthesizing low pass filter and the analyzing high pass filter are altered as well. The synthesizing high pass filter and the analyzing low pass filter, along with the dual scaling functions and the primary wavelet functions, are left unaltered.
The strength of the lifting and dual lifting
schemes is the introduction of the s filter for a primary
lifting, and the
filter for a dual lifting.
The coefficients in both the s and the
filters can be chosen freely; therefore, they can be used to improve
the characteristics of an existing analysis/synthesis filterbank
by introducing more vanishing moments, altering the shape of the
mother wavelet, etc. Since primary lifting and dual lifting affect
complementary pairs of filters, execution of the lifting scheme
can be followed by a dual lifting, or vice-versa, resulting in
a filterbank in which all four filters have been altered in an
advantageous way.
In practice, the forward and inverse lifted
wavelet transform can be implemented using techniques similar
to the original filterbank scheme introduced in Figure 2.1. The
forward transform algorithm computes the scaling and wavelet coefficients
from the old filters first, and then applies the new filters to
the already computed coefficients to update, or "lift"
the old coefficients to their new values:

The inverse transform works in a similar manner.
First, the "lifting" done by (3.11) is undone, and
then the wavelet coefficients and the "un-lifted"
scaling coefficients are processed by the old filters. The sum
is then sent as the input to the next stage of the inverse transform:

B. Lifted, interpolating wavelets
The power of the lifting mechanism comes from
the flexibility of the s and
filters. Both
filters can alter the analyzing and synthesizing wavelet and scaling
functions to custom-design a set of filters to a certain task.
Given a trivial set of starting filters, then, the question becomes
a matter of computing the desired s and/or
filter coefficients.
One interesting application of the s and
filters is their use in the theory and computation of interpolating
wavelets. In this mechanism, the standard filterbank scheme in
Figure 2.1 is not used directly; instead, only downsampling and
polynomial interpolation of samples are used. An advantage of
this scheme is that polynomial interpolation using higher order
polynomial approximations can lead to wavelet analyses with arbitrarily
high (2n) vanishing moments.
First, one must chose the order of polynomial to use in the interpolation scheme. Let this value be d. In general, for dth order polynomial interpolation, 2d data points are needed. For example, for linear interpolation (d=1), 2 data points are needed; for quadratic interpolation (d=2), 4 data points are needed, etc. There will be 2d vanishing moments associated with the analysis.
Computation of the forward transform begins by using 2d sample points around a central axis to compute the value of a best-fit, interpolating polynomial of order d at the central axis. The 2d values used are each 2 samples away from each other in this dyadic analysis, which closely mimics the downsampling operators in the filterbank scheme of Figure 2.1. The interpolated value is then compared to the actual, existing sample value at the central axis. The difference is taken between the interpolated and actual values, and is called the wavelet coefficient at that translation (central axis) value. The scaling coefficient at that translation (central axis) is computed in a similar manner, by interpolating between existing scaling coefficients around a central axis. This procedure is repeated for the specified number of iterations. The inverse transform can be accomplished in a similar manner, by adding the difference of interpolated and actual values (wavelet coefficients) to a re-computed, interpolated value. Formal arguments have shown that such an analysis does indeed produce a multiresolution analysis satisfying all of the mathematical constraints outlined in Chapter 2, including scale and shift invariance (Sweldens and Schröder).
In fact, such an analysis has links to the
lifting scheme decried above in II.A. Polynomial interpolation
used in the above manner is the dual lifting of the lazy wavelet
filterbank. The lazy wavelet filterbank is simply a two-channel
filterbank which separates the input into even and odd samples.
This conclusion is important: we can explain the above interpolation
scheme using the language of lifting, developed above in II.A.
The only question that remains is to find how the s and
filters are computed to achieve such an interpolation-based result.
A common algorithm that is used to do polynomial
interpolation and extrapolation is Neville's algorithm.
The use of Neville's algorithm in the lifted, interpolated
forward and inverse transform for both the forward and inverse
lifting arises from the linear proportionality of the primary
s filter coefficients to the
filter
coefficients (Sweldens, "A Construction of second generation
wavelets," 18). In essence, Neville's algorithm not
only chooses the s and
filters for us,
it also performs the filtering process in (3.9) - (3.13) without
the user needing to know the s and
filter
coefficients exactly.
At the boundary, we can still use Neville's
algorithm to evaluate the wavelet and scaling coefficients, even
though interpolation is done in a slightly different way. Instead
of choosing a central axis about which to choose 2d data points
as interpolation input parameters, we select an off-center axis
about which to choose these points, based on where the boundary
occurs. For example, if we want to use linear interpolation somewhere
in the middle of a data window, we simply choose one sample on
either side of the desired point for which we desired an interpolated
value; the boundary has no effect on this computation. However,
if we wish to use linear interpolation to guess a value at the
left boundary of a window, we use two points on the right of the
boundary. Linear extrapolation is now used, although Neville's
algorithm can be used to compute both interpolated and extrapolated
values. In this manner, we can still use polynomial interpolation
at the boundaries, and still achieve 2d vanishing moments at the
boundaries. The advantage of using an interpolating polynomial
of degree d at the boundaries is that 2d vanishing moments are
still present in the analysis right up to the sample at the boundary.
Because of this uniformity of analysis throughout the window,
we hope that boundary artifacts are lessened by this use of interpolation
and lifting.
III. Results
A. Denoising using binary wavelets
An audio file of a male speaker was corrupted (mixed) with white noise to form the test signal. As a control, a denoising program was written that made use of the 5/3 binary coefficient filterbank described in Table 2.1 and Table 2.2. The soundfile is broken down into windows of samples, and the wavelet transform is applied directly to each window of data, using circular convolution techniques to deal with the boundaries of each window. Several denoising attempts were made with varying numbers of iterations and thresholds.
The results produce a somewhat harsh sound
for many of the trials, although a considerable amount of noise
is removed from the test signal in many cases. An increased number
of iterations seems to lower the remaining noise's ceiling
frequency. This is reasonable, since an increased number of iterations
provides more information on the lower frequencies of a signal,
and allows denoising to occur at these lower frequencies with
a greater frequency resolution. Increasing the threshold to about
200 seemed to provide the best result, while a threshold of 5000
was too high for the denoising scheme, and produced a gating effect
through which only fricative speech sounds are heard.
B. Denoising using lifted, interpolating wavelets
The same test file was used as the source for tests with lifted, interpolating wavelet-based denoising. A program was written which employs the lifted, interpolating transform outlined in II.A and II.B above. Several denoising attempts were made with varying numbers of iterations, thresholds, and vanishing moments.
Varying the number of iterations and threshold parameters had a similar effect with the lifting scheme as varying the same parameters in the circular convolution scheme. Increasing the number of vanishing moments in this scheme had little effect; in fact, higher vanishing moments in audio examples 3.15 and 3.16 produced more audible boundary artifacts.
Note that the number of vanishing moments
in a denoising experiment is equal to the order of the interpolating
polynomial used in that particular experiment.
IV. Conclusions
The circular convolution denoising scheme is the easiest to implement and understand. It uses techniques which are used in many digital filters, although some audible clicks can be heard in the denoised output at the window boundaries. Conversely,
the lifting scheme is a more powerful and flexible technique; however, the audible output for similar input parameters is not as good in this implementation. In fact, the boundary artifacts are made more severe when a high number of vanishing moments are used.
The lifting scheme with interpolated wavelets is a powerful technique for wavelet analysis: it allows the custom design of biorthogonal, symmetric wavelets with an arbitrary (2n) number of vanishing moments that preserve these vanishing moment properties even at the boundaries of an data window. However, despite these advantages, when the interpolation at the boundaries does not produce a wavelet coefficient with zero value, the resulting boundary wavelet coefficient tends to have a very large magnitude due to the extreme sensitivity of polynomial extrapolation at the boundary of the data set.
The wavelet coefficients produced at the boundary, in this manner, tend to have little value, and therefore not only do not solve the problem of wavelet analysis at the boundary, but even tend to exaggerate the problem. In other words, when the polynomial approximation in the lifting scheme works exactly, it works well; however, when the approximation is off as is, by far, most often the case, it tends to be off by large amounts (a few orders of magnitude is an empirical best guess). Artifacts can be heard as more noticeable audible clicks at the boundary.
Fortunately, the broad-band denoising scheme mostly alters wavelet coefficients with small values, so that these large values are ignored in the denoising process. The perfect reconstruction property of the lifted wavelet transform just transforms these large-magnitude wavelet coefficients back into the original audio, so the listener is unaware of boundary effects due to extremely large wavelet coefficients. However, these large, intermediate wavelet coefficients at the boundaries can sometimes be heard as clicks when the user requests both a primary/dual wavelet with a high number of vanishing moments and a fairly high number of iterations.
Since the standard wavelet decomposition scheme
was used in this noise reduction (recursive splitting of the low-pass
output), it is no wonder that there are also low-frequency noise
artifacts left in the denoised output. Soft thresholding only
eliminates wavelet coefficients below the threshold value and
adjusts others to minimize discontinuities in the final reconstruction.
However, since the wavelet decomposition must stop after only
a few iterations to prevent the surfacing of boundary artifacts,
not all low-frequency noise can be removed. A best basis algorithm,
as proposed and implemented by Coifman and Wickerhauser, or another
non-standard division of the frequency space would do better in
eliminating these low-frequency artifacts. Dynamic computation
of the threshold value is also another possible improvement to
this process.
Go to: Title Page Chapter 1 Appendix A Appendix J Copyright Chapter 2 Appendix B Appendix K Abstract Chapter 3 Appendix C Appendix L Acknowledgements Chapter 4 Appendix D References Table of Contents Chapter 5 Appendix E List of Figure Conclusions/ Appendix F List of Tables Future Directions Appendix G List of Audio Examples Appendix H List of Programs Appendix I