
A (L, K) Are Known As The Approximation

This document discusses speech compression using wavelet transforms. It begins by introducing speech compression and how the wavelet transform decomposes speech signals into coefficients at different scales and positions. It then covers discrete wavelet transforms and how they involve choosing scales and positions based on powers of two. The document also discusses how the fast wavelet transform algorithm implements filtering to decompose signals into approximation and detail coefficients, and how multi-level decomposition breaks signals down into lower resolution components in a decomposition tree. In the end, it notes how the original signal can be reconstructed using inverse discrete wavelet transforms.
Copyright
© Attribution Non-Commercial (BY-NC)

ABSTRACT
Speech compression is the technology of converting human speech into an efficiently encoded representation that can later be decoded to produce a close approximation of the original signal. The wavelet transform of a signal decomposes the original signal into wavelet coefficients at different scales and positions. These coefficients represent the signal in the wavelet domain, and all data operations can be performed using just the corresponding wavelet coefficients. The major issues concerning the design of this wavelet-based speech coder are choosing optimal wavelets for speech signals, the decomposition level in the DWT, the thresholding criteria for coefficient truncation, and efficient encoding of the truncated coefficients. The performance of the wavelet compression scheme on both male and female spoken sentences is compared. On a male spoken sentence the scheme reaches a signal-to-noise ratio of 17.45 dB and a compression ratio of 3.88, using a level-dependent thresholding approach.

1. INTRODUCTION
Speech is a very basic way for humans to convey information to one another. With a bandwidth of only 4 kHz, speech can convey information with the emotion of a human voice. People want to be able to hear someone's voice from anywhere in the world as if the person were in the same room. As a result, a greater emphasis is being placed on the design of new and efficient speech coders for voice communication and transmission. Today applications of speech coding and compression have become very numerous.

This paper looks at a new technique for analyzing and compressing speech signals using wavelets. Any signal can be represented by a set of scaled and translated versions of a basic function called the mother wavelet. The weights of this set of wavelet functions are the wavelet coefficients at different scales and positions, obtained by taking the wavelet transform of the original signal. Speech is a non-stationary random process due to the time-varying nature of the human speech production system. Non-stationary signals are characterized by numerous transitory drifts, trends and abrupt changes. The localization feature of wavelets, along with their time-frequency resolution properties, makes them well suited for coding speech signals.

2. WAVELETS Vs FOURIER ANALYSIS
A major drawback of Fourier analysis is that in transforming to the frequency domain, the time domain information is lost. The most important difference between these two kinds of transforms is that individual wavelet functions are localized in space. In contrast, Fourier sine and cosine functions are non-local and are active for all time t.

3. DISCRETE WAVELET TRANSFORM
The Discrete Wavelet Transform (DWT) involves choosing scales and positions based on powers of two, the so-called dyadic scales and positions. The mother wavelet is rescaled, or dilated, by powers of two and translated by integers. The numbers a(L,k) are known as the approximation coefficients at scale L, while d(j,k) are known as the detail coefficients at scale j. The approximation and detail coefficients can be expressed as:
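One common convention for the approximation coefficients a(L,k) and detail coefficients d(j,k) of Section 3 is written out here as a hedged illustration. The symbols f (input signal), φ (scaling function) and ψ (mother wavelet) are chosen here for illustration, and the normalization shown is an assumption that may differ from the authors' own by a constant factor:

```latex
a(L,k) = \frac{1}{\sqrt{2^{L}}} \int_{-\infty}^{\infty} f(t)\, \phi\!\left(2^{-L} t - k\right) dt,
\qquad
d(j,k) = \frac{1}{\sqrt{2^{j}}} \int_{-\infty}^{\infty} f(t)\, \psi\!\left(2^{-j} t - k\right) dt
```

In this form the factor 2^{-j} carries out the dilation by powers of two and the integer k carries out the translation. For a sampled signal the integrals would be replaced by sums over the sample index.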
3.1. VANISHING MOMENTS
The number of vanishing moments of a wavelet indicates the smoothness of the wavelet function as well as the flatness of the frequency response of the wavelet filters (the filters used to compute the DWT). Typically a wavelet with p vanishing moments satisfies the following equation, where ψ(x) denotes the wavelet function:

$$\int_{-\infty}^{\infty} x^{m}\, \psi(x)\, dx = 0, \qquad m = 0, 1, \ldots, p-1$$

Wavelets with a high number of vanishing moments lead to a more compact signal representation and are hence useful in coding applications. However, in general, the length of the filters increases with the number of vanishing moments, and the complexity of computing the DWT coefficients increases with the size of the wavelet filters.

3.2. FAST WAVELET TRANSFORM
The Discrete Wavelet Transform (DWT) coefficients can be computed by using Mallat's Fast Wavelet Transform (FWT) algorithm. This algorithm is sometimes referred to as the two-channel sub-band coder and involves filtering the input signal based on the wavelet function used.

3.2. IMPLEMENTATION USING FILTERS
To explain the implementation of the Fast Wavelet Transform algorithm consider the following equations:

The first equation is known as the twin-scale relation (or the dilation equation) and defines the scaling function. The next equation expresses the wavelet in terms of the scaling function. The third equation is the condition required for the wavelet to be orthogonal to the scaling function and its translates. The coefficients c(k), or {c0, ..., c2N-1}, in the above equations represent the impulse response coefficients for a low pass filter of length 2N, with a sum of 1 and a norm of 1/√2. The high pass filter is obtained from the low pass filter using the relationship

where k varies over the range (1 - (2N - 1)) to 1.

Starting with a discrete input signal vector s, the first stage of the FWT algorithm decomposes the signal into two sets of coefficients. These are the approximation coefficients cA1 (low frequency information) and the detail coefficients cD1 (high frequency information), as shown in the figure below.
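The filter properties stated in this section can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the closed-form Daubechies scaling filter for N = 2 (four taps), and it builds the high pass filter with the widely used alternating-flip rule g(k) = (-1)^k c(2N-1-k), which may differ from the index convention intended in the text by a shift. The names `c`, `g` and `high_pass_from_low_pass` are hypothetical.

```python
import numpy as np


def high_pass_from_low_pass(c):
    """Alternating flip: reverse the low pass taps and negate every second one."""
    g = np.array(c[::-1], dtype=float)  # reversed copy of the low pass filter
    g[1::2] *= -1.0                     # flip the sign of the odd-indexed taps
    return g


s3 = np.sqrt(3.0)
# Daubechies scaling filter for N = 2, scaled so that the taps sum to 1.
c = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0
g = high_pass_from_low_pass(c)

print("sum of c        :", c.sum())            # low pass taps sum to 1
print("norm of c       :", np.linalg.norm(c))  # should equal 1/sqrt(2)
print("sum of g        :", g.sum())            # high pass taps sum to 0
print("inner product cg:", np.dot(c, g))       # filters are orthogonal
```

The four printed values illustrate the unit sum and 1/√2 norm of the low pass filter, and the fact that the derived high pass filter rejects a constant signal and is orthogonal to the low pass filter.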
The coefficient vectors are obtained by convolving s with the low-pass filter Lo_D for approximation and with the high-pass filter Hi_D for details. This filtering operation is then followed by dyadic decimation, or downsampling by a factor of 2. Mathematically the two-channel filtering of the discrete signal s is represented by the expressions:

These equations implement a convolution plus downsampling by a factor of 2 and give the forward fast wavelet transform. If the length of each filter is equal to 2N and the length of the original signal s is equal to n, then the corresponding lengths of the coefficients cA1 and cD1 are given by the formula:

$$\mathrm{length}(cA_1) = \mathrm{length}(cD_1) = \left\lfloor \frac{n-1}{2} \right\rfloor + N$$

This shows that the total length of the wavelet coefficients is always slightly greater than the length of the original signal due to the filtering process used.

3.3. MULTILEVEL DECOMPOSITION
The decomposition process can be iterated, with successive approximations being decomposed in turn, so that one signal is broken down into many lower resolution components. This is called the wavelet decomposition tree. The wavelet decomposition of the signal s analyzed at level j has the following structure: [cAj, cDj, ..., cD1]. Looking at a signal's wavelet decomposition tree can reveal valuable information. The diagram below shows the wavelet decomposition to level 3 of a sample signal S.

Since the analysis process is iterative, in theory it can be continued indefinitely. In reality, the decomposition can only proceed until the vector consists of a single sample. Normally, however, there is little or no advantage gained in decomposing a signal beyond a certain level. The selection of the optimal decomposition level in the hierarchy depends on the nature of the signal being analyzed or some other suitable criterion, such as the low-pass filter cut-off.

3.4. SIGNAL RECONSTRUCTION
The original signal can be reconstructed or synthesized using the inverse discrete wavelet transform (IDWT). The synthesis starts with the approximation and detail coefficients cAj and cDj, and then reconstructs cAj-1 by upsampling and filtering with the reconstruction filters.
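The filtering, decimation, multilevel iteration and synthesis steps of Sections 3.2 to 3.4 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes zero padding at the signal edges, an orthogonal Daubechies filter with N = 2 normalized to unit norm, and the alternating-flip rule for the high pass filter. All function names (`make_filters`, `dwt`, `idwt`, `wavedec`, `waverec`) are hypothetical helpers written for this example.

```python
import numpy as np


def make_filters(lo_r):
    """Build Lo_D, Hi_D, Lo_R, Hi_R of an orthogonal two-channel bank from Lo_R."""
    lo_r = np.asarray(lo_r, dtype=float)
    hi_r = lo_r[::-1].copy()
    hi_r[1::2] *= -1.0  # alternating flip (assumed sign convention)
    return lo_r[::-1], hi_r[::-1], lo_r, hi_r


def dwt(s, lo_d, hi_d):
    """One analysis stage: full convolution, then keep every second sample."""
    return np.convolve(s, lo_d)[1::2], np.convolve(s, hi_d)[1::2]


def idwt(cA, cD, lo_r, hi_r, n):
    """One synthesis stage: insert zeros, filter, keep the n samples aligned with the input."""
    up_a = np.zeros(2 * len(cA))
    up_d = np.zeros(2 * len(cD))
    up_a[::2] = cA
    up_d[::2] = cD
    full = np.convolve(up_a, lo_r) + np.convolve(up_d, hi_r)
    start = len(lo_r) - 2
    return full[start:start + n]


def wavedec(s, lo_r, level):
    """Iterate the analysis stage on successive approximations."""
    lo_d, hi_d, _, _ = make_filters(lo_r)
    a, details, lengths = np.asarray(s, dtype=float), [], []
    for _ in range(level):
        lengths.append(len(a))  # remembered so synthesis can trim each stage
        a, d = dwt(a, lo_d, hi_d)
        details.append(d)
    return a, details, lengths  # cA_j, [cD_1, ..., cD_j], input length per stage


def waverec(a, details, lengths, lo_r):
    """Iterate the synthesis stage from the coarsest level back to the signal."""
    _, _, lo_r, hi_r = make_filters(lo_r)
    for d, n in zip(reversed(details), reversed(lengths)):
        a = idwt(a, d, lo_r, hi_r, n)
    return a


s3 = np.sqrt(3.0)
lo_r = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))  # N = 2
s = np.random.default_rng(0).standard_normal(64)

cA3, details, lengths = wavedec(s, lo_r, level=3)
rec = waverec(cA3, details, lengths, lo_r)
print("detail lengths cD1..cD3:", [len(d) for d in details], "cA3:", len(cA3))
print("max reconstruction error:", np.max(np.abs(rec - s)))
```

With a 64-sample input and filters of length 2N = 4, each stage produces floor((n-1)/2) + N coefficients per branch, so the total number of coefficients is slightly larger than the input length, and the synthesis recovers the input to within floating-point rounding.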
The reconstruction filters are designed in such a way as to cancel out the effects of aliasing introduced in the wavelet decomposition phase. The reconstruction filters (Lo_R and Hi_R), together with the low and high pass decomposition filters, form a system known as quadrature mirror filters (QMF). For a multilevel analysis, the reconstruction process can itself be iterated, producing successive approximations at finer resolutions and finally synthesizing the original signal.

4. WAVELET SPEECH COMPRESSION
The idea behind signal compression using wavelets is primarily linked to the relative sparseness of the wavelet domain representation for the signal. Wavelets concentrate speech information (energy and perception) into a few neighboring coefficients. Therefore, as a result of taking the wavelet transform of a signal, many coefficients will either be zero or have negligible magnitudes. Data compression is then achieved by treating small-valued coefficients as insignificant data and thus discarding them. The process of compressing a speech signal using wavelets involves a number of different stages, each of which is discussed below.

4.1. CHOICE OF WAVELET
The choice of the mother-wavelet function used in designing high quality speech coders is of prime importance. Choosing a wavelet that has compact support in both time and frequency, in addition to a significant number of vanishing moments, is essential for an optimum wavelet speech compressor. This is followed very closely by the Daubechies D20, D12, D10 or D8 wavelets, all concentrating more than 96% of the signal energy in the Level 1 approximation coefficients. Wavelets with more vanishing moments provide better reconstruction quality, as they introduce less distortion into the processed speech and concentrate more signal energy in a few neighboring coefficients.

4.2. WAVELET DECOMPOSITION
Wavelets work by decomposing a signal into different resolutions or frequency bands, and this task is carried out by choosing the wavelet function and computing the Discrete Wavelet Transform (DWT). Signal compression is based on the concept that selecting a small number of approximation coefficients (at a suitably chosen level) and some of the detail coefficients can accurately represent regular signal components. Choosing a decomposition level for the DWT usually depends on the type of signal being analyzed or some other suitable criterion such as entropy. For the processing of speech signals, decomposition up to scale 5 is adequate, with no further advantage gained in processing beyond scale 5.

4.3. TRUNCATION OF COEFFICIENTS
After calculating the wavelet transform of the speech signal, compression involves truncating wavelet coefficients below a threshold. This means that most of the speech energy is in the high-valued coefficients, which are few. Thus the
small valued coefficients can be truncated or zeroed, and the remaining coefficients then used to reconstruct the signal. This compression scheme provided a segmental signal-to-noise ratio (SEGSNR) of 20 dB, with only 10% of the coefficients. Two different approaches are available for calculating thresholds. The first, known as Global Thresholding, involves taking the wavelet expansion of the signal and keeping the largest absolute value coefficients. In this case you can manually set a global threshold, a compression performance or a relative square norm recovery performance. Thus only a single parameter needs to be selected. The second approach, known as By Level Thresholding, consists of applying visually determined level-dependent thresholds to each decomposition level in the wavelet transform.

4.4. ENCODING COEFFICIENTS
Signal compression is achieved by first truncating small-valued coefficients and then efficiently encoding them. One approach to compression is to encode each run of consecutive zero-valued coefficients with two bytes: one byte to indicate a sequence of zeros in the wavelet transform vector, and a second byte representing the number of consecutive zeros. For further data compaction, a suitable bit encoding format can be used to quantize and transmit the data at low bit rates. A low bit rate representation can be achieved by using an entropy coder such as Huffman coding or arithmetic coding.

4.5. DETECTING VOICED Vs UNVOICED SPEECH FRAMES
In speech there are two major types of excitation, voiced and unvoiced. Voiced sounds are produced when air flows between the vocal cords and causes them to vibrate. Voiced speech tends to be periodic in nature. Examples of voiced sounds are English vowels, such as the /a/ in "bay" and the /e/ in "see". Unvoiced sounds result from constricting the vocal tract at some point so that turbulence is produced by air flowing past the constriction. Since unvoiced speech is due to turbulence, the speech is aperiodic and has a noise-like structure. Some examples of unvoiced English sounds are the /s/ in "so" and the /h/ in "he". In general, at least 90% of the speech energy is always retained in the first N/2 transform coefficients if the speech is a voiced frame. However, for an unvoiced frame the energy is spread across several frequency bands, and typically the first N/2 coefficients hold less than 40% of the total energy. Because of this, wavelets are inefficient at coding unvoiced speech. Unvoiced speech frames are infrequent. By detecting unvoiced speech frames and directly encoding them (perhaps using entropy coding), no unvoiced data is lost and the quality of the compressed speech will remain transparent.

4.6. PERFORMANCE MEASURES
A number of quantitative parameters can be used to evaluate the performance of the wavelet-based speech coder, in terms of both reconstructed signal quality after decoding and compression scores. The following parameters are compared:

4.6.1. SIGNAL TO NOISE RATIO
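As an illustration of the measures named in Sections 4.6.1 to 4.6.5, the sketch below uses common textbook definitions. These definitions and all function names are assumptions made for this example and are not necessarily the exact formulas behind the reported results. Here `x` is the original signal and `r` the reconstructed signal.

```python
import numpy as np


def snr_db(x, r):
    """Signal-to-noise ratio: mean signal power over mean error power, in dB."""
    x, r = np.asarray(x, float), np.asarray(r, float)
    return 10.0 * np.log10(np.mean(x ** 2) / np.mean((x - r) ** 2))


def psnr_db(x, r):
    """Peak SNR: squared peak amplitude over mean squared error, in dB."""
    x, r = np.asarray(x, float), np.asarray(r, float)
    return 10.0 * np.log10(np.max(np.abs(x)) ** 2 / np.mean((x - r) ** 2))


def nrmse(x, r):
    """Root mean square error normalized by the spread of x about its mean."""
    x, r = np.asarray(x, float), np.asarray(r, float)
    return np.sqrt(np.sum((x - r) ** 2) / np.sum((x - np.mean(x)) ** 2))


def retained_energy_percent(x, r):
    """Energy of the reconstruction as a percentage of the original energy."""
    x, r = np.asarray(x, float), np.asarray(r, float)
    return 100.0 * np.sum(r ** 2) / np.sum(x ** 2)


def compression_ratio(n_original, n_encoded):
    """Length of the original signal over the length of the encoded coefficient vector."""
    return n_original / n_encoded


# Toy data: a square-like signal and a reconstruction that is 10% too small.
x = np.array([1.0, -1.0, 1.0, -1.0])
r = 0.9 * x
print("SNR (dB)           :", snr_db(x, r))
print("PSNR (dB)          :", psnr_db(x, r))
print("NRMSE              :", nrmse(x, r))
print("Retained energy (%):", retained_energy_percent(x, r))
print("Compression ratio  :", compression_ratio(1000, 250))
```

Higher SNR, PSNR and retained energy, and lower NRMSE, indicate a reconstruction closer to the original, while the compression ratio captures how much shorter the encoded representation is.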
4.6.2. PEAK SIGNAL TO NOISE RATIO (PSNR)

4.6.3. NORMALISED ROOT MEAN SQUARE ERROR (NRMSE)

4.6.4. RETAINED SIGNAL ENERGY

4.6.5. COMPRESSION RATIOS

5. PERFORMANCE OF RECORDED SPEECH CODING
Male and female spoken speech signals were decomposed at scale 3, and level-dependent thresholds were applied using the Birgé-Massart strategy. Since the speech files were of short duration, the entire signal was decomposed at once without framing. A summary of the performance is given below for the different wavelets used.

CONCLUSION
Speech coding is currently an active topic for research in the areas of Very Large Scale Integrated (VLSI) circuit technologies and Digital Signal Processing (DSP). The Discrete Wavelet Transform performs very well in the compression of recorded speech signals. For real-time speech processing, however, its performance is not as good. Therefore, for real-time speech coding it is recommended to use a wavelet with a small number of vanishing moments at level 5 decomposition or less. The wavelet-based compression scheme designed here reaches a signal-to-noise ratio of 17.45 dB at a compression ratio of 3.88 using the Daubechies 10 wavelet. The performance of the wavelet scheme in terms of compression scores and signal quality is comparable with other good techniques such as code excited linear predictive coding (CELP) for speech, with much less computational burden. In addition, using wavelets the compression ratio can be easily varied, while most other compression techniques have fixed compression ratios.

