Implementation of an Acoustic Echo Canceller Using MATLAB
Scholar Commons
Graduate Theses and Dissertations
Graduate School
2003
Implementation of an acoustic echo canceller using
MATLAB
Srinivasaprasath Raghavendran
University of South Florida
Follow this and additional works at: http://scholarcommons.usf.edu/etd
Part of the American Studies Commons
This Thesis is brought to you for free and open access by the Graduate School at Scholar Commons. It has been accepted for inclusion in Graduate
Theses and Dissertations by an authorized administrator of Scholar Commons. For more information, please contact Scholar Commons.
Scholar Commons Citation
Raghavendran, Srinivasaprasath, "Implementation of an acoustic echo canceller using MATLAB" (2003). Graduate Theses and
Dissertations.
http://scholarcommons.usf.edu/etd/1453
Implementation of an Acoustic Echo Canceller
Using Matlab
by
Srinivasaprasath Raghavendran
A thesis submitted in partial fulfillment
of the requirements for the degree of
Master of Science in Electrical Engineering
Department of Electrical Engineering
College of Engineering
University of South Florida
Major Professor: Wilfrido A. Moreno, Ph.D.
James T. Leffew, Ph.D.
Wei Qian, Ph.D.
Date of Approval:
October 15, 2003
Keywords: aec, nlms, dtd, nlp, matlab
Copyright 2003, Srinivasaprasath Raghavendran
ACKNOWLEDGEMENTS
I would like to express my sincere gratitude to my Major Professor Dr. Wilfrido
A. Moreno, for being a constant source of help and inspiration throughout my work. His
timely advice and guidance have helped me through many difficult situations.
My other committee members, Dr. James T. Leffew and Dr. Wei Qian have been very
considerate and cooperative with me. I would like to thank them for their prompt
feedback and being approachable and available whenever I needed any assistance. I
would also like to thank the IEC forum for the help and valuable suggestions.
At this juncture, I thank my parents, my sister and my friends for their total
support and encouragement. This Master's thesis would not have been possible without
their support.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
1.1 Need for Echo Cancellation
1.2 Basics of Echo
1.3 Types of Echo
1.4 The Process of Echo Cancellation
1.4.1 Adaptive Filter
1.4.2 Doubletalk Detector
1.4.3 Nonlinear Processor
1.5 Echo Cancellation Challenges
1.5.1 Avoiding Divergence
1.5.2 Handling Doubletalk
1.5.3 Preventing Clipping
1.6 Research Motivation and Thesis Outline
CHAPTER 2 ECHOES IN TELECOMMUNICATION NETWORKS
2.1 Hybrid / Electrical Echo
2.2 Acoustic Echo
2.3 Long Distance Calls between Fixed Telephones
2.3.1 Echo Suppressors
2.4 Full-duplex Data Transmission between Voice-band Modems
2.5 Short Distance Connections between Fixed and Cellular Lines
2.6 Teleconference/Videoconference Communication Systems
CHAPTER 3 THE ECHO CANCELLATION ALGORITHM
3.1 Basic Echo Canceller
3.2 Components of an Acoustic Echo Canceller (AEC)
3.3 Adaptive Filtering
3.3.1 Least Mean Square (LMS) Algorithm
3.3.1.1 Generic LMS Algorithm
3.3.2 Normalized Least Mean Square (NLMS) Algorithm
3.4 Doubletalk Detector (DTD)
3.4.1 The Generic Doubletalk Detection Schemes
3.4.2 The Geigel Algorithm
3.4.3 Cross Correlation Method
3.4.4 Normalized Cross Correlation Method
3.5 Nonlinear Processor (NLP)
3.5.1 Noise Gate as NLP
3.5.2 A Generic Expander
3.5.3 Noise Gate
CHAPTER 4 SIMULATION AND RESULTS
4.1 Why MATLAB?
4.2 Simulation Flowchart
4.3 Description of the Simulation Setup
4.4 Results
4.5 Evaluation of the Echo Cancellation Algorithm
4.5.1 Convergence Test
4.5.2 Echo Return Loss Enhancement (ERLE)
4.5.3 Auditory Test
CHAPTER 5 CONCLUSION AND FURTHER WORK
5.1 Conclusion
5.2 Further Work
REFERENCES
LIST OF TABLES
Table 3.1: LMS Algorithm
Table 3.2: NLMS Algorithm
LIST OF FIGURES
Figure 1.1: Block Diagram of a Generic Echo Canceller
Figure 2.1: Hybrid Echo
Figure 2.2: Sources of Acoustic Echo in a Room
Figure 2.3: Simplified Long Distance Connections
Figure 2.4: Echo Suppressor at Near-end Talker B Path
Figure 2.5: Echo Canceller at Modem Locations for Full-Duplex Voice-band Modems
Figure 2.6: Cellular to Fixed Telephone Connection
Figure 2.7: Adaptive Acoustic Echo Cancellation in an Enclosed Environment
Figure 3.1: A Basic Echo Canceller
Figure 3.2: A Generic Adaptive Echo Canceller
Figure 3.3: Echo Canceller with Doubletalk Detector and Nonlinear Processor
Figure 3.4: LMS Algorithm
Figure 3.5: Generic LMS Algorithm
Figure 3.6: Basic Block Diagram of an Expander
Figure 3.7: Input / Output Characteristics of an Expander
Figure 3.8: The Effect of an Expander on a Signal
Figure 4.1: Flowchart of the MATLAB Simulation
Figure 4.2: Plot of the Far-end Signal, x(n)
Figure 4.3: Plot of the Echo Signal, r(n)
Figure 4.4: Plot of the Near-end Signal, v(n)
Figure 4.5: Plot of the Desired Signal, d(n)
Figure 4.6: Plot of the Error Signal e(n)
Figure 4.7: Plot of the Error Signal after Nonlinear Processing
Figure 4.8: Plot of ERLE Vs the Number of Samples
IMPLEMENTATION OF AN ACOUSTIC ECHO CANCELLER
USING MATLAB
Srinivasaprasath Raghavendran
ABSTRACT
The rapid growth of technology in recent decades has changed the whole dimension of
communications. Today people are more interested in hands-free communication. In
such a situation, the use of a regular loudspeaker and a high-gain microphone, in place of a
telephone receiver, might seem more appropriate. This would allow more than one
person to participate in a conversation at the same time such as a teleconference
environment. Another advantage is that it would allow the person to have both hands
free and to move freely in the room. However, the presence of a large acoustic coupling
between the loudspeaker and microphone would produce a loud echo that would make
conversation difficult. Furthermore, the acoustic system could become unstable and
produce a loud howling noise.
The solution to these problems is the elimination of the echo with an echo
suppression or echo cancellation algorithm. The echo suppressor offers a simple but
effective method to counter the echo problem. However, the echo suppressor has a major
disadvantage: it supports only half-duplex communication, which permits only one speaker
to talk at a time. This drawback led to the invention of echo cancellers. An important
aspect of echo cancellers is that full-duplex communication can be maintained, which
allows both speakers to talk at the same time.
The objective of this research was to produce an improved echo cancellation
algorithm capable of providing convincing results. The three basic components
of an echo canceller are an adaptive filter, a doubletalk detector and a nonlinear
processor. The adaptive filter creates a replica of the echo and subtracts it from the
combination of the actual echo and the near-end signal. The doubletalk detector senses
doubletalk, which occurs when both ends are talking, and stops the adaptation of the filter
in order to avoid divergence. Finally, the nonlinear processor removes the residual
echo from the error signal. Usually, a certain amount of speech is clipped in this final
stage of nonlinear processing. In order to avoid clipping, a noise gate was used as the
nonlinear processor in this research. The noise gate allows a threshold value to be set,
and all signals below the threshold are removed. This ensures that only residual
echoes are removed in the final stage. To date, real-time implementations of echo
cancellation algorithms have been performed using either VLSI processors or DSP
processors. Since personal computers have advanced dramatically in recent years, this
research attempted to implement the acoustic echo canceller algorithm natively on a PC
with the help of the MATLAB software.
CHAPTER 1
INTRODUCTION
1.1 Need for Echo Cancellation
In this new age of global communications, wireless phones are regarded as
essential communications tools and have a direct impact on people's day-to-day personal
and business communications. As new network infrastructures are implemented and
competition between wireless carriers increases, digital wireless subscribers are
becoming ever more critical of the service and voice quality they receive from network
providers. Subscriber demand for enhanced voice quality over wireless networks has
driven a key technology, echo cancellation, which can provide near-wireline
voice quality across a wireless network.
Today's subscribers use speech quality as a standard for assessing the overall
quality of a network. Regardless of whether or not the subscriber's opinion is subjective,
it is the key to maintaining subscriber loyalty. For this reason, the effective removal of
hybrid and acoustic echoes, which are inherent within the telecommunications network
infrastructure, is the key to maintaining and improving the perceived voice quality of a
call. Ultimately, the search for improved voice quality has led to intensive research into
the area of echo cancellation. Such research is conducted with the aim of providing
solutions that can reduce background noise and remove hybrid and acoustic echoes
before any transcoder processing occurs. By employing echo cancellation technology,
the quality of speech can be improved significantly. This chapter discusses the overall
echo problem. A definition of echo precedes the discussion of the fundamentals of echo
cancellation and the voice quality challenges encountered in today's networks.
1.2 Basics of Echo
Echo is a phenomenon where a delayed and distorted version of an original sound
or electrical signal is reflected back to the source. With rare exceptions, conversations
take place in the presence of echoes. Echoes of our speech are heard as they are reflected
from the floor, walls and other neighboring objects. If a reflected wave arrives very
shortly after the direct sound, it is perceived as spectral distortion or reverberation.
However, when the leading edge of the reflected wave arrives a few tens of milliseconds
after the direct sound, it is heard as a distinct echo [1].
Since the advent of telephony, echoes have been a problem in communication
networks. In particular, echoes can be generated electrically due to impedance
mismatches at various points along the transmission medium. The most important factor
in echoes is called end-to-end delay, which is also known as latency. Latency is the time
between the generation of the sound at one end of the call and its reception at the other
end. Round trip delay, which is the time taken to reflect an echo, is approximately twice
the end-to-end delay.
Echoes become annoying when the round trip delay exceeds 30 ms. Such an echo
is typically heard as a hollow sound. Echoes must also be loud enough to be heard: those
less than thirty (30) decibels (dB) are unlikely to be noticed. When the round trip
delay exceeds 30 ms and the echo strength exceeds 30 dB, echoes become steadily more
disruptive. Not all echoes reduce voice quality, however. In order for telephone
conversations to sound natural, callers must be able to hear themselves speaking. For this
reason, a short instantaneous echo, termed side tone, is deliberately inserted. The side
tone couples the caller's speech from the telephone mouthpiece to the earpiece so
that the line sounds connected.
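As a rough illustration of the delay arithmetic above, the sketch below converts a one-way (end-to-end) delay into a round-trip delay and applies the 30 ms guideline quoted in this section. Only the delay criterion is modelled, not the echo level; the function names and the 8 kHz sampling rate used for the sample count are assumptions for illustration.

```python
# Minimal sketch: round-trip delay from the one-way (end-to-end) delay.
# Assumptions: round trip = exactly 2 x one-way delay, and an 8 kHz
# telephone-band sampling rate for the conversion to samples.

SAMPLE_RATE_HZ = 8000          # assumed narrowband telephony rate
ANNOYANCE_THRESHOLD_MS = 30.0  # round-trip guideline quoted in the text


def round_trip_delay_ms(one_way_ms: float) -> float:
    """Approximate the round-trip delay as twice the end-to-end delay."""
    return 2.0 * one_way_ms


def delay_in_samples(delay_ms: float, fs: int = SAMPLE_RATE_HZ) -> int:
    """Convert a delay in milliseconds to the nearest whole number of samples."""
    return round(delay_ms * fs / 1000.0)


def echo_is_annoying(one_way_ms: float) -> bool:
    """True when the round-trip delay exceeds the 30 ms guideline."""
    return round_trip_delay_ms(one_way_ms) > ANNOYANCE_THRESHOLD_MS


if __name__ == "__main__":
    for one_way in (5.0, 20.0, 60.0):
        rt = round_trip_delay_ms(one_way)
        print(f"one-way {one_way} ms -> round trip {rt} ms, "
              f"{delay_in_samples(rt)} samples, annoying: {echo_is_annoying(one_way)}")
```

A one-way delay of only 20 ms is therefore already enough to push the round trip past the 30 ms guideline.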
1.3 Types of Echo
In telecommunications networks there are two types of echo. One source for an
echo is electrical and the other echo source is acoustic [1]. The electrical echo is due to
the impedance mismatch at the hybrids of a Public Switched Telephone Network (PSTN)
exchange, where the subscriber two-wire lines are connected to four-wire lines.
If a communication is simply between two fixed telephones, then only the electrical echo
occurs. However, the development of hands-free teleconferencing systems gave rise to
another kind of echo known as an acoustic echo. The acoustic echo is due to the coupling
between the loudspeaker and microphone. These electrical and acoustic echoes are
discussed in greater detail in chapter 2.
1.4 The Process of Echo Cancellation
An echo canceller is basically a device that detects and removes the echo of the
signal from the far end after it has echoed on the local end's equipment. In the case of
circuit switched long distance networks, echo cancellers reside in the metropolitan
Central Offices that connect to the long distance network. These echo cancellers remove
electrical echoes made noticeable by delay in the long distance network.
An echo canceller consists of three main functional components:
Adaptive filter
Doubletalk detector
Non-linear processor
A brief overview of these components is presented in this chapter; a detailed
mathematical treatment is provided in Chapter 3.
[Figure: block diagram with an Adaptive Filter, a Doubletalk Detector and a Non-Linear Processor; labelled signals are the input signal x(n), the reference signal, y(n), the filtered signal, the doubletalk decision and the clear signal e(n).]
Figure 1.1: Block Diagram of a Generic Echo Canceller
1.4.1 Adaptive Filter
The adaptive filter is made up of an echo estimator and a subtractor. The echo
estimator monitors the receive path and dynamically builds a mathematical model of the
line that creates the returning echo. The model of the line is convolved with the voice
stream on the receive path. This yields an estimate of the echo, which is applied to the
subtractor. The subtractor eliminates the linear part of the echo from the line in the send
path. The echo canceller is said to converge on the echo as an estimate of the line is built
through the adaptive filter.
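The estimate-and-subtract step can be sketched in a few lines, assuming the line model is a finite impulse response (FIR) filter whose coefficients (hypothetical values here) have already been estimated. The adaptation of those coefficients is deliberately left out; this is an illustration, not the thesis's MATLAB code.

```python
# Minimal sketch of the echo estimator and subtractor.
# Assumption: the line model is an FIR filter h_hat[0..N-1] that has already
# been estimated (adaptation is omitted); the values below are hypothetical.


def estimate_echo(h_hat, x, n):
    """Convolve the line model with the receive-path signal at time n:
    y_hat(n) = sum_k h_hat[k] * x[n - k], treating samples before n = 0 as zero."""
    return sum(h_hat[k] * x[n - k] for k in range(len(h_hat)) if n - k >= 0)


def cancel(h_hat, x, send):
    """Subtract the echo estimate from every send-path sample."""
    return [send[n] - estimate_echo(h_hat, x, n) for n in range(len(send))]


if __name__ == "__main__":
    h_true = [0.0, 0.5, 0.25]        # hypothetical echo path with a one-sample delay
    x = [1.0, -1.0, 2.0, 0.0, 0.5]   # receive-path (far-end) samples
    send = [estimate_echo(h_true, x, n) for n in range(len(x))]  # echo only
    print(cancel(h_true, x, send))   # a perfect line model leaves all zeros
```

When the model differs from the true line, the output is no longer zero; that leftover is the residual echo handled by the later stages.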
1.4.2 Doubletalk Detector
A doubletalk detector is used with an echo canceller to sense when far-end speech
is corrupted by near-end speech. The role of this important function is to freeze
adaptation of the model filter when near-end speech is present. This action prevents
divergence of the adaptive algorithm.
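One simple way to make the freeze decision is the Geigel detector, which this thesis lists in Section 3.4.2. The sketch below is a minimal version under assumed parameters, not the thesis's exact implementation: doubletalk is declared when |d(n)| exceeds T times the largest far-end magnitude over the last L samples, with T = 0.5 (about 6 dB) taken as a commonly quoted value.

```python
# Minimal sketch of a Geigel-style doubletalk detector and the adaptation
# freeze it controls. Assumptions: threshold T = 0.5 and a window of L recent
# far-end samples supplied by the caller.


def geigel_doubletalk(d_n, x_recent, threshold=0.5):
    """Return True if near-end speech is assumed present.

    d_n       : current send-path sample d(n) (echo plus any near-end speech)
    x_recent  : the last L far-end samples x(n), ..., x(n-L+1)
    threshold : T, an assumed value (0.5 is roughly 6 dB)
    """
    peak = max((abs(v) for v in x_recent), default=0.0)
    return abs(d_n) > threshold * peak


def maybe_adapt(weights, update, doubletalk):
    """Freeze adaptation (keep the old weights) while doubletalk is detected."""
    if doubletalk:
        return weights
    return [w + u for w, u in zip(weights, update)]
```

The test relies on the echo path attenuating the far-end signal; if the echo can be louder than T times the far-end peak, the detector raises false alarms and stalls adaptation.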
1.4.3 Nonlinear Processor
The non-linear processor evaluates the residual echo, which is the amount of echo left
over after the signal has passed through the adaptive filter. The
nonlinear processor removes all signals below a certain threshold and replaces them with
simulated background noise which sounds like the original background noise without the
echo.
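The gating behaviour described above can be sketched sample by sample. The threshold, the uniform comfort-noise model and the function name are assumptions for illustration; a practical NLP works on short-term signal levels and shapes its noise to match the real background.

```python
import random

# Minimal sketch of a threshold NLP with comfort noise.
# Assumptions: a per-sample magnitude test, and uniform random noise of
# amplitude `noise_level` standing in for "simulated background noise".


def nlp_gate(e, threshold, noise_level, seed=0):
    """Replace residual-echo samples of e(n) with low-level comfort noise.

    Samples with |e(n)| < threshold are treated as residual echo and replaced;
    louder samples (assumed near-end speech) pass through unchanged.
    """
    rng = random.Random(seed)
    out = []
    for sample in e:
        if abs(sample) < threshold:
            out.append(rng.uniform(-noise_level, noise_level))
        else:
            out.append(sample)
    return out


if __name__ == "__main__":
    residual = [0.01, 0.5, -0.02, -0.7]
    print(nlp_gate(residual, threshold=0.05, noise_level=0.001))
```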
1.5 Echo Cancellation Challenges
An echo canceller has to deal with a number of challenges in order to perform
robust echo cancellation.
1.5.1 Avoiding Divergence
Divergence is an adaptive filter problem that arises when the adaptation algorithm
fails to find a suitable solution for the line model.
Under specific conditions, certain algorithms are bound to diverge and corrupt the signal
or even add echo to the line. Good echo cancellers are tuned to avoid divergence
situations in nearly all conditions.
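Divergence can be demonstrated with the normalized least mean square (NLMS) update covered in Section 3.3.2 of this thesis. In the noise-free case the NLMS weight error shrinks for step sizes 0 < μ < 2 and grows for μ > 2. The sketch below assumes a white Gaussian far-end signal, a hypothetical 8-tap echo path and no near-end speech; it illustrates the two regimes and is not the thesis's MATLAB code.

```python
import random

# Minimal sketch: NLMS identification of a hypothetical echo path, showing that
# the same algorithm converges or diverges depending on the step size mu.
# Assumptions: white Gaussian far-end signal, no near-end speech or noise.


def nlms_misalignment(mu, num_taps=8, num_samples=500, eps=1e-8, seed=1):
    """Run NLMS and return the final normalized misalignment ||h - w||^2 / ||h||^2."""
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(num_taps)]   # hypothetical echo path
    w = [0.0] * num_taps                                 # adaptive filter weights
    x_buf = [0.0] * num_taps                             # x(n), x(n-1), ..., newest first
    for _ in range(num_samples):
        x_buf = [rng.gauss(0.0, 1.0)] + x_buf[:-1]       # shift in a new far-end sample
        d = sum(hk * xk for hk, xk in zip(h, x_buf))     # echo (no near-end speech)
        y = sum(wk * xk for wk, xk in zip(w, x_buf))     # echo estimate
        e = d - y                                        # error signal
        norm = eps + sum(xk * xk for xk in x_buf)        # input energy (regularized)
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x_buf)]
    err = sum((hk - wk) ** 2 for hk, wk in zip(h, w))
    return err / sum(hk * hk for hk in h)


if __name__ == "__main__":
    print("mu = 0.5:", nlms_misalignment(0.5))   # shrinks toward zero
    print("mu = 2.5:", nlms_misalignment(2.5))   # grows: the filter diverges
```

With μ = 2.5 every update over-corrects the error along the current input vector, so the weight error can only grow; this is one concrete case of an algorithm that is "bound to diverge" under the wrong settings.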
1.5.2 Handling Doubletalk
In an active conversation, both talkers often speak at the same time or interrupt
each other. Those situations are called doubletalk. Doubletalk presents a special
processing challenge to echo cancellers. Taken step-by-step, doubletalk proceeds as
follows:
1. A speaks. The echo canceller must compare the received speech from
Speaker A to what would be transmitted back to A in order to approximate
an echo point.
2. B speaks over the echo signal, which constitutes doubletalk. The
echo canceller must detect the doubletalk and cancel the echo without
affecting speaker B's words.
3. The echo canceller must send B's speech, with the echo of A's own speech
cancelled, back to A.
Handling doubletalk so that it sounds natural is technically challenging. A good
echo canceller must be able to do the following:
It must detect doubletalk and distinguish it from background noise.
It must be capable of choosing not to update the line model when doing so
could cause divergence.
It needs to make a smooth transition between doubletalk detection,
processing of doubletalk and return to the normal mode.
In summary, an important requirement for echo cancellation is the handling of
doubletalk in a natural manner that does not cause divergence.
1.5.3 Preventing Clipping
Clipping occurs during a telephone conversation when part of the speech is
erroneously removed. Clipping results from an imprecise Non-Linear Processor (NLP),
one that fails to start and stop at the right time. Typically, an NLP
does not respond rapidly enough to the introduction of speech through the local end. It
replaces parts of words with background noise, which makes the conversation hard to
follow. The same can happen when the NLP confuses the fading of the voice level at the
end of a sentence with a residual echo.
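To illustrate the end-of-sentence case, the sketch below applies a fixed-threshold gate to frame levels of a fading voice. The frame levels are hypothetical, and the "hangover" counter, which holds the gate open for a few frames after the level falls below threshold, is a common mitigation assumed here for illustration; it is not described in this section.

```python
# Minimal sketch: how a fixed-threshold NLP gate clips a fading word ending.
# Assumptions: one short-term level per frame (hypothetical values and units),
# and an optional "hangover" counter, a common mitigation that is NOT part of
# the thesis text, which keeps the gate open a few frames after speech falls
# below the threshold.


def gate_with_hangover(levels, threshold, hangover):
    """Return one boolean per frame: True where the gate passes the signal.

    levels    : short-term signal levels, one per frame
    threshold : frames below this level are treated as residual echo
    hangover  : extra frames the gate stays open after the level drops
                below threshold (0 gives a plain gate with no hangover)
    """
    open_flags = []
    countdown = 0
    for level in levels:
        if level >= threshold:
            countdown = hangover       # speech present: (re)arm the hangover
            open_flags.append(True)
        elif countdown > 0:
            countdown -= 1             # below threshold, but still in hangover
            open_flags.append(True)
        else:
            open_flags.append(False)   # treated as residual echo and removed
    return open_flags


if __name__ == "__main__":
    fading = [1.0, 0.8, 0.5, 0.3, 0.15, 0.08, 0.02]   # voice fading at sentence end
    print(gate_with_hangover(fading, threshold=0.2, hangover=0))  # tail is clipped
    print(gate_with_hangover(fading, threshold=0.2, hangover=2))  # tail mostly kept
```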
1.6 Research Motivation and Thesis Outline
Since echo cancellation is a very demanding process, real-time implementation
has only been possible through the use of custom very large scale integration (VLSI)
processors or digital signal processors (DSPs). These processors are specially designed
for signal processing tasks. They provide parallel processing of commands and
optimized pipeline structures. However, since the computation power of regular home
personal computers (PCs) has increased tremendously and powerful software has
evolved, it is now possible to perform real-time signal processing in the PC environment
as well. The advent of this growing capability was the motivation for this research. The
objective of the research was the implementation of a software echo canceller running
natively on a PC with the help of the MATLAB software.
This thesis provides an overview of an improved echo cancellation technique
using a noise gate for the NLP. Chapter 1 discusses the definition of echo, the necessity
of echo cancellers in telecommunications networks, the basics of echo cancellation and the
challenges of echo cancellation. Chapter 2 gives an overview of the types of echo and
their sources. It also discusses, in great detail, the echo phenomena in four major
telecommunication systems. The proposed echo cancellation algorithm is explained step by
step in Chapter 3. Chapter 4 discusses the simulation of the proposed algorithm,
details of the simulation environment and the results obtained. Finally, Chapter 5
provides a summary and some ideas concerning further work in this field.
CHAPTER 2
ECHOES IN TELECOMMUNICATION NETWORKS
This chapter deals with echoes that are generated in telecommunication systems.
As discussed in Chapter 1, there are two main types of echo, which are termed
electrical, or hybrid, and acoustic.
2.1 Hybrid/Electrical Echo
Hybrid echoes have been inherent within the telecommunications networks since
the advent of the telephone. This echo is the result of impedance mismatches in the
analog local loop. For example, this happens when mixed gauges of wires are used, or
where there are unused taps and loading coils. In the Public Switched Telephone
Network (PSTN), by far the main source of electrical echo is the hybrid. This hybrid is a
transformer located at a juncture that connects the two-wire local loop coming from a
subscriber's premises to the four-wire trunk at the local telephone exchange. The
four-wire trunks connect the local exchange to the long distance exchange. This situation is
illustrated in Figure 2.1.
[Figure: a hybrid device with a 2W port, a 4W receive port, a 4W transmit port and a balance network, with the hybrid echo path indicated.]
Figure 2.1: Hybrid Echo
The hybrid splits the two-wire local loop into two separate pairs of wires. One
pair is used for the transmission path and the other for the receiver path. The hybrid
passes on most of the signal. However, the impedance mismatch between the two-wire
loop and the four-wire facility causes a small part of the received signal to leak back
onto the transmission path. The speaker hears an echo because the far-end receives the
signal and sends part of it back again. Electrical echo is not a problem on local
calls since the relatively short distances do not produce significant delays. However, the
electrical echo must be controlled on long distance calls.
In the early years, when the public network was entirely circuit switched, the
hybrid echo was the only significant source of echo. Since the locations of hybrids and
most other causes of impedance differences in circuit switched networks were known,
adequate echo control could be planned and provisioned. However, in today's digital
networks the point where two wires split into four wires is typically also the point where
analog to digital conversion takes place. Regardless of whether the hybrid and the analog to
digital conversion are implemented in the same device or in two devices, the two-to-four
wire conversion constitutes an impedance mismatch and echoes are produced [1].
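As a simple illustration of the leakage described above, the sketch below models the hybrid echo as a scaled copy of the signal arriving on the four-wire receive port and measures the resulting echo return loss (ERL), the ratio of received power to echo power in dB. Modelling the leak as a pure gain is an assumption: a real hybrid echo path is delayed and frequency dependent.

```python
import math

# Minimal sketch: hybrid leakage modelled as a pure gain (an assumption; a real
# hybrid echo path is delayed and frequency dependent).


def echo_return_loss_db(received, echo):
    """Echo return loss (ERL) in dB: received-signal power over echo power."""
    p_rx = sum(s * s for s in received) / len(received)
    p_echo = sum(s * s for s in echo) / len(echo)
    return 10.0 * math.log10(p_rx / p_echo)


if __name__ == "__main__":
    leak = 0.1                          # hypothetical amplitude leaking through the hybrid
    rx = [1.0, -0.5, 0.25, 0.75]        # signal arriving on the 4W receive port
    echo = [leak * s for s in rx]       # part reflected onto the 4W transmit port
    print(round(echo_return_loss_db(rx, echo), 6))  # a 0.1 amplitude leak is 20 dB ERL
```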
2.2 Acoustic Echo
The acoustic echo, which is also known as a multipath echo, is produced by
unwanted acoustic coupling between the earpiece (or loudspeaker) and the microphone in
handsets and hands-free devices. Further voice degradation occurs as voice-compressing and
encoding/decoding devices process the voice paths within the handsets and in wireless
networks. This results in returned echo signals with highly variable properties. When
these echoes are compounded with inherent digital transmission delays, call quality is
greatly diminished for the wireline caller.
Acoustic coupling is due to the reflection of the loudspeaker's sound waves from
walls, doors, ceiling, windows and other objects back to the microphone. The result of the
reflections is the creation of a multipath echo and multiple harmonics of echoes, which
are transmitted back to the far-end and are heard by the talker as an echo unless
eliminated. Adaptive cancellation of such acoustic echoes has become very important in
hands-free communication systems such as teleconference or videoconference systems
[1]. The multipath echo phenomenon is illustrated in Figure 2.2.
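The multipath structure can be illustrated by treating each loudspeaker-to-microphone path as one delayed, attenuated copy of the signal. The sketch assumes a speed of sound of 343 m/s, an 8 kHz sampling rate and hypothetical path lengths and gains; real room responses also contain frequency-dependent absorption and a dense reverberant tail that this simplification ignores.

```python
# Minimal sketch: a sparse room impulse response built from reflection paths.
# Assumptions: speed of sound 343 m/s, 8 kHz sampling, hypothetical paths.

SPEED_OF_SOUND_M_S = 343.0   # assumed, air at about 20 degrees C
SAMPLE_RATE_HZ = 8000        # assumed telephone-band sampling rate


def path_delay_samples(path_length_m, fs=SAMPLE_RATE_HZ):
    """Propagation delay of one loudspeaker-to-microphone path, in samples."""
    return round(path_length_m / SPEED_OF_SOUND_M_S * fs)


def room_impulse_response(paths, fs=SAMPLE_RATE_HZ):
    """Build an impulse response from (path_length_m, gain) pairs:
    each path contributes one delayed, attenuated tap."""
    delays = [path_delay_samples(length, fs) for length, _ in paths]
    h = [0.0] * (max(delays) + 1)
    for d, (_, gain) in zip(delays, paths):
        h[d] += gain
    return h


if __name__ == "__main__":
    # Hypothetical paths: direct coupling plus two wall reflections.
    paths = [(0.343, 0.5), (3.43, 0.2), (6.86, 0.1)]
    h = room_impulse_response(paths)
    print([(n, g) for n, g in enumerate(h) if g != 0.0])
```

Even this toy response needs 161 taps at 8 kHz, which hints at why acoustic echo paths demand long adaptive filters.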
[Figure: a loudspeaker and a microphone in a room, showing direct coupling between them and reflections from the room surfaces.]
Figure 2.2: Sources of Acoustic Echo in a Room
In the following sections, the echo phenomena of four communication systems
will be described. The communication systems are:
Long-distance connections between fixed telephones
Full-duplex data transmission between voice-band modems
Short-distance connections between fixed and cellular telephones
Teleconference/videoconference systems
2.3 Long Distance Calls between Fixed Telephones
A simple long-distance telephone connection is presented in Figure 2.3. This
connection contains two-wire sections at the ends, the subscriber loops and possibly some
portion of the local network. It also contains a four-wire section in the center, which is a
carrier system for medium-range to long-range transmissions.
Figure 2.3: Simplified Long Distance Connections
Every conventional telephone in a given geographical area is connected to the
local PSTN exchange by a two-wire line, called the subscriber loop, which carries a
connection for both directions of transmission. Simply connecting the two subscriber
loops at the local exchange sets up a local call. However, amplification of the speech
signal becomes necessary when the distance between the two telephones exceeds 35
miles. Therefore, a four-wire line is required, which segregates the two directions of
transmission. A hybrid is used to convert from the two-wire to four-wire line and vice
versa.
An echo can be decreased if the hybrid has a significant loss between its two four-wire
ports. To achieve this large loss, the hybrid has to be perfectly balanced by an
impedance located at its four-wire portion. Unfortunately, this is not possible in practice,
since it requires knowledge of the two-wire impedance, which varies considerably over
the population of subscriber loops. When the hybrid is not perfectly balanced, an impedance
mismatch occurs. This causes some of the talker's signal energy to be reflected back as
an echo. The effects of echo can be controlled by adding an insertion loss to the four-wire
portions of the connection. Such action is effective since the echo signals experience this
loss two or three times while the talker's speech suffers this loss only once. However, on
long-range connections the insertion loss can become very significant. Hence, it is not a
favorable solution, and other echo control techniques such as echo suppression must be
used [1].
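As a worked illustration of why insertion loss penalizes the echo more than the speech, suppose (hypothetically) that each direction of the four-wire portion adds a loss of L dB and that a hybrid reflects the signal with a loss of A_H dB. The attenuations along each route are then:

```latex
\begin{aligned}
A_{\text{speech}}        &= L \\
A_{\text{talker echo}}   &= L + A_{H} + L = 2L + A_{H} \\
A_{\text{listener echo}} &= L + A_{H} + L + A_{H} + L = 3L + 2A_{H}
\end{aligned}
```

For example, with L = 6 dB the talker's speech loses 6 dB while the echo returning to the talker loses 12 dB plus the hybrid loss. The speech loss still grows with L, which is why the approach becomes unattractive on long connections.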
2.3.1 Echo Suppressors
Echo suppressors have been used since the introduction of long distance
communication. This device basically takes advantage of the fact that people seldom talk
simultaneously. The situation of two people talking simultaneously is termed double
talking. The echo suppressor is also helped by the fact that during such double talking
poor transmission quality is less noticeable. Figure 2.4 illustrates how the echo
suppressor dynamically controls the connection based on who is talking, which is decided
by the speech and double talking detector. Double talking is detected if the level of the
signal in path L1 is significantly lower than that in path L2. When the far-end talker A is
speaking, the path used to transmit the near-end speech is opened so that the echo is
prevented. Then, when the near-end talker B speaks, the same switch is closed and a
symmetric one in far-end talker A's path is opened. However, echo suppressors can
clip speech sounds and introduce disruptive interruptions. For example, if talker B is
initially listening to talker A but suddenly wants to talk, it is quite likely that the switch
preventing talker A's echo from being transmitted will not close quickly enough. As a
result, far-end talker A will not receive all of near-end talker B's speech.
This deletion is noticed by talker A, encouraging him or her to stop and wait
for talker B to finish. The resulting confusion may stop the conversation entirely while
each party waits for the other to say something [1]. Therefore, the best solution for
removing echoes is to use echo cancellers, which are described in Chapter 3.
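The switching behaviour of an echo suppressor can be sketched as a level comparison. This is a simplified illustration, not a specific product's logic: short-term levels are given directly, and the margin by which talker B must exceed the far-end level before the suppressor reverses direction is a hypothetical parameter. In the text's terminology, an "opened" path is a broken (blocked) connection.

```python
# Minimal sketch of echo-suppressor switching (a simplification, not a
# specific product's logic). Levels are short-term speech levels supplied by
# the caller; `margin` is a hypothetical factor by which talker B must exceed
# the far-end level before the suppressor reverses direction.


def suppressor_state(far_level, near_level, margin=2.0):
    """Return which path the suppressor blocks.

    "send_blocked"    : A is talking; B's send path is opened (blocked) so
                        A's echo cannot return to A.
    "receive_blocked" : B dominates; the switch in A's path is opened instead.
    "idle"            : nobody is talking.
    """
    if near_level > margin * far_level:
        return "receive_blocked"
    if far_level > 0.0:
        return "send_blocked"
    return "idle"


if __name__ == "__main__":
    # B starts talking quietly while A is still speaking: B's onset is
    # still blocked, which is the clipping described in the text.
    for far, near in [(1.0, 0.3), (1.0, 1.5), (0.2, 1.5), (0.0, 0.0)]:
        print(far, near, suppressor_state(far, near))
```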
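[Figure: the near-end talker B side of the connection, showing the hybrid, a speech detector, a doubletalk detector used to override the echo suppressor, the echo suppressor, the signal arriving from the far end, the echo, talker B's signal destined for the far-end talker, and the paths L1 and L2.]
Figure 2.4: Echo Suppressor at Near-end Talker B Path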
2.4 Full-duplex Data Transmission between Voice-band Modems
The two-wire telephone line of a subscriber loop can be used for the transmission
of data through a modem. This can be accomplished either by using the entire bandwidth
of the wire or transmitting the data on a bandwidth that is slightly above the one used to
carry the speech signal. On an analog subscriber loop, the speech signal occupies the
band from 300 to 3400 Hz. A higher bit rate of up to 16 kbps can be transmitted
by modulating the data signal onto a carrier signal at a band above 4000 Hz. Echo
cancellation is needed for full-duplex communication within the same bandwidth over the
subscriber loop as shown in Figure 2.5 where EC is the echo canceller, H is the hybrid,
RX is the receiver and TX is the transmitter.
Figure 2.5: Echo Cancellers at Modem Locations for Full-Duplex Voice-band Modems
Typically the echo cancellers must be placed at the line interface where the
hybrids connect the modem to the two-wire subscriber loop. Several problems are
associated with this type of application and some of them are given below.
It is not practical to freeze the adaptation algorithm during doubletalk in the
case of full-duplex operation, since the echo path's characteristics are likely to
change during a lengthy communication session.
The far-end echo, which is returned from the far-end hybrid, must also be taken
into account. Therefore, the total echo delay becomes very large, which is
unique to echo cancellation at the station, or modem, location. If the circuit
includes a satellite communication network's four-wire link, the far-end echo will
be delayed by more than 500 ms. In such a case, two cancellers are required at the
modems: one for the near-end echo and one for the far-end echo.
A very high level of echo cancellation is required. The data signal
coming from a far-end modem may be attenuated by 40 to 50 dB. Therefore, the
near-end echo, which is returned from the first hybrid at the local station, can be
40 to 50 dB higher than the desired signal. For reliable communication, the echo
canceller must be able to attenuate the near-end echo by 50 to 60 dB in order to
keep the signal power approximately 10 dB above the echo [2].
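The 50 to 60 dB figure follows from simple level arithmetic. Writing the echo attenuation needed from the canceller as A_EC, and expressing the echo and far-end data-signal powers in dB (symbols assumed for illustration):

```latex
\begin{aligned}
A_{\text{EC}} &\ge \bigl(P_{\text{echo}} - P_{\text{signal}}\bigr) + 10\ \text{dB} \\
\text{(40 dB excess)}\quad A_{\text{EC}} &\ge 40\ \text{dB} + 10\ \text{dB} = 50\ \text{dB} \\
\text{(50 dB excess)}\quad A_{\text{EC}} &\ge 50\ \text{dB} + 10\ \text{dB} = 60\ \text{dB}
\end{aligned}
```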
2.5 Short Distance Connections between Fixed and Cellular Lines
In digital cellular communication, the combination of channel coding, speech
coding and signal processing involves considerable delays. In most cases, the delays are
increased further by time division multiple access framing. The total one-way delay can
be from 30 to 120 ms. Figure 2.6 illustrates that only one echo canceller (EC), facing
the local PSTN exchange (LE), is required in a digital cellular to fixed telephone
connection. This is only possible if the cellular telephone is assumed to behave in a
perfect four-wire fashion with no significant acoustic cross talk echo between the
microphone and the earpiece of the cellular phone. However, under certain conditions,
the cross talk echo in cellular handsets is still noticeable by users. Hence, the echo needs
to be removed by cellular cross talk control devices [2].
Figure 2.6: Cellular to Fixed Telephone Connection
2.6 Teleconference/Videoconference Communication Systems
When the telephone connection is between hands-free telephones or between two
conference rooms, then an acoustic echo problem emerges that is due to the reflection of
the loudspeaker's sound waves from the boundary surfaces and other objects back to the
microphone. This acoustic echo can be removed using an adaptive filter as illustrated in
Figure 2.7. The adaptive filter attempts to synthesize a model of the acoustic echo at its
output.
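[Figure: an enclosed environment (e.g., a room or vehicle) containing a loudspeaker, a microphone and the near-end talker; the signal x(n) from the far-end talker drives the loudspeaker and the adaptive filter, the echo r(t) and the near-end speech v(n) reach the microphone, and the adaptive filter output ŷ(n) is used to form e(n), which is sent to the far-end talker. A signal y(n) is also labelled.]
Figure 2.7: Adaptive Acoustic Echo Cancellation in an Enclosed Environment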
Adaptive acoustic echo cancellation is a more challenging problem than
network echo cancellation for the following main reasons:
The impulse response of the acoustic echo path is several times longer,
between 100 and 500 ms, than that of the network echo path.
The characteristics of the acoustic echo path are more non-stationary, due to the
opening and closing of doors or the movement of people inside the room, while
the network echo path is almost stationary.
The acoustic echo path has a mixture of linear and nonlinear characteristics.
The reflection of acoustic signals inside a room produces an almost linear distortion.
However, the loudspeaker does introduce nonlinearity. The main causes of
this nonlinearity are the suspension nonlinearity, which causes distortion at low
frequencies, and the inhomogeneity of the flux density, which produces nonlinear
distortion at large input signal levels.
For these reasons, acoustic echo cancellers (AECs) require more computing
power, in order to handle the longer impulse response, and faster converging
algorithms [2].
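To give a sense of scale for the longer impulse response, the sketch below converts the 100 to 500 ms acoustic echo tail quoted above into FIR filter lengths. The 8 kHz sampling rate and the cost model of roughly two multiply-accumulate operations per tap per sample (one for filtering, one for an LMS-type weight update) are assumptions.

```python
# Minimal sketch: FIR length and rough NLMS cost for an acoustic echo tail.
# Assumptions: 8 kHz sampling and about two multiply-accumulates per tap per
# sample (one for filtering, one for the weight update).

SAMPLE_RATE_HZ = 8000


def aec_filter_taps(echo_tail_ms, fs_hz=SAMPLE_RATE_HZ):
    """Number of FIR taps needed to span an echo tail of echo_tail_ms."""
    return round(echo_tail_ms * fs_hz / 1000.0)


def nlms_macs_per_second(taps, fs_hz=SAMPLE_RATE_HZ):
    """Approximate multiply-accumulate operations per second for an NLMS filter."""
    return 2 * taps * fs_hz


if __name__ == "__main__":
    for tail_ms in (100, 500):
        taps = aec_filter_taps(tail_ms)
        print(f"{tail_ms} ms tail: {taps} taps, "
              f"about {nlms_macs_per_second(taps) / 1e6:.1f} million MACs per second")
```

Under these assumptions a 500 ms tail needs 4000 taps and tens of millions of operations per second, which is the computing burden the paragraph above refers to.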
CHAPTER 3
THE ECHO CANCELLATION ALGORITHM
This chapter discusses the echo cancellation algorithm for a VoIP environment.
The basic idea behind the algorithm, its terminology, modes of operation and the
problems addressed by the algorithm are discussed in detail.
3.1 Basic Echo Canceller
A basic echo canceller used to remove echo in telecommunication networks is presented
in Figure 3.1.
Figure 3.1: A Basic Echo Canceller [far-end talker signal x(n); echo path producing the echo r(n); near-end talker signal v(n); microphone signal d(n) = r(n) + v(n); the echo canceller output $\hat{y}(n)$ is subtracted from d(n) to give e(n)]
The echo canceller mimics the transfer function of the echo path in order to synthesize a replica of the echo. It then subtracts this synthesized replica from the combined echo and near-end speech (or disturbance) signal to obtain the near-end signal. In practice, however, the transfer function of the echo path is unknown and must therefore be identified. This problem can be solved by using an adaptive filter that gradually matches its estimated impulse response, $\hat{h}(n)$, to the impulse response of the echo path.
Figure 3.2: A Generic Adaptive Echo Canceller
The estimated echo, $\hat{y}(n)$, is generated by passing the reference input signal, x(n), through the adaptive filter, $\hat{h}(n)$.
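The cancellation step itself is a convolution followed by a subtraction. The Python sketch below assumes the echo-path model $\hat{h}$ is already known, which is exactly the assumption the adaptive filter removes in practice; the helper names `fir_filter` and `cancel_echo` and the sample values are hypothetical.

```python
def fir_filter(h, x):
    """Causal FIR filtering y(n) = sum_i h[i] * x(n - i) with zero initial conditions."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def cancel_echo(d, x, h_hat):
    """Subtract the synthesized echo replica y_hat(n) = (h_hat * x)(n) from d(n)."""
    y_hat = fir_filter(h_hat, x)
    return [dn - yn for dn, yn in zip(d, y_hat)]


if __name__ == "__main__":
    x = [1.0, 2.0, -1.0, 0.5]     # far-end reference x(n)
    h = [0.5, 0.25]               # hypothetical echo path, assumed known here
    v = [0.0, 0.1, 0.0, -0.2]     # near-end speech v(n)
    d = [r + vn for r, vn in zip(fir_filter(h, x), v)]  # d(n) = r(n) + v(n)
    print(cancel_echo(d, x, h))   # recovers v(n) up to floating-point rounding
```

When $\hat{h}$ equals the true echo path, e(n) reduces to the near-end signal v(n); any mismatch leaves residual echo in e(n).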
Figure 3.3: Echo Canceller with Doubletalk Detector and Nonlinear Processor [the doubletalk detector opens the adaptation path during double talk; the nonlinear processor acts on e(n); d(n) = r(n) + v(n)]
3.2 Components of an Acoustic Echo Canceller (AEC)
The previous section outlined how a basic echo canceller works. The following sections give a detailed theoretical and mathematical account of the three fundamental components that combine to form an echo canceller:
1. Adaptive Filter
2. Doubletalk Detector
3. Nonlinear Processor
3.3 Adaptive Filtering
As discussed previously, the most effective way to reduce echo is to use some form of adaptive algorithm. This section describes the theory behind such an algorithm and the reasons for choosing it. Filtering is a signal processing technique whose objective is to process a signal in order to manipulate the information it contains. In other words, a filter is a device that maps its input signal to an output signal by extracting only the desired information contained in the input. An adaptive filter is necessary when either the fixed specifications are unknown or no time-invariant filter can satisfy the specifications. Strictly speaking, an adaptive filter is a nonlinear filter, since its characteristics depend on the input signal; consequently, the homogeneity and additivity conditions are not satisfied. Additionally, adaptive filters are time varying, since their parameters change continually in order to meet a performance requirement. In a sense, an adaptive filter is a filter that performs the approximation step online.
3.3.1 Least Mean Square (LMS) Algorithm
The least mean square (LMS) algorithm is a search algorithm that is widely used in various applications of adaptive filtering. Its main attractions are low computational complexity, proven convergence in stationary environments and stable behavior when implemented with finite-precision arithmetic. Figure 3.4 illustrates how the algorithm works. The path that modifies the signal x is denoted h; its transfer function is not known at the outset, and the task of the LMS algorithm is to estimate it. The distorted signal, computed as the convolution of x with h, is denoted r. In this case r is the echo and h is the transfer function of the hybrid. The near-end speech signal v is added to the echo. The adaptive algorithm tries to create a filter w whose transfer function is an estimate of the transfer function of the hybrid. This filter is in turn used to calculate an estimate of the echo, denoted $\hat{r}$.
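Figure 3.4: LMS Algorithm [x passes through the echo path h, giving r, and through the adaptive filter w, giving $\hat{r}$; d = v + r; the output is $d - \hat{r} = v + r - \hat{r}$]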
The signals are combined so that the output signal from the algorithm is
$v + r - \hat{r} = v + e$, (3.3)
where e denotes the error signal. The error signal and the input signal x are used to estimate the filter coefficient vector w. One of the main problems in choosing the filter weights is that the path h is not stationary. Therefore, the filter weights must be updated frequently so that the filter can track these variations.
The filter is an FIR filter of the form
$w = b_0 + b_1 z^{-1} + \cdots + b_{L-1} z^{-(L-1)}$. (3.4)
A perfect FIR filter is linear, time-invariant and stable in the BIBO sense. In a real-time environment, however, perfect linearity is never achievable, so the first criterion is not fulfilled and the filter can never be perfect. The filter weights are updated according to
$\mathbf{w}(k+1) = \mathbf{w}(k) - \mu\,\hat{\mathbf{g}}_w(k)$ (3.5)
for k = 0, 1, 2, ..., where $\hat{\mathbf{g}}_w(k)$ represents an estimate of the gradient vector and μ is the convergence factor or step size.
3.3.1.1 Generic LMS Algorithm [3]
The general case of the LMS algorithm is presented in Figure 3.5.
Figure 3.5: Generic LMS Algorithm [input x, adaptive filter w adjusted by the LMS block, output y, desired signal d, error e = d - y]
Figure 3.5 shows that
$e(k) = d(k) - y(k) = d(k) - \mathbf{x}^T(k)\mathbf{w}(k)$, (3.6)
where $\mathbf{w}(k) = [b_0, b_1, \ldots, b_{L-1}]^T$ is the vector of filter weights, $\mathbf{x}(k) = [x(k), x(k-1), \ldots, x(k-L+1)]^T$ is the input vector and L is the length of the adaptive filter.
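The derivation of the gradient estimate $\hat{\mathbf{g}}_w(k)$ is provided next. The Wiener solution is given by
$\mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p}$, (3.7)
where
$\mathbf{R} = E[\mathbf{x}(k)\mathbf{x}^T(k)]$ (3.8)
and
$\mathbf{p} = E[d(k)\mathbf{x}(k)]$, (3.9)
assuming d(k) and x(k) are jointly wide-sense stationary. If good estimates of the matrix R, denoted by $\hat{\mathbf{R}}(k)$, and of the vector p, denoted by $\hat{\mathbf{p}}(k)$, are available, a steepest-descent algorithm can be used to search for the Wiener solution:
$\mathbf{w}(k+1) = \mathbf{w}(k) - \mu\hat{\mathbf{g}}_w(k) = \mathbf{w}(k) + 2\mu\left(\hat{\mathbf{p}}(k) - \hat{\mathbf{R}}(k)\mathbf{w}(k)\right)$. (3.10)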
One possible solution is to estimate the gradient vector by employing instantaneous estimates for R and p, which are given by
$\hat{\mathbf{R}}(k) = \mathbf{x}(k)\mathbf{x}^T(k)$ (3.11)
and
$\hat{\mathbf{p}}(k) = d(k)\mathbf{x}(k)$. (3.12)
Then the gradient estimate $\hat{\mathbf{g}}_w(k)$ is given by
$\hat{\mathbf{g}}_w(k) = -2d(k)\mathbf{x}(k) + 2\mathbf{x}(k)\mathbf{x}^T(k)\mathbf{w}(k) = -2\mathbf{x}(k)\left(d(k) - \mathbf{x}^T(k)\mathbf{w}(k)\right) = -2e(k)\mathbf{x}(k)$. (3.13)
The resulting gradient-based algorithm is known as the least-mean-square (LMS) algorithm, because it seeks to minimize the mean of the squared error. Its updating equation is given by
$\mathbf{w}(k+1) = \mathbf{w}(k) + 2\mu e(k)\mathbf{x}(k)$. (3.14)
Table 3.1 presents the steps of the LMS algorithm in tabular form.
Table 3.1: LMS Algorithm
Initial condition: $\mathbf{x}(0) = \mathbf{w}(0) = [0, \ldots, 0]^T$
For each instant of time, k = 1, 2, ..., compute:
Filter output: $y(k) = \mathbf{x}^T(k)\mathbf{w}(k)$
Estimation error: $e(k) = d(k) - y(k)$
Tap-weight adaptation: $\mathbf{w}(k+1) = \mathbf{w}(k) + 2\mu e(k)\mathbf{x}(k)$
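The steps of Table 3.1 translate directly into code. The sketch below is a plain-Python rendering, not the thesis's MATLAB implementation; the function names `lms` and `make_echo`, the three-tap echo path and the step size are illustrative assumptions. It identifies a short, hypothetical echo path from white-noise input with no near-end signal present.

```python
import random


def lms(x, d, L, mu):
    """LMS adaptive FIR filter following Table 3.1.

    x, d : equal-length sequences of input and desired samples
    L    : number of taps
    mu   : convergence factor (step size)
    Returns the final weight vector and the error sequence e(k).
    """
    w = [0.0] * L        # w(0) = [0, ..., 0]^T
    x_vec = [0.0] * L    # x(k) = [x(k), x(k-1), ..., x(k-L+1)]^T, zeros before start
    errors = []
    for xk, dk in zip(x, d):
        x_vec = [xk] + x_vec[:-1]                        # shift the tap-delay line
        y = sum(wi * xi for wi, xi in zip(w, x_vec))     # filter output y(k)
        e = dk - y                                       # estimation error e(k)
        w = [wi + 2 * mu * e * xi for wi, xi in zip(w, x_vec)]  # tap-weight adaptation
        errors.append(e)
    return w, errors


def make_echo(x, h):
    """Desired signal d(k) = h^T x(k) for a hypothetical echo path h (no near-end speech)."""
    return [sum(h[i] * x[k - i] for i in range(len(h)) if k - i >= 0)
            for k in range(len(x))]


if __name__ == "__main__":
    random.seed(0)
    h = [0.6, -0.3, 0.1]   # illustrative echo path, not taken from the thesis
    x = [random.uniform(-1, 1) for _ in range(5000)]
    w, e = lms(x, make_echo(x, h), L=3, mu=0.01)
    print([round(wi, 3) for wi in w])
```

With a small fixed μ the weights converge slowly but steadily toward h; the step size trades convergence speed against stability and misadjustment.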
3.3.2 Normalized Least Mean Square (NLMS) Algorithm [3]
There are a number of adaptive filtering algorithms derived from the conventional LMS algorithm. The objective of these alternative LMS-based algorithms is to reduce either the computational complexity or the convergence time. The normalized LMS (NLMS) algorithm uses a variable convergence factor that minimizes the instantaneous error. Such a convergence factor usually reduces the convergence time but increases the misadjustment.
The updating equation of the LMS algorithm can employ a variable convergence factor $\mu_k$ in order to improve the convergence rate. In this case, the updating formula is expressed as
$\mathbf{w}(k+1) = \mathbf{w}(k) + 2\mu_k e(k)\mathbf{x}(k) = \mathbf{w}(k) + \Delta\tilde{\mathbf{w}}(k)$, (3.15)
where $\mu_k$ must be chosen with the objective of achieving faster convergence. The value of $\mu_k$ is given by
$\mu_k = \frac{1}{2\mathbf{x}^T(k)\mathbf{x}(k)}$. (3.16)
Using this variable convergence factor, the updating equation for the NLMS algorithm is given by
$\mathbf{w}(k+1) = \mathbf{w}(k) + \frac{e(k)\mathbf{x}(k)}{\mathbf{x}^T(k)\mathbf{x}(k)}$. (3.17)
Usually a fixed convergence factor $\mu_n$ is introduced in the updating formula in order to control the misadjustment, since all the derivations are based on instantaneous values of the squared error rather than on the MSE. A small parameter γ should also be included to avoid excessively large steps when $\mathbf{x}^T(k)\mathbf{x}(k)$ becomes small. The coefficient update is then given by
$\mathbf{w}(k+1) = \mathbf{w}(k) + \frac{\mu_n}{\gamma + \mathbf{x}^T(k)\mathbf{x}(k)}\, e(k)\mathbf{x}(k)$. (3.18)
Table 3.2 presents the steps associated with the NLMS algorithm in tabular form.
Table 3.2: NLMS Algorithm
Initial condition: $\mathbf{x}(0) = \mathbf{w}(0) = [0, \ldots, 0]^T$; $0 < \mu_n \le 2$; γ = a small constant
For each instant of time, k = 1, 2, ..., compute:
Filter output: $y(k) = \mathbf{x}^T(k)\mathbf{w}(k)$
Estimation error: $e(k) = d(k) - y(k)$
Tap-weight adaptation: $\mathbf{w}(k+1) = \mathbf{w}(k) + \frac{\mu_n}{\gamma + \mathbf{x}^T(k)\mathbf{x}(k)}\, e(k)\mathbf{x}(k)$
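Table 3.2 can be sketched the same way. This is a plain-Python illustration, not the thesis's MATLAB code; the names `nlms` and `make_echo`, the defaults μ_n = 0.5 and γ = 1e-6, and the test echo path are assumptions. Because the step is divided by $\gamma + \mathbf{x}^T(k)\mathbf{x}(k)$, scaling the input by 100 does not change the behavior, which illustrates why NLMS suits speech with widely varying levels.

```python
import random


def nlms(x, d, L, mu_n=0.5, gamma=1e-6):
    """NLMS adaptive FIR filter following Table 3.2.

    mu_n  : fixed convergence factor, 0 < mu_n <= 2
    gamma : small constant that prevents huge steps when x^T(k) x(k) is tiny
    Returns the final weight vector and the error sequence e(k).
    """
    if not 0 < mu_n <= 2:
        raise ValueError("mu_n must satisfy 0 < mu_n <= 2")
    w = [0.0] * L
    x_vec = [0.0] * L    # x(k) = [x(k), ..., x(k-L+1)]^T, zeros before start
    errors = []
    for xk, dk in zip(x, d):
        x_vec = [xk] + x_vec[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_vec))               # filter output y(k)
        e = dk - y                                                 # estimation error e(k)
        step = mu_n * e / (gamma + sum(xi * xi for xi in x_vec))   # normalized step size
        w = [wi + step * xi for wi, xi in zip(w, x_vec)]           # tap-weight adaptation
        errors.append(e)
    return w, errors


def make_echo(x, h):
    """Desired signal d(k) = h^T x(k) for a hypothetical echo path h (no near-end speech)."""
    return [sum(h[i] * x[k - i] for i in range(len(h)) if k - i >= 0)
            for k in range(len(x))]


if __name__ == "__main__":
    random.seed(1)
    h = [0.6, -0.3, 0.1]                                   # illustrative echo path
    x = [100 * random.uniform(-1, 1) for _ in range(2000)]  # deliberately loud input
    w, e = nlms(x, make_echo(x, h), L=3)
    print([round(wi, 3) for wi in w])
```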
3.4 Double Talk Detector (DTD)
An important characteristic of a good echo canceller is its performance during double talk, the condition in which both the near-end and the far-end talkers are speaking. If the echo canceller does not detect double talk properly, the near-end speech will cause the adaptive filter to diverge. Therefore, a reliable double-talk detector is essential.
A DTD is used with an echo canceller to sense when the far-end speech is corrupted by the near-end speech. The role of this important function is to freeze adaptation of the model filter, $\hat{\mathbf{h}}$, ...
... $\hat{\mathbf{h}}^T\mathbf{x}(n)$. (3.22)
This error signal is used in the adaptive algorithm to adjust the L taps of the filter, $\hat{\mathbf{h}}$. For simplicity, it is assumed that the length of the signal vector, x, is the same as the effective length of the echo path, h. When v is not present, with any adaptive algorithm, $\hat{\mathbf{h}}$ ...
2. The detection statistic, ξ, is compared to a preset threshold, T (a constant), and double talk is declared if ξ < T.
3. Once double talk is declared, the detection is held for a minimum period of time, $T_{hold}$. While the detection is held, filter adaptation is disabled.
4. If ξ ≥ T consecutively over a time $T_{hold}$, the filter resumes adaptation, while the comparison of ξ to T continues until ξ < T again.
The hold time, $T_{hold}$, in steps 3 and 4 is essential to suppress detection dropouts caused by the noisy behavior of the detection statistic. Although some variations are possible, most DTD algorithms keep this basic form and differ only in how they form the detection statistic.
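Because steps 2 to 4 are shared by most DTDs, the decision and hold logic can be written independently of how ξ is computed. A minimal Python sketch, assuming ξ is supplied as one value per sample and $T_{hold}$ is expressed in samples; `dtd_adaptation_mask` is a hypothetical name.

```python
def dtd_adaptation_mask(xi_seq, T, hold):
    """Apply the decision and hold logic of steps 2-4 to a detection statistic.

    xi_seq : detection statistic xi, one value per sample
    T      : preset threshold; double talk is declared when xi < T
    hold   : T_hold expressed in samples
    Returns one boolean per sample: True means the filter may adapt.
    """
    mask = []
    clear_run = hold          # no double talk seen yet, so adaptation starts enabled
    for xi in xi_seq:
        if xi < T:
            clear_run = 0     # double talk declared (or re-declared): restart the hold
        else:
            clear_run += 1    # one more consecutive sample with xi >= T
        mask.append(clear_run >= hold)
    return mask


if __name__ == "__main__":
    print(dtd_adaptation_mask([1, 1, 0.2, 1, 1, 1, 1], T=0.5, hold=3))
```

A single dip below T at the third sample freezes adaptation for three samples; adaptation resumes only after three consecutive samples with ξ ≥ T.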
An optimum decision variable, ξ, for double-talk detection should behave as follows:
if v = 0 (double talk is not present), ξ ≥ T;
if v ≠ 0 (double talk is present), ξ < T.
The threshold T must be a constant, independent of the data. Moreover, ξ must be insensitive to echo path variations when v = 0 [5].
The following sections discuss several DTD algorithms: the Geigel algorithm, the cross-correlation method and the normalized cross-correlation method. The DTD algorithm used in this research was the normalized cross-correlation method.
3.4.2 The Geigel Algorithm
One simple algorithm, due to A. A. Geigel, declares the presence of near-end speech whenever
$\xi = \frac{\max\{|x(k)|, \ldots, |x(k-N+1)|\}}{|d(k)|} < T$, (3.23)
where N and T are suitably chosen constants. This detection scheme is based on a waveform level comparison between the microphone signal, d, and the far-end speech, x, assuming that the near-end speech, v, in the microphone signal will be stronger than the echo. The maximum (the ∞-norm) of the N most recent samples of x is used for the comparison because of the uncertain delay in the echo path. The threshold, T, compensates for the energy level of the echo path response, h, and is often set to 2 for line echo cancellers, since the hybrid loss is typically about 6 dB (a factor of 2 in amplitude). For an AEC, however, it is not easy to set a universal threshold that works reliably in all situations, because the loss through the acoustic echo path can vary greatly depending on many factors. For N, one easy choice is to set it equal to the adaptive filter length L [5].
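Equation (3.23) can be evaluated sample by sample. The Python sketch below is illustrative only: `geigel_detect` is a hypothetical name, T = 2 is the line-echo value quoted above, and the test compares max|x| < T|d(k)|, which is equivalent to ξ < T for d(k) ≠ 0 and avoids dividing by zero when d(k) = 0.

```python
def geigel_detect(x, d, N, T=2.0):
    """Geigel double-talk decision per sample, following eq. (3.23).

    xi(k) = max(|x(k)|, ..., |x(k-N+1)|) / |d(k)|; double talk when xi < T.
    Written as max|x| < T * |d(k)| so that d(k) = 0 never declares double talk.
    Returns a list of booleans (True = double talk declared).
    """
    flags = []
    for k in range(len(d)):
        window = x[max(0, k - N + 1): k + 1]     # N most recent far-end samples
        peak = max(abs(s) for s in window)
        flags.append(peak < T * abs(d[k]))
    return flags


if __name__ == "__main__":
    x = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]           # far-end speech (toy values)
    d_echo = [0.0, 0.4, -0.4, 0.4, -0.4, 0.4]       # echo only: one-sample delay, gain 0.4
    d_dt = list(d_echo)
    d_dt[3] += 1.0                                   # near-end burst at k = 3
    print(geigel_detect(x, d_echo, N=2))
    print(geigel_detect(x, d_dt, N=2))
```

With an echo attenuated by more than 6 dB, echo alone never triggers the detector, while the louder near-end burst at k = 3 does. The same test fails for an AEC whenever the acoustic path loss is smaller than the threshold assumes.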
3.4.3 The Cross Correlation Method
This method uses the cross-correlation coefficient vector between x and d as a means of double-talk detection. The cross-correlation coefficient vector between x and d is defined by
$\mathbf{c}_{xd} = \frac{E\{\mathbf{x}(n)d(n)\}}{\sqrt{E\{x^2(n)\}E\{d^2(n)\}}}$ (3.24)
$= \frac{\mathbf{r}_{xd}}{\sigma_x\sigma_d}$ (3.25)
$= [c_{xd,0}\; c_{xd,1}\; \cdots\; c_{xd,L-1}]^T$, (3.26)
where E denotes mathematical expectation and $c_{xd,i}$ is the cross-correlation coefficient between x(n - i) and d(n). The idea is to compare
$\xi = \|\mathbf{c}_{xd}\|_\infty$ (3.27)
$= \max_i |c_{xd,i}|, \quad i = 0, 1, \ldots, L-1$ (3.28)
to a threshold level T. The decision rule is then very simple: if ξ ≥ T, double talk is not present, and if ξ < T, double talk is present.
The fundamental problem with this method is that the cross-correlation coefficient vectors are not well normalized. In general, one can only say that ξ ≤ 1; if v = 0, it does not follow that ξ = 1 or any other known value. The value of ξ is therefore not known in general: the amount of correlation depends greatly on the statistics of the signal and of the echo path. As a result, the best value of T varies from one experiment to another, and there is no natural threshold level associated with the variable ξ when v = 0. These difficulties led to another DTD algorithm, termed the normalized cross-correlation method, which is a modification of the cross-correlation method [4].
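The normalization problem is easy to demonstrate numerically. The Python sketch below estimates ξ from (3.27)-(3.28) with block sample averages in place of expectations (an assumption; the thesis does not specify the estimator here), and `cross_corr_statistic` is a hypothetical name. With no near-end speech in either case, a pure-delay echo path gives ξ close to 1, while a two-tap path gives roughly 0.71, so no single threshold fits both.

```python
import math
import random


def cross_corr_statistic(x, d, L):
    """xi = max_i |c_xd,i| (eqs. 3.27-3.28) from block sample estimates.

    c_xd,i is estimated as mean(x(n-i) d(n)) / sqrt(mean(x^2) * mean(d^2)),
    treating samples before the start of the block as zero. x and d must
    have the same length.
    """
    n = len(d)
    ex2 = sum(s * s for s in x) / n
    ed2 = sum(s * s for s in d) / n
    denom = math.sqrt(ex2 * ed2)
    coeffs = []
    for i in range(L):
        r_i = sum(x[k - i] * d[k] for k in range(i, n)) / n   # estimate of E{x(n-i) d(n)}
        coeffs.append(r_i / denom)
    return max(abs(c) for c in coeffs)


if __name__ == "__main__":
    random.seed(2)
    x = [random.uniform(-1, 1) for _ in range(20000)]
    d_delay = [0.3 * x[k - 2] if k >= 2 else 0.0 for k in range(len(x))]   # pure delay
    d_spread = [0.5 * x[k] + (0.5 * x[k - 1] if k >= 1 else 0.0)           # two taps
                for k in range(len(x))]
    print(round(cross_corr_statistic(x, d_delay, 4), 3))
    print(round(cross_corr_statistic(x, d_spread, 4), 3))
```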
3.4.4 Normalized Cross Correlation Method
In this method a new normalized cross-correlation vector between a vector x and a scalar d is derived. Suppose that v = 0. In this case
$R_{dd} = E\{d(n)d^T(n)\} = \mathbf{H}^T\mathbf{R}_{xx}\mathbf{H}$, (3.29)
where
$\mathbf{R}_{xx} = E\{\mathbf{x}(n)\mathbf{x}^T(n)\}$. (3.31)
Since
$d(n) = \mathbf{H}^T\mathbf{x}(n)$, (3.32)
$\mathbf{R}_{xd} = \mathbf{R}_{xx}\mathbf{H}$, (3.33)
which allows $R_{dd}$ to be rewritten as
$R_{dd} = \mathbf{R}_{xd}^T\mathbf{R}_{xx}^{-1}\mathbf{R}_{xd}$. (3.34)
In general, for v ≠ 0,
$R_{dd} = \mathbf{R}_{xd}^T\mathbf{R}_{xx}^{-1}\mathbf{R}_{xd} + R_{vv}$, (3.35)
where
$R_{vv} = E\{v(n)v^T(n)\}$ (3.36)
is the covariance of the near-end speech (a scalar variance, since v(n) is a scalar). The new decision variable is obtained by dividing equation (3.35) by $R_{dd}$ and taking the square root of the echo-related term, which yields
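$\xi = \sqrt{\mathbf{R}_{xd}^T\mathbf{R}_{xx}^{-1}\mathbf{R}_{xd}R_{dd}^{-1}}$ (3.37)
$= \|\mathbf{c}_{xd}\|$, (3.38)
where
$\mathbf{c}_{xd} = \mathbf{R}_{xx}^{-1/2}\mathbf{R}_{xd}R_{dd}^{-1/2}$ (3.39)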
is the normalized cross-correlation vector between x and d. Substituting equations (3.33) and (3.35) into equation (3.37) gives the decision variable
$\xi = \sqrt{\frac{\mathbf{H}^T\mathbf{R}_{xx}\mathbf{H}}{\mathbf{H}^T\mathbf{R}_{xx}\mathbf{H} + \sigma_v^2}}$, (3.40)
where $\sigma_v^2 = R_{vv}$ is the near-end speech power. Equation (3.40) shows that ξ = 1 for v = 0 and ξ < 1 for v ≠ 0. Therefore, the threshold can be set to a value close to one (1). It should also be noted that ξ is not sensitive to changes of the echo path when v = 0 [4], [5].
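The normalized statistic (3.37) can be checked the same way. The Python sketch below replaces the expectations with block sample averages and solves $\mathbf{R}_{xx}\mathbf{z} = \mathbf{R}_{xd}$ by Gaussian elimination instead of forming $\mathbf{R}_{xx}^{-1}$; both choices, the function names and the test echo paths are assumptions for illustration. With v = 0 the estimate equals 1 (to rounding) for two different echo paths, and adding near-end noise pulls it below 1.

```python
import math
import random


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def ncc_statistic(x, d, L):
    """xi = sqrt(R_xd^T R_xx^{-1} R_xd / R_dd), eq. (3.37), from block sample estimates."""
    N = len(d)
    # regressor vectors x(n) = [x(n), ..., x(n-L+1)], zeros before the block start
    vecs = [[x[n - i] if n - i >= 0 else 0.0 for i in range(L)] for n in range(N)]
    Rxx = [[sum(v[i] * v[j] for v in vecs) / N for j in range(L)] for i in range(L)]
    Rxd = [sum(v[i] * dn for v, dn in zip(vecs, d)) / N for i in range(L)]
    Rdd = sum(dn * dn for dn in d) / N
    z = solve(Rxx, Rxd)                      # z = R_xx^{-1} R_xd
    return math.sqrt(sum(a * b for a, b in zip(Rxd, z)) / Rdd)


def make_echo(x, h):
    """d(n) = h^T x(n) for a hypothetical echo path h."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0) for n in range(len(x))]


if __name__ == "__main__":
    random.seed(3)
    x = [random.uniform(-1, 1) for _ in range(2000)]
    v = [random.uniform(-0.5, 0.5) for _ in range(2000)]   # near-end "speech" (noise)
    echo = make_echo(x, [0.6, -0.3, 0.1])
    print(ncc_statistic(x, echo, 3))                                 # v = 0
    print(ncc_statistic(x, make_echo(x, [0.1, 0.2, -0.7]), 3))       # other path, v = 0
    print(ncc_statistic(x, [a + b for a, b in zip(echo, v)], 3))     # double talk
```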
3.5 Nonlinear Processor (NLP)
A nonlinear processor (NLP) is a signal processing circuit or algorithm placed in the speech path after echo cancellation to further attenuate or remove residual echo that the echo canceller cannot remove completely. Residual echo due to nonlinearity or distortion, as well as added noise, are examples of signals that an echo canceller cannot fully cancel; such signals are therefore typically removed or attenuated by a nonlinear processor.
3.5.1 Noise Gate as a NLP
In this research a noise gate, which is a type of dynamic processor, was used as the NLP. Noise gates belong to the family of expanders. As the name implies, an expander increases the dynamic range of a signal: low-level signals are attenuated, while higher-level portions are neither attenuated nor amplified. A noise gate takes this expansion to the extreme, heavily attenuating the input or eliminating it entirely and leaving only silence.
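The expander behavior described above, with the noise gate as its extreme case, can be summarized by a static gain curve. This Python sketch assumes a conventional downward-expander law with a threshold in dB and an expansion ratio; these parameters and the function names are illustrative assumptions, not taken from the thesis.

```python
def expander_gain_db(level_db, threshold_db, ratio):
    """Static gain (dB) of a downward expander.

    Above the threshold the signal passes unchanged (0 dB). Below it, every
    1 dB drop in input level produces a `ratio` dB drop in output level, so the
    applied gain is (ratio - 1) * (level_db - threshold_db), which is negative.
    """
    if ratio < 1:
        raise ValueError("an expander needs ratio >= 1")
    if level_db >= threshold_db:
        return 0.0
    return (ratio - 1.0) * (level_db - threshold_db)


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


if __name__ == "__main__":
    print(expander_gain_db(-50, -40, 2))     # 10 dB below threshold -> -10 dB of gain
    print(expander_gain_db(-41, -40, 100))   # very large ratio behaves like a gate
```

As the ratio grows, any input just below the threshold is pushed toward silence, which is the noise-gate limit.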
While expanders are quite difficult to use effectively, noise gates are a very common and effective way of reducing the apparent noise level in audio signals. A noise gate turns down the gain of an audio signal when the signal level drops below a threshold value. The threshold must be high enough that the background noise falls below it, but not so high that the wanted audio is cut off prematurely. Noise gates are most often used to eliminate noise or hiss that would otherwise be amplified.
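A minimal frame-based noise gate in Python, assuming a hard on/off decision per frame on the RMS level. Practical gates usually add attack, hold and release smoothing, which is omitted here; the frame length of 160 samples (20 ms at an assumed 8 kHz), the threshold value and the name `noise_gate` are hypothetical.

```python
import math


def noise_gate(signal, threshold, frame_len=160, floor_gain=0.0):
    """Frame-based noise gate.

    Each frame whose RMS level is below `threshold` is multiplied by
    `floor_gain` (0.0 = silence); louder frames pass unchanged.
    """
    out = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        gain = 1.0 if rms >= threshold else floor_gain
        out.extend(gain * s for s in frame)
    return out


if __name__ == "__main__":
    sig = [0.001] * 160 + [0.5] * 160          # low-level hiss, then "speech"
    out = noise_gate(sig, threshold=0.01)
    print(max(abs(s) for s in out[:160]), max(abs(s) for s in out[160:]))
```

Setting `floor_gain` slightly above zero gives a softer gate that attenuates rather than mutes, which can sound less abrupt between words.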
3.5.2 A Generic Expander
Figure 3.6 presents the basic structure of an expander.