Group 844

Project group: 844
Group member: Raju Muchanthula
Supervisors: Kjeld Hermansen, Ole Wolf
Publications: 5
Total pages: 50
Supplement: CD-ROM
Finished: 30/05-2005

Abstract

This project develops an algorithm to suppress the noise in a noise-contaminated speech signal for mobile phone users. The one-microphone scenario is chosen, with the assumptions that the noise is additive and that the speech signal and the noise are uncorrelated. The spectral subtraction technique based on minimum statistics estimates the noise power spectrum and subtracts it from the noisy speech spectrum to obtain an estimate of the noise-suppressed speech spectrum. The spectral subtraction, spectral manipulation, and residual manipulation methods are implemented. The proposed algorithm is capable of tracking different noise levels, improves the signal-to-noise ratio and the perceptual quality, and maintains intelligibility.

The algorithm is analyzed for real-time operation and tested according to objective evaluation criteria. The algorithm is chosen for mapping onto the architecture of the TMS320C6713 DSP processor.
Preface
This document reports on the work of group 844 in the 8th semester. The report is organized into two parts, the main report and the appendices. The main report contains seven chapters. The first chapter is the introduction, and chapter 2 provides an overview of the problem modeling and clearly outlines the problem analysis of the project. The third chapter reviews the requirements and specification. In the fourth chapter, the design methodology of the project using the A3 model is described. In chapter five, the algorithm of the project is described. Finally, chapters 6 and 7 provide the implementation and the conclusion of the project. The second part of the report contains appendices with project-relevant material, such as linear predictive coding, speech and noise properties, and the DSP processor TMS320C6713. The report ends with the Matlab programs. All the associated material, the Matlab code, and the C programs, along with the project report, are placed on the CD.
Aalborg, 30th May 2005.
——————————-
Raju Muchanthula
2 Problem Analysis
  2.1 Problem Modeling
  2.2 Speech Production
  2.3 Process of Linear Predictive Coding
  2.4 Noise Contaminated Signals
4 System design
  4.1 The A3-model
  4.2 Application
  4.3 Algorithm
  4.4 Architecture
  4.5 Test bed design
5 Algorithm
  5.1 Spectral Subtraction - Based on Minimum Statistics
  5.2 Spectral Manipulation
  5.3 Residual Manipulation
6 Implementation
  6.1 Overview
  6.2 DSP Architecture Mapping
7 Conclusions
  7.1 Results
  7.2 Conclusion
C DSP
  C.1 TMS320C6000 DSP platform
  C.2 Functional Units
  C.3 Register File
  C.4 TMS320C6713 DSP
  C.5 C6713 Memory and Peripherals
  C.6 TMS320C6713 DSP Starter Kit (DSK)
  C.7 Tools and Software
  C.8 Algorithm Standard
  C.9 Terminology
Bibliography
The goal of this project is to develop an algorithm that can reduce the background noise to facilitate conversation in a noisy environment. The proposed system uses one microphone as the input of the system. An illustration of the system can be seen in figure 2.1. The input signal to the microphone is noisy speech. In the filter, an estimate of the background noise is subtracted from the signal picked up by the microphone, so the output is a noise-suppressed speech signal. In this project, reducing the noise while maintaining the perceptual quality is the main task.
In the following section, speech generation and analysis are described before the problem is analyzed. The production of speech can be modeled with three blocks [3, Ch. 2 and 3]: the generator, the vocal tract, and the radiation. These blocks are depicted in figure 2.2. Physical speech production originates in the lungs, where the airflow begins. The actual sound is formed while air flows through the larynx and vocal tract. The larynx consists of the cricoid cartilage, the vocal folds, and the arytenoid cartilages. The vocal tract can be divided into three areas: the oral pharynx, the nasal cavity, and the mouth. The tongue, lower jaw, and lips have the most effect on the form of the vocal tract when producing distinct sounds.
Depending on the activity status of the vocal folds, the sounds can be roughly divided into two extreme cases [6]. First, the vocal folds can open and close periodically, generating in this way a train of pulses (glottal pulses). This gives the sound its voiced nature, i.e., periodicity in time and a harmonic structure in frequency. Second, the vocal folds can simply remain open, with the airflow forming turbulence between the folds and making the sound noise-like. Unvoiced sounds are formed this way. The excitation of voiced sounds is usually modeled by a train of pulses, whereas unvoiced sounds use a random noise excitation.

Figure 2.2 illustrates the applicability of linear prediction for speech modeling [5]. The vocal tract model H(z) and the lip radiation model R(z) are excited by a discrete glottal excitation signal. For voiced speech, the source excitation is an impulse train generator driving a glottal shaping filter and using local pitch period estimation. For unvoiced speech, the source excitation comes from a random noise generator.
Now, the knowledge about the generator can be used to analyze the speech. The production of speech can be modeled as an AR model, as the vocal tract can be modeled as an all-pole filter [11]. Since the signal from the generator is assumed white, the inverse of the vocal tract transfer function can be used to whiten the signals. The whitening filter can be found using linear prediction, which is covered in detail later in this chapter and in appendix A. For voiced sounds, the glottis introduces two extra poles. It is desired to remove the effect of these poles in order to focus the parameters of the AR model on the vocal tract. The radiation can be modeled as a single-zero filter, which causes one of the poles to be canceled. A preemphasis filter with the transfer function

$$P(z) = 1 - z_0 z^{-1}$$

with 0.9 < z_0 < 1 will cancel the other pole [3, p. 330]. This filter must be applied prior to modeling the vocal tract.
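As a concrete illustration, the filter reduces to a one-tap difference equation. The C sketch below assumes z0 = 0.95, an arbitrary choice inside the stated range 0.9 < z0 < 1.

```c
#include <stddef.h>

/* Preemphasis y[n] = x[n] - z0 * x[n-1], applied in place.
 * z0 = 0.95 is an assumed value; any 0.9 < z0 < 1 cancels the
 * remaining glottal pole as described above. x[-1] is taken as zero. */
static void preemphasis(float *x, size_t n, float z0)
{
    float prev = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float cur = x[i];
        x[i] = cur - z0 * prev;   /* difference equation of 1 - z0*z^-1 */
        prev = cur;
    }
}
```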
The approach described here is model-based, or parametric, in contrast to a non-parametric approach. The advantage of this approach is that it is possible to manipulate, e.g., the power spectrum or the autocorrelation function without destroying the speech estimate, since the information about the pitch period is still maintained in the residual.
[Figure: LPC analysis and synthesis over a transmission channel: the speech signal s(n) is passed through the analysis filter, the residual and the LPC coefficients are transmitted, and the synthesis filter reconstructs the estimated speech ŝ(n).]
Linear prediction analysis of speech is one of the most important speech analysis techniques. The assumption is that speech is short-time, or locally, stationary for analysis. The analysis is an extraction of the prediction coefficients, i.e., a determination of the formants created by the vocal tract. As shown in figure 2.4, the speech is filtered with a prediction error filter. The predictor predicts the actual sample from a linear combination of the past samples. If the predictor is optimal, the residue is spectrally white. The predictor optimization is described in appendix A.
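A minimal C sketch of this prediction error filtering is shown below; it assumes the sign convention A(z) = 1 + Σ a_i z⁻ⁱ used in equation 2.2, and it treats samples before the start of the frame as zero.

```c
#include <stddef.h>

/* Prediction-error (analysis) filtering: e[n] = x[n] + sum_k a[k-1]*x[n-k].
 * a holds the p LP coefficients a_1..a_p; if the predictor is optimal,
 * the residue e is spectrally white. */
static void analysis_filter(const float *x, float *e, size_t n,
                            const float *a, size_t p)
{
    for (size_t i = 0; i < n; i++) {
        float acc = x[i];
        for (size_t k = 1; k <= p && k <= i; k++)
            acc += a[k - 1] * x[i - k];
        e[i] = acc;
    }
}
```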
[Figure 2.4: Prediction error filter: the linear predictor produces ŝ(n) from the speech s(n); subtracting ŝ(n) from s(n) yields the residue e(n).]
Until now, the focus has been on analyzing the signal with the purpose of identifying the parameters of a system satisfying the AR constraint by minimizing the prediction error. If the prediction error is 'white', then the estimated signal is a good fit, so the synthesized signal has statistical properties similar to the original one. Exciting the synthesis filter with the system function H(z) using a white noise signal, the filter's output will have a spectrum close to that of the original signal. The process is shown in figure 2.5.
$$H(z) = \frac{1}{1 + \sum_{i=1}^{M} a_i z^{-i}} = \frac{1}{A(z)} \qquad (2.2)$$
In equation 2.2, the a_i are the LP coefficients found from the original signal. The transfer function is G/A(z), where G is a gain factor; the filter is excited by the input signal e(n), thereby producing the speech signal s(n).
[Figure 2.5: Synthesis filter G/A(z), excited by e(n) and producing s(n).]
If the speech signal is corrupted by noise, the all-pole model no longer approximates the speech spectrum closely and simply. Assuming that the speech signal is corrupted by additive and uncorrelated background noise, the signal model becomes a pole-zero model, also called an autoregressive moving average (ARMA) model. In a system with a pole-zero model, the estimation of the parameters is inherently nondeterministic and nonlinear [6]. There are a few solution methods for the pole-zero model, such as the Gauss-Newton method, but the optimal solution is not guaranteed [2, p. 326]. So spectral subtraction is considered as a solution to suppress the background noise.
the non-stationarity of the process. Based on a few assumptions, speech is considered locally stationary for analysis [12, p. 208]. The average noise power is approximately the same prior to speech activity as during speech activity. With these assumptions, the estimate of the signal spectrum P_ŝ(ω) is obtained by [9, p. 114]:

$$P_{\hat{s}}(\omega) = P_x(\omega) - P_n(\omega)$$
where P_x(ω) is the spectrum of the noise-contaminated signal and P_n(ω) is the noise spectrum estimated during periods of no speech activity. It is difficult to remove the entire noise by this method, as the assumptions are not entirely correct. Some of the noise components are not additive, resulting in some correlation between the speech and the noise. For these reasons, the performance of the method depends on the estimation of the noise spectrum [9].
The classical spectral subtraction method does not deal with signal phase information: the phase of the noisy signal is reused to reconstruct the estimate of the speech signal. Moreover, the manipulation of the magnitude spectrum distorts the pitch period. Thus, another model is chosen, which is based on linear prediction. This model uses the envelope of the amplitude spectrum without considering the phase information. The proposed method, i.e., spectral subtraction using LPC, is shown in figure 2.6.
The system is considered only as a noise suppression device, which gives the two main objectives:

The boundaries specified for the SNR are between 0 dB and 35 dB, as mentioned in [3, p. 586]. These limits are chosen so that the system can function efficiently.
The system described in the problem analysis utilizes a model-based description of the speech production. It is thereby possible to set up specific requirements for the parameters of this model. Starting from figure 2.6, the requirements for the different parameters will now be presented.
• Frame Size: The frame size is set to 25 ms. An analysis frame shorter than 20 ms results in roughness, while increasing the frame size decreases the musical noise considerably. If the frame size is too long, it results in slurring [12]. The frames and bursts are shown in figure 3.1.

[Figure 3.1: Relation between frames, bursts, frame shift, and overlap.]
• Window Selection: The window determines the portion of the speech signal that is to be processed, by zeroing out the signal outside the region of interest. The ideal window frequency response has a very narrow main lobe, which increases the resolution, and small side lobes (low frequency leakage). Since such a window is not realizable in practice, a compromise is usually selected for each specific application.

There are many possible windows, such as the rectangular, Hanning, and Hamming windows. The rectangular window has the highest frequency resolution due to its narrow main lobe, but it has the largest frequency leakage. Due to the high frequency leakage produced by the larger side lobes, rectangular-windowed speech looks noisier than Hamming- or Hanning-windowed speech. This undesirable leakage between adjacent harmonics tends to offset the benefits of the flat time-domain response of the rectangular window. As a result, rectangular windows are not usually used for speech spectral analysis. On the other hand, tapered windows such as the Hamming and Hanning windows have smaller frequency leakage but lower resolution, so they produce a smoother spectrum than the rectangular window [4].

Since the LP model is determined on a frame-by-frame basis, the original signal must be segmented into frames, and each segmented frame must be windowed, as sketched below.
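A minimal C sketch of the segmentation and windowing step follows. The frame length of 160 samples matches the value used later in the implementation chapter; computing the Hanning weights on the fly rather than from a precomputed table is a simplification made here.

```c
#include <math.h>
#include <stddef.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Copy one analysis frame starting at sample 'start' and apply a
 * Hanning window. With 160-sample frames and 20 % overlap, consecutive
 * frames start 128 samples apart. */
static void window_frame(const float *x, float *frame,
                         size_t start, size_t frame_len)
{
    for (size_t i = 0; i < frame_len; i++) {
        float w = 0.5f * (1.0f -
            (float)cos(2.0 * M_PI * (double)i / (double)(frame_len - 1)));
        frame[i] = x[start + i] * w;   /* Hanning-weighted sample */
    }
}
```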
The purpose of the analysis filter is to analyze the signal and identify the parameters of a system satisfying the AR constraint by minimizing the prediction error. If the prediction error is 'white', then the estimated signal is a good fit, so the synthesized signal has statistical properties similar to the original one. The objective of the analysis filter is to cancel the characteristics of the vocal tract in the observed signal, thereby reproducing an estimate of the excitation signal at the output of the filter. This can also be considered as canceling the formants in the observed signal, which is the same as minimizing the error signal. In order to do this, it is necessary to have enough coefficients in the analysis filter to model the formants. In [3, fig. 5.7c] it is seen that, with a sampling frequency of 8 kHz, the residual energy is not minimized further when the prediction order exceeds 10.

Furthermore, the algorithm for calculating the LP coefficients should be suited for real-time implementation. A natural choice is therefore the Levinson-Durbin algorithm, which also applies to the LPC block.
In the case of a noise-contaminated speech signal as the observed signal, the AR-model analysis filter will have a noisy residual as its output. Manipulation of this noisy residual should lead to an emphasis of the voiced segments, i.e., the impulses in the residual.

The length of the FFT must be long enough to give a sufficient frequency resolution. The preferred length is 256 samples; the minimum FFT order corresponding to a given frame size is adequate [12].
This block is the main SNR-improving part of the system, and the ideal goal is to entirely remove the noise spectrum from each analyzed frame, which is not possible. The output should not be a time estimate of the speech signal, which is the classical usage of spectral subtraction, but instead an estimated power spectrum of the speech signal to be used in the IFFT block in order to obtain the autocorrelation. From the autocorrelation, the LP coefficients are calculated.

The block should be able to estimate the noise spectrum under different types of noise and varying noise levels.

Since it is not possible to entirely remove the noise from the observed signal, manipulation of the estimated speech power spectrum is necessary. The result of this manipulation should be an estimated speech power spectrum in which the randomly varying parts due to noise are further reduced.
After spectral subtraction, power spectrum manipulation, and autocorrelation (IFFT), an estimated time waveform of speech is reconstructed using the manipulated LPC coefficients and the manipulated noisy residual. The synthesis filter's output will then have a PSD close to that of the original signal, as long as a prediction order of 10 is adequate.
The purpose of the system test is to test the system as a black box. In addition, the individual parts of the system will be tested.

The system test will be carried out with a noise-contaminated speech signal. The test will show whether the requirements are met with respect to SNR and intelligibility.
The Rugby model is a conceptual framework in which designs and design processes are expressed in order to analyze the problem. This model is used to evaluate whether the design covers all the domains at the various abstraction levels. The Rugby model has four domains for the design process, namely Computation, Communication, Data, and Time.

As shown in figure 4.1, the Rugby model starts with an idea or project proposal (high abstraction level) and goes through the four domains at decreasing abstraction levels to develop the final system (low abstraction level). All abstraction levels are treated in these domains, resulting in the final design. In this project, the system design is based on the A3-model, as shown in figure 4.2. The model takes into consideration that the algorithm may be modified before moving to the architecture domain. Furthermore, the move from the algorithm domain to the architecture domain, i.e., the feedback from the fixed architecture, can be performed several times.
[Figure 4.1: The Rugby meta-model in relation to the A3-model: from Idea to Final System through the Computation, Communication, Data, and Time domains.]

[Figure 4.2: The A3-model: Application, Algorithm, and Architecture.]
4.2 Application
The application of the project is to develop a noise suppression algorithm, which must be adaptive for the given architecture. The system black box is illustrated in figure 4.3. Considering the timing constraints, the system is expected to run as a real-time system. For that, the latency must not exceed 30 ms, in order to meet the audio-visual requirement. The input is a noise-contaminated speech signal, x[m], and the output is an estimate of the noise-free speech signal, ŝ[m].
4.3 Algorithm
The computational complexity and the desired output are considered the most important factors for choosing the algorithm. Based on the problem analysis and the application description, the system is partitioned into processes. The processes are groupings of functionalities, which are defined in the problem analysis.

The system consists of different parts that are executed with different inputs. One part of the system runs as sample-based execution, while the other two parts run as burst-based and frame-based execution. In the sample-based part, circular buffers are available, providing the possibility of frame overlap. The overview of the algorithm is shown in figure 4.5. The different processes of the algorithm are analyzed step by step. The processes and the interfaces between them are described briefly in table 4.1.

The next step in the design process is a partitioning of the processes into functions. This defines which processes are concurrent and can therefore be executed in parallel. It also describes how parameters are passed between the functions.
Table 4.1: Processes of the algorithm and their interfaces.

Process               | Input                                         | Output                                   | Description
Synthesis filter      | Estimated residual ê[m] and LP coefficients â | Estimated speech samples ŝ_b[m]          | Synthesizes the speech signal from the residual and the LPC coefficients.
Power spectrum        | Frame of data, frame                          | Power spectrum P_x(ω)                    | Calculates the power spectrum of the speech frame.
Spectral subtraction  | Power spectrum P_x(ω)                         | Noise-suppressed power spectrum P_s̃(ω)   | Suppresses noise in the power spectrum.
Spectrum manipulation | Noise-suppressed power spectrum P_s̃(ω)        | Estimated speech power spectrum P_ŝ(ω)   | Estimates the power spectrum using a model-based approach.
IFFT                  | Estimated speech power spectrum P_ŝ(ω)        | Autocorrelation function r_ŝ(τ)          | Calculates the autocorrelation from the power spectrum.
LPC                   | Autocorrelation function r_ŝ(τ)               | LP coefficients â                        | Calculates the LP coefficients from the autocorrelation function.
Deemphasis filter     | Estimated burst of speech samples             | Estimated speech samples ŝ[m]            | The inverse of the preemphasis filter.
[Figure 4.4: Levels of implementation: requirements, specifications, Matlab model, C model, TMS320C6713 DSK.]
4.4 Architecture
When mapping the algorithm to the architecture, several aspects need to be considered, such as how to implement the chosen algorithm in an optimal manner for the given architecture on different abstraction levels. The levels of implementation are shown in figure 4.4.

The basic algorithm implementation and testing are done in Matlab. Then the Matlab code is converted into C. Considering the architecture, the C and Matlab functions can be optimized with minimal changes, which facilitates functional debugging and easy access to the internal variables of the algorithm. This should ensure compatibility with the C compiler included in CCS, allowing the code to be transferred with few alterations.
In the next section a test bed will be designed, so that there is a proper way of testing the system.

To be able to test the algorithms, and thereby the system, a testbed environment must be designed. The design of the testbed is based on stepwise refinement, so that the system at the highest level of abstraction is tested first. A single process or function will be tested at a time, keeping the number of adjustable parameters to a minimum. It must be emphasized that each process and function will have to fulfill the requirements in the requirements specification.
[Figure 4.5: Overview of the algorithm. Sample-based execution: circular buffers in_buf/out_buf, preemphasis filter, and framing of the noisy input x[m]. Frame-based execution: windowing and LPC, producing the coefficients a and â. Burst-based execution: analysis filter producing the noisy residual ẽ[m], residual manipulation producing ê[m], synthesis filter producing ŝ_b[m], and deemphasis filter producing the estimated speech ŝ[m].]
First, a Matlab model will be developed, so that a fully functioning mathematical model can be tested. The Matlab model gives the possibility to implement one module at a time, replacing the Matlab functions with C functions and testing the C functions in connection with the already working Matlab functions. Thereby, only one function is changed at a time, making it possible to track errors down to a single function.

The processes and their functions will be tested on the fixed architecture. This will be performed by writing test data into memory and then running the function. The test data will be the same data that is used when testing the Matlab functions.

After the testbed has been designed, the design of each process in the algorithmic and architecture domains will be performed. The test of each process will be performed in connection with this design.
The quality of speech signals degrades in the presence of noise. To reduce the degradation, a model is proposed to represent the spectral changes of a speech signal uttered in noisy environments. The spectral subtraction method is a well-known noise reduction technique. The standard algorithms for spectral subtraction usually need a voice activity detector (VAD), such that the noise spectrum can be estimated during non-speech activity. However, the performance of a VAD often degrades considerably in noisy conditions. Most implementations and variations of the basic technique advocate subtraction of the noise spectrum estimate over the entire speech spectrum. In general, noise is mostly colored and does not affect the speech signal uniformly over the entire spectrum. This chapter describes an alternative, robust algorithm, which makes use of minimum statistics while estimating the noise spectrum. The described algorithm is based on the article [8].
5.1.1 Overview
Spectral subtraction combines a noise power estimator with a subtraction rule that translates the SNR into a spectral weighting factor, such that subbands with low SNR are attenuated and subbands with high SNR are left unmodified. The spectral subtraction method is computationally efficient. The assumption is that the power spectrum of a signal corrupted by uncorrelated noise is equal to the sum of the speech spectrum and the noise spectrum.
5.1.2 Terminology
Below are some of the more common terms that are encountered when discussing the spectral subtraction algorithm.

• Spectral floor: A lower bound below which the spectral components of the processed signal are not allowed to descend.

• Musical noise: The artificial, tonal noise that is generated in the spectral subtraction process.

• Short-time power: The estimation of the signal power over a short time window.

• Broadband noise: In the frequency domain, a broadband noise has a continuous spectrum that is present at all frequencies in a given range. This type of sound is often referred to as noise because it usually lacks a discernible pitch.

• Comb filter: A filter with multiple pass bands and stop bands. Its frequency response is a periodic function of ω with period 2π/L, where L is a positive integer.
In this spectral subtraction method, the minimum of the subband power within a finite window is used to estimate the noise floor. A periodogram P_x(λ, k) is shown in figure 5.1; the short-time power estimate of the noisy speech signal shows distinct peaks and valleys.
[Figure 5.1: Periodogram P_x(λ, k) for f_s = 8 kHz (N_FFT = 256, k = 25) of the input signal, showing P_x, P_min, and P_n (power in W/Hz).]
The basic idea behind the algorithm is to utilize these peaks and valleys. The peaks are assumed to correspond to speech activity, and the valleys are used to obtain the estimate of the subband noise power. To obtain reliable noise power estimates, the data window for the minimum search must be large enough to bridge the peaks of speech activity.
A block diagram describing the implemented algorithm for spectral subtraction is shown
in figure 5.2.
[Figure 5.2: Block diagram of the spectral subtraction algorithm. The input x[m] = s[m] + n[m] is windowed to x_w[m] and transformed by the FFT to |X(λ, k)|; the noise power estimation block produces P_n(λ, k), from which the spectral weighting Q(λ, k) is computed using the oversubtraction factor k_osub; multiplying |X(λ, k)| by Q(λ, k) yields Ŝ(λ, k).]
The sampled and windowed signal x[k] is assumed to be the sum of a zero-mean speech signal s[k] and a zero-mean noise signal n[k]:

$$x[k] = s[k] + n[k] \qquad (5.1)$$

A further assumption is that speech and noise are statistically independent. With a data window w[k], the short-time Fourier transform X(λ, k) of x[k] is computed over individual windowed frames of length N_FFT, where λ is the time (frame) index, k is the frequency bin index, and the discrete frequencies are Ω_k = 2πk/N_FFT, k = 0, 1, ..., N_FFT − 1. The short-time power spectrum then follows from equation 5.1 as:

$$P_x(\lambda, k) = P_s(\lambda, k) + P_n(\lambda, k)$$
where P_x(λ, k), P_s(λ, k), and P_n(λ, k) are the short-time power spectra of the noise-contaminated speech, the speech, and the noise signal, respectively.
From |X(λ, k)|², a smoothed estimate of the short-time power of x_w[k] is obtained with a first-order recursive network:

$$P_x(\lambda, k) = \gamma \, P_x(\lambda - 1, k) + (1 - \gamma) \, |X(\lambda, k)|^2$$

where γ is a smoothing constant (γ ≈ 0.9). Now the aim is to estimate the short-time magnitude spectrum of the clean speech signal, Ŝ(λ, k). This can be done using the following subtraction rule:
$$\hat{S}(\lambda, k) = \begin{cases} k_{subf} \cdot P_n(\lambda, k) & \text{if } |X(\lambda, k)| \cdot Q(\lambda, k) \le k_{subf} \cdot P_n(\lambda, k) \\ |X(\lambda, k)| \cdot Q(\lambda, k) & \text{otherwise} \end{cases} \qquad (5.5)$$
k_subf is a spectral floor constant. It prevents the spectral components of P_x(λ, k) from descending below the lower bound k_subf · P_n(λ, k). The variable gain factor Q(λ, k) is given by
$$Q(\lambda, k) = \sqrt{1 - k_{osub}(\lambda, k) \, \frac{P_n(\lambda, k)}{|X(\lambda, k)|^2}} \qquad (5.6)$$
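To make the subtraction rule concrete, the per-bin weighting of equations 5.5 and 5.6 can be sketched in C as follows. Clamping a negative value under the square root to zero is a practical guard added here; the report does not state how that case is handled.

```c
#include <math.h>

/* Spectral weighting per bin, following equations 5.5 and 5.6.
 * Xmag is |X(lambda,k)|, Px the smoothed |X(lambda,k)|^2, Pn the noise
 * power estimate; kosub and ksubf are as defined in the text. */
static float weight_bin(float Xmag, float Px, float Pn,
                        float kosub, float ksubf)
{
    float arg = 1.0f - kosub * Pn / Px;         /* argument of eq. 5.6 */
    float Q = (arg > 0.0f) ? sqrtf(arg) : 0.0f; /* guard: assumption   */
    float S = Xmag * Q;                         /* tentative estimate  */
    float floorv = ksubf * Pn;                  /* spectral floor      */
    return (S <= floorv) ? floorv : S;          /* rule of eq. 5.5     */
}
```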
An estimate of the short-time noise power P_n(λ, k) is obtained from the short-time power P_x(λ, k), which in turn is estimated by filtering the instantaneous short-time power |X(λ, k)|² to obtain a smoothed estimate, i.e.,

$$P_n(\lambda, k) = k_{omin} \cdot P_{min}(\lambda, k)$$

where k_omin is a factor that compensates the bias of the minimum estimate. P_min is obtained as a comparison between a minimum of the last D frames of P_x(λ, k), calculated every M frames,

$$P_{min}(\lambda, k) = \min \{ P_x(\lambda - i, k) : 0 \le i < D \}$$

and the actual value of P_x(λ, k). The minimum is obtained as the minimum of the last W minimums, that is, D = W · M. By splitting up D, the minimum is obtained after only M frames instead of after D frames.
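A minimal C sketch of this windowed minimum search is given below. The constants (129 bins for NFFT = 256, M = 25 frames per sub-window, W = 4 sub-windows) and the bias factor komin are illustrative assumptions, not values taken from the report.

```c
#include <float.h>

#define NBINS 129   /* frequency bins, NFFT/2 + 1 for NFFT = 256 (assumed) */
#define M_SUB  25   /* frames per sub-window (assumed)                     */
#define W_SUB   4   /* sub-windows kept; D = W_SUB * M_SUB                 */

static float sub_min[W_SUB][NBINS]; /* minimums of the last W sub-windows  */
static float cur_min[NBINS];        /* running minimum, current sub-window */
static int frame_cnt = 0, init_done = 0;

/* Feed the smoothed short-time power Px of one frame; writes the noise
 * estimate Pn = komin * Pmin, where Pmin is the minimum over the last
 * D = W*M frames. komin compensates the bias of the minimum estimate. */
static void noise_estimate(const float *Px, float *Pn, float komin)
{
    if (!init_done) {                 /* first call: empty history */
        for (int k = 0; k < NBINS; k++) cur_min[k] = FLT_MAX;
        for (int w = 0; w < W_SUB; w++)
            for (int k = 0; k < NBINS; k++) sub_min[w][k] = FLT_MAX;
        init_done = 1;
    }
    for (int k = 0; k < NBINS; k++)
        if (Px[k] < cur_min[k]) cur_min[k] = Px[k];

    if (++frame_cnt == M_SUB) {       /* a sub-window of M frames is full */
        frame_cnt = 0;
        for (int w = W_SUB - 1; w > 0; w--)
            for (int k = 0; k < NBINS; k++)
                sub_min[w][k] = sub_min[w - 1][k];
        for (int k = 0; k < NBINS; k++) {
            sub_min[0][k] = cur_min[k];
            cur_min[k] = FLT_MAX;     /* restart the running minimum */
        }
    }
    for (int k = 0; k < NBINS; k++) { /* minimum of the last W minimums */
        float pmin = (cur_min[k] < Px[k]) ? cur_min[k] : Px[k];
        for (int w = 0; w < W_SUB; w++)
            if (sub_min[w][k] < pmin) pmin = sub_min[w][k];
        Pn[k] = komin * pmin;
    }
}
```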
To control the oversubtraction factor k_osub(λ, k), the SNR in each subband is computed. If the SNR is high, the oversubtraction level is low, and vice versa. An estimate of the SNR is:

$$\mathrm{SNR}(\lambda, k) = 10 \log_{10} \frac{P_x(\lambda, k) - \min(P_n(\lambda, k), P_x(\lambda, k))}{P_n(\lambda, k)} \qquad (5.10)$$
[Figure 5.3: Oversubtraction factor k_osub as a function of the SNR.]
Large values of k_osub lead to speech distortion. As shown in figure 5.3, the optimal value of k_osub for the best noise reduction with the least musical noise is smaller for higher SNR: k_osub = 1 is used for SNR ≥ 35 dB, and k_osub increases as the SNR drops towards SNR ≤ −5 dB.

The SNR is calculated for each short-time frame, so the k_osub value also varies from frame to frame. The actual value of k_osub used in equation 5.6 is given by:
After spectral subtraction, the estimated speech power spectrum still contains noise. This noise can be suppressed by spectral manipulation. The manipulation can be done in two ways:

• Spectral smoothing

• Floor setting

The components which abruptly come up for a short time in the power spectrum are observed as noise. This type of noise can be reduced by spectral smoothing in the time domain. For smoothing, the power spectrum is filtered with a lowpass filter. The designed lowpass filter should be able to suppress the short-time varying components, or noise components, in the power spectrum without disturbing the formant frequencies and the short-time stationarity. The smoothed spectrum should be able to produce noise-suppressed LPC coefficients for synthesis. The smoothing in the time domain is capable of changing or modifying the amplitudes of the spectrum; this process is done by filtering.
If the estimated speech spectrum contains rapidly or slowly varying frequency components compared to the original speech spectrum, it is difficult to smooth the spectrum. This effect can be controlled by spectral smoothing in the frequency domain, so the estimated speech spectrum is filtered with a lowpass filter.

The filtering should be capable of modeling the envelope of the estimated speech spectrum. In the filtering process, the frequency components should not be delayed, and the formants should be retained even after smoothing; a sketch of such a smoother follows.
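One plausible realization in C is a first-order recursive smoother run forward and then backward over the frequency bins. The forward-backward (zero-phase) pass is an assumption made here to honor the no-delay requirement; the report only specifies that a first-order recursive lowpass filter is used. alpha is a tuning parameter, with larger values giving stronger smoothing.

```c
#include <stddef.h>

/* First-order recursive smoothing of a power spectrum across frequency:
 * Psm[k] = alpha * Psm[k-1] + (1 - alpha) * P[k]. The backward pass
 * cancels the group delay of the forward pass so the formant positions
 * are not shifted (forward-backward filtering is our assumption). */
static void smooth_spectrum(const float *P, float *Psm, size_t n, float alpha)
{
    float s = P[0];
    for (size_t k = 0; k < n; k++) {          /* forward pass */
        s = alpha * s + (1.0f - alpha) * P[k];
        Psm[k] = s;
    }
    s = Psm[n - 1];
    for (size_t k = n; k-- > 0; ) {           /* backward pass */
        s = alpha * s + (1.0f - alpha) * Psm[k];
        Psm[k] = s;
    }
}
```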
[Figure 5.4: Autocorrelation of the noisy residual signal (lag in samples).]
In the spectral subtraction method, subtracting an estimate of the noise power spectrum from the noisy speech power spectrum, with the negative differences set to zero, results in the estimated speech power spectrum [12].
As described in the system design, the algorithm is divided into two parts. Until now, the discussion has concerned the first part: spectral subtraction and spectral manipulation with frame-based execution. Now follows the residual manipulation using burst-based execution. The overall challenge is to suppress the background broadband noise and increase the SNR as much as possible, in which residual manipulation also plays a major role.

The corrupted harmonics in the noisy residue can be regenerated by filtering with a comb filter. But the comb filter needs a pitch detector for detecting the pitch period in the residue.
[Figure 5.5: Power spectrum of the residual signal (PSD in dB versus frequency in kHz).]
This leads to a problem, as the detector sometimes detects a wrong pitch period, and it is also quite complex to construct a comb filter that suppresses the noise level in the residue. So, in order to avoid pitch detection, another method based on the autocorrelation is considered to reach the desired response.

The autocorrelation of a noisy residual signal is shown in figure 5.4. The duration between the impulses is considered the pitch period. The corresponding power spectrum is shown in figure 5.5. The power spectrum clearly shows that the harmonic frequency components are amplified. Now, by filtering the spectrum, the noise level of the residual signal is reduced.
6.1 Overview
The clean speech signal is from a male speaker uttering the sentence "watch the log float in the wide river". The additive noise signal is pink noise. Figure 6.1 shows the noise-free signal and the noisy speech signal.

Circular buffers are used in the algorithm for the real-time implementation. Digital signal processors which support circular buffers automatically generate and increment pointers for memory accesses, wrapping to the beginning of the buffer when its end is reached. This saves the time and instructions otherwise needed to ensure that the address pointer stays within the boundary of the buffer, and it speeds up the execution of repetitive DSP algorithms [14]; a sketch of the idea in C follows.
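A minimal C sketch of the idea is given below. The power-of-two length, which makes the wrap-around a cheap bit mask in software, is an assumption made here; the DSP's circular-addressing hardware performs the wrap for free.

```c
#include <stddef.h>

#define BUF_LEN 512                 /* power of two: wrap is a bit mask */

typedef struct {
    float data[BUF_LEN];
    size_t head;                    /* next write position */
} cbuf_t;

/* Write one sample; the index wraps to the start of the buffer when
 * the end is reached, mirroring the hardware circular addressing. */
static void cbuf_push(cbuf_t *b, float x)
{
    b->data[b->head] = x;
    b->head = (b->head + 1) & (BUF_LEN - 1);
}

/* Read the sample written 'ago' samples in the past (ago < BUF_LEN). */
static float cbuf_read(const cbuf_t *b, size_t ago)
{
    return b->data[(b->head + BUF_LEN - 1 - ago) & (BUF_LEN - 1)];
}
```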
[Figure 6.1: (a) Clean speech signal and (b) noisy speech signal with pink noise; (c) a frame of the speech signal; (d) windowing a frame with a Hanning window; (e), (f) additional time-domain frames, among them the residue of the analysis filter.]
The input signal of the system is a vector. The vector is split into frames with a length of 160 samples and a 20 percent overlap between two consecutive frames. A frame of the input signal is shown in figure 6.1.

The frame and burst relation is shown in figure 3.1. The 20 percent overlap of the frame is used as a delay for the analysis, and the same delay is maintained for the synthesis. Each frame is windowed using a Hanning window; a windowed frame with well-shaped boundaries is shown in figure 6.1.

The pre-emphasis filter boosts the harmonic components of the input signal in order to get a better estimation. The effect of the pre-emphasis filter is shown in figure 6.2.
[Figure 6.2: Power spectrum P_x(λ = 100, f) before and after the pre-emphasis filter.]
but gave poor results. The linear prediction fit to the signal spectrum, with an 11-pole model and the signal sampled at 8 kHz, and the residual signals are shown in figure 6.3. The signal power spectrum compared with the residual power spectrum, and the autocorrelation of the residue, are also shown in figure 6.3.
[Figure 6.3: (a) Linear prediction fit to the signal spectrum with an 11-pole model, signal sampled at 8 kHz; (b) a residual signal from the 25th burst; (c) autocorrelation of the residual signal from the 35th burst; (d) signal power spectrum versus residual power spectrum.]
The main objective of the power spectral subtraction is to estimate the noise spectrum P_n(λ, k) and subtract it from the input signal power spectrum P_x(λ, k) to obtain the estimate of the speech power spectrum P_ŝ(λ, k). The process, which should be able to track different types and levels of noise, is illustrated in figure 6.4, and the implementation of the algorithm is described as a flow chart in figure 6.9. Several parameters are considered in the implementation.
After the subtraction, spectral manipulation is necessary, as mentioned in chapter 5. The manipulation is done in three steps. The first step is envelope estimation, which estimates the envelope of the smoothed spectrum in order to fit the LPC model. Different types of filters were designed for envelope estimation and smoothing. The envelope estimation, a kind of lowpass filtering, is shown in figure 6.4.

A first-order recursive filter is used for smoothing in the frequency domain. The smoothing in the frequency domain is carried out to suppress the frequency components which are introduced by the envelope estimation, as shown in figure 6.4. The next step, smoothing in the time domain, smooths the periodogram, shown in figure 6.4, and is similar to the frequency smoothing.
6.1.8 IFFT
From the manipulated estimate of the speech power spectrum, the IFFT calculates the autocorrelation of the signal frame. In figure 6.5, the calculated autocorrelation for the frame index λ = 95 is shown.
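By the Wiener-Khinchin relation, the autocorrelation is the inverse transform of the power spectrum, and since the power spectrum is real and even, the inverse reduces to a cosine sum. The self-contained C sketch below uses a naive O(N·p) sum standing in for the IFFT of the real implementation and computes only the lags needed for the LP order.

```c
#include <math.h>
#include <stddef.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Autocorrelation lags 0..p from a power spectrum of nfft bins:
 * r[tau] = (1/N) * sum_k P[k] * cos(2*pi*k*tau/N). */
static void power_to_autocorr(const float *P, size_t nfft,
                              float *r, size_t p)
{
    for (size_t tau = 0; tau <= p; tau++) {
        float acc = 0.0f;
        for (size_t k = 0; k < nfft; k++)
            acc += P[k] *
                (float)cos(2.0 * M_PI * (double)(k * tau) / (double)nfft);
        r[tau] = acc / (float)nfft;
    }
}
```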
[Figure 6.4: Results of spectral subtraction and spectral manipulation: (a) averaged estimated noise spectrum; (b) estimated noise floor of the noisy speech signal (f_s = 8 kHz, N_FFT = 256), showing P_x, P_min, and P_n; (c) envelope estimation of the estimated signal power spectrum; (d) smoothed estimated power spectrum; (e) periodogram of the estimated power spectrum; (f) estimated speech power spectrum after spectral manipulation.]
Figure 6.5: Autocorrelation of an estimated speech frame calculated from the manipulated speech
power spectrum estimate.
The residual signal is the result of the LPC analysis filter. The residual would be zero only if the prediction filter exactly matched all the poles of the signal. Since this rarely occurs, a 10th-order LPC analysis filter is used in this algorithm implementation. The residue of the 10th-order analysis filter is shown in figure 6.1.

Thus, AR modeling alone is not sufficient if the input signal is noise-contaminated, so noise suppression is necessary in order to get a better synthesized speech. The burst-based execution is carried out to reduce the complexity of processing the noisy residue.
As proposed in the algorithm chapter, the autocorrelation method is used to find the pitch period in the residue, and the corresponding power spectrum is calculated. A first-order lowpass filter is then used to filter the spectrum. The filter responses are shown in figure 6.6.

After finding the LPC coefficients from the autocorrelation, the estimated speech waveform is reconstructed using the results of the analysis filter, the manipulated LPC coefficients, and the manipulated residual signal. This can be seen as inverse system identification. The results of the synthesis filter can be seen in the spectrograms of figure 6.7, which shows the clean speech signal, the noisy speech signal, and the estimated speech signal.
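A minimal C sketch of the all-pole synthesis step is shown below, using the convention of equation 2.2 with the gain G taken as 1; it is a simplified stand-in for the synthesis filter of the implementation, not its exact code.

```c
#include <stddef.h>

/* All-pole synthesis 1/A(z): s[n] = e[n] - sum_k a[k-1] * s[n-k].
 * e is the (manipulated) residual, a holds the manipulated LP
 * coefficients a_1..a_p; state before the burst is taken as zero. */
static void synthesis_filter(const float *e, float *s, size_t n,
                             const float *a, size_t p)
{
    for (size_t i = 0; i < n; i++) {
        float acc = e[i];
        for (size_t k = 1; k <= p && k <= i; k++)
            acc -= a[k - 1] * s[i - k];
        s[i] = acc;
    }
}
```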
The software is developed by rewriting the Matlab functions in ANSI C. This allows easier access to the internal variables and makes it possible to verify the results of the individual functions. The conversion of the software is based on simulations in Matlab.
[Figure 6.6: (a) Autocorrelation of the residual signal (lag in samples); (b) periodogram of the residual signal (PSD in dB versus frequency in Hz).]
[Figure 6.7: Spectrograms of the speech, noisy speech, and estimated speech signals: (a) spectrogram of a noise-free speech signal; (b) spectrogram of a noisy speech signal.]
6.2 DSP Architecture Mapping

[Figure 6.8: Blocks of the TMS320C6713 DSK used for the implementation: microphone input and output, EDMA controller, DSP processor, and memory.]
The basic architecture and the functional units of the DSP processor are explained in appendix C. The different blocks of the TMS320C6713 DSK board which are used for the LPC analysis implementation are shown in figure 6.8. The processor and the EDMA (Enhanced Direct Memory Access) controller can directly access the memory. The EDMA uses buffers to read and process the data. The multi-channel serial port is used to communicate with external peripherals such as the AIC23 codec and the LEDs. The address and data buses are used to communicate between the blocks [15].
The ANSI C version is first executed together with Matlab and then moved onto the C6713 DSK with the necessary changes. The example programs served as references for running code on the C6713 DSK. To optimize the C code on the DSK, one should have good knowledge of the memory allocation, the external peripheral port addresses, and the specific order of instruction execution. The code is optimized with the help of the example programs from Code Composer Studio. The Matlab code could not be converted directly, so C equivalents of special Matlab functions were written, such as the Hanning window, framing, the Levinson-Durbin recursion algorithm, the all-pole filter, and the all-zero filter. Finally, the program is loaded onto the DSK to test and confirm the operation of the C program.
[Figure 6.9: Flow chart of the spectral subtraction implementation. After initialization (frame counter = 0, P_min = P_x), each frame updates the minimum (P_min = P_x whenever P_x ≤ P_min) and the noise estimate P_n = O_min · P_min; the SNR is computed and clamped to the range 0-35 dB; the oversubtraction factor O_sub is set according to the SNR (O_sub = 7 at SNR ≤ 0 dB, decreasing with increasing SNR); the Q factor is computed, and the spectral floor is applied when the estimate falls to or below Subf · P_n.]

[Flow chart of the overall implementation: start → x = s + n → circular buffer → preemphasis filter → sampling/framing → burst → power spectrum and LPC analysis filter (1) → spectral subtraction → spectral manipulation → residue manipulation (ẽ) and LPC (2) → deemphasis filter → end.]
The C program is loaded onto the DSK to confirm the functionality, as described in the one-day workshop student guide from TI. A new project directory is created on starting a new project, and a new .cdb (configuration database) file is created to control the range of CCS capabilities. All the created files are added to the project directory. When errors occurred, breakpoints were used to correct the code until the execution was successful. After execution, the results stored in memory were compared with the Matlab results.

The Matlab-converted C files and the hand-written C code, along with the Matlab code, are placed on the CD-ROM.
7.2 Conclusion
To conclude from the implementation results, the spectral subtraction method was used for noise reduction in speech, and it served as a good noise power estimator. Extensive analysis was conducted using this technique, with effective outcomes: the technique helped increase the signal-to-noise ratio (SNR) and the perceptual quality. Spectral manipulation was also performed, but it was not analyzed in depth, as little change was observed in the result.

Residual manipulation played a major role in maintaining the intelligibility. It was performed with the burst-based execution in order to produce a better reconstruction of the speech signal at the synthesis filter. It is recommended that new techniques be developed to further suppress the noise in the residual signal, along with the burst-based execution, and also in the spectral manipulation.

The C code was written in two ways. With the first method, the Matlab-converted C code could be built for the board, but its execution terminated prematurely. With the second method, the hand-written C code gave good performance compared to the Matlab-generated code; it was built and executed successfully.
Linear Predictive Coding (LPC) is a method for estimating speech parameters such as pitch, formants, spectra, and the vocal tract shape. The LPC technique is based on linear predictive analysis. LPC removes redundancy in speech signals by exploiting the correlation in the speech. The analysis results in a set of AR coefficients, also known as the whitening filter coefficients. The speech signal is passed through the whitening filter to obtain the residual signal, which is approximately spectrally white. Finally, the residual is encoded and transmitted together with the AR coefficients. This chapter gives a basic overview of linear prediction and the different approaches to finding the prediction coefficients and minimizing the error.
The basic principle of linear prediction is to predict a future value of a stationary discrete-time stochastic process from a set of past, observed values [1, e.3, p.241]. The block diagram of linear prediction is shown in figure A.1.
[Figure A.1: Block diagram of linear prediction: x[n] is delayed (z⁻¹) and fed to the linear predictor; the prediction x̂[n] is subtracted from x[n] to yield the residual e[n].]
As shown in the figure, x[n] is a signal, considered as the input to a linear filter. The output of the predictor is denoted x̂[n]; it is the estimate of the signal x[n]. The error e[n] is the difference between x[n] and x̂[n]. This is called a linear estimation problem, in which the filter is designed to minimize the error e[n]. When the aim is to estimate future values from observed values of x[n], the estimation is called linear prediction. Estimating future values of x[n] amounts to obtaining the linear prediction coefficients, which are used in speech coding. A mathematical model of linear prediction is described below in order to obtain the predictor coefficients and minimize the prediction error.
Consider a system with output x[n] and some unknown input u[n]. A linear prediction of order p estimates the value x[n] as a linear combination of the p previous observations x[n − 1], x[n − 2], ..., x[n − p]. The governing equation is [6]:
$$x[n] = -\sum_{k=1}^{p} a_k x[n-k] + G \sum_{l=0}^{q} b_l u[n-l] \qquad (A.1)$$
where a_k, b_l, and the gain G are the parameters of the system. Equation A.1 indicates that the output x[n] is a linear function of past outputs and of present and past inputs; the signal x[n] is thus predictable from a linear combination of past outputs and inputs. In the frequency domain, equation A.1 gives the transfer function H(z) of the system:
$$H(z) = \frac{X(z)}{U(z)} = G \, \frac{B(z)}{A(z)} \qquad (A.2)$$
where X(z) and U(z) are the z-transforms of x[n] and u[n]. H(z) is a pole-zero model, also called an autoregressive moving average (ARMA) model. For a pole-zero model, the estimation of the parameters is inherently nondeterministic and nonlinear. But the all-pole model can approximate the speech spectrum closely and simply, so the all-pole model is preferred in LPC [6].
For a system with only poles, that is, b_l = 0 for 1 ≤ l ≤ q and b_0 = 1 in equation A.1, the system is referred to as an all-pole or autoregressive (AR) model. Equation A.1 then reduces to:
$$x[n] = -\sum_{i=1}^{p} a_i x[n-i] + G u[n] \qquad (A.3)$$
where G is the gain factor. The transfer function H(z) reduces to an all-pole transfer function:
$$H(z) = \frac{X(z)}{U(z)} = G \, \frac{1}{1 + \sum_{i=1}^{p} a_i z^{-i}} = G \, \frac{1}{A(z)} \qquad (A.4)$$
If the current input signal is unknown, which is the common case in most applications, then the signal x[n] can only be approximately predicted from a linearly weighted summation of past samples. This approximation is denoted x̂[n] and is expressed as:
$$\hat{x}[n] = -\sum_{k=1}^{p} a_k x[n-k] \qquad (A.5)$$
The error e[n] between the actual value x[n] and the predicted value x̂[n] is:
$$e[n] = x[n] - \hat{x}[n] = x[n] + \sum_{k=1}^{p} a_k x[n-k] \qquad (A.6)$$
where e[n] is known as the residual. The prediction coefficients a_k should be selected so as to minimize the total squared error. To obtain the coefficients, the method of least squares is applied. The total squared error is:
$$E_n = \sum_{n} e^2[n] = \sum_{n} \Big[ x[n] + \sum_{k=1}^{p} a_k x[n-k] \Big]^2, \qquad 0 \le n \le N-1 \qquad (A.7)$$
Minimizing E_n with respect to each coefficient requires

$$\frac{\partial E_n}{\partial a_i} = 0, \qquad 1 \le i \le p \qquad (A.8)$$

which gives
$$\sum_{k=1}^{p} a_k \sum_{n} x[n-k] \, x[n-i] = -\sum_{n} x[n] \, x[n-i], \qquad 1 \le i \le p \qquad (A.9)$$
Equation A.9 is the normal equation of least-squares terminology. It forms a set of p linear equations that can be solved for the predictor coefficients a_k, 1 ≤ k ≤ p, which minimize E_n (equation A.7). The total minimum squared error, denoted E_p, is obtained by substituting equation A.9 into A.7:
$$E_p = \sum_{n} (x[n])^2 + \sum_{k=1}^{p} a_k \sum_{n} x[n] \, x[n-k] \qquad (A.10)$$
Now the coefficients a_k are obtained by solving the normal equations A.9, which can be done using two different approaches: the autocorrelation method and the covariance method.
In this method, the assumption is that the error in equation A.7 is minimized over an infinite duration, while the signal is considered to be of finite duration (N samples) for computing the autocorrelation [6]. The autocorrelations are calculated from the available N samples of the signal. To solve for the prediction coefficients a_k, equations A.9 and A.10 in terms of the autocorrelation function become:
$$\sum_{k=1}^{p} a_k R(i-k) = -R(i), \qquad 1 \le i \le p \qquad (A.11)$$
$$E_p = R(0) + \sum_{k=1}^{p} a_k R(k) \qquad (A.12)$$
$$\sum_{k=1}^{p} a_k \varphi_{ki} = -\varphi_{i0}, \qquad 1 \le i \le p \qquad (A.14)$$
$$E_p = \varphi_{00} + \sum_{k=1}^{p} a_k \varphi_{0k} \qquad (A.15)$$
where

$$\varphi_{ik} = \sum_{n=0}^{N-1} x[n-i] \, x[n-k] \qquad (A.16)$$
Here φ_ki is a symmetric covariance matrix. The covariance normal equations are written in matrix form as:

$$\begin{bmatrix} \varphi_{1,1} & \varphi_{1,2} & \varphi_{1,3} & \cdots & \varphi_{1,p} \\ \varphi_{2,1} & \varphi_{2,2} & \varphi_{2,3} & \cdots & \varphi_{2,p} \\ \varphi_{3,1} & \varphi_{3,2} & \varphi_{3,3} & \cdots & \varphi_{3,p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \varphi_{p,1} & \varphi_{p,2} & \varphi_{p,3} & \cdots & \varphi_{p,p} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix} = - \begin{bmatrix} \varphi_{1,0} \\ \varphi_{2,0} \\ \varphi_{3,0} \\ \vdots \\ \varphi_{p,0} \end{bmatrix} \qquad (A.17)$$
The matrix in equation A.17 is symmetric but not Toeplitz (the elements along the diagonals differ). This method cannot guarantee stable filters, and there is no particularly efficient algorithm for inverting the covariance matrix.
The Levinson-Durbin (L-D) algorithm is a direct method for computing the predictor coefficients a_k and the mean square error E_n for an order p by solving the augmented Wiener-Hopf equations. The method is recursive in nature and exploits the Toeplitz structure of the correlation matrix (equation A.13) of the filter's tap inputs [1, e.3, p.198].

The Toeplitz matrix equation A.13 is solved using the L-D algorithm, an efficient method for inverting such matrices. Denote the linear prediction parameters at iteration i by a_k^(i) and the error by E^(i), for 1 ≤ i ≤ p. The recursive procedure is: initially, E^(0) = R(0) and a^(0) = 0; then, at each iteration i,
$$k_i = -\frac{R(i) + \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)}{E^{(i-1)}} \qquad (A.18)$$
The parameters ki are known as the reflection coefficients.
$$a_i^{(i)} = k_i \qquad (A.19)$$
$$a_j^{(i)} = a_j^{(i-1)} + k_i \, a_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1 \qquad (A.20)$$
with the error updated as

$$E^{(i)} = (1 - k_i^2) \, E^{(i-1)} \qquad (A.21)$$

Equations A.18 through A.21 are solved recursively at each iteration. At each iteration i, the error E is reduced by the factor (1 − k_i²), and after the final iteration the coefficients a_k^(p), 1 ≤ k ≤ p, give the optimal pth-order linear predictor [6]. The filter the L-D algorithm produces is minimum phase and therefore stable, and the algorithm is computationally efficient; a C sketch follows.
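The recursion of equations A.18-A.21 translates directly into code. The C sketch below assumes the sign convention above (A(z) = 1 + Σ a_i z⁻ⁱ) and a maximum order of 32 for the scratch buffer; for the order 10 used in this project, levinson_durbin(R, a, 10) returns the residual energy E_p.

```c
#include <stddef.h>

/* Levinson-Durbin recursion: solves the Toeplitz normal equations for
 * the LP coefficients a_1..a_p (returned in a[0..p-1]) from the
 * autocorrelation lags R[0..p]. Returns the final error E_p. */
static float levinson_durbin(const float *R, float *a, size_t p)
{
    float E = R[0];
    float tmp[32];                       /* scratch; assumes p <= 32 */
    for (size_t i = 1; i <= p; i++) {
        float acc = R[i];                /* numerator of eq. A.18 */
        for (size_t j = 1; j < i; j++)
            acc += a[j - 1] * R[i - j];
        float k = -acc / E;              /* reflection coefficient */
        tmp[i - 1] = k;                  /* a_i^(i) = k_i, eq. A.19 */
        for (size_t j = 1; j < i; j++)   /* eq. A.20 */
            tmp[j - 1] = a[j - 1] + k * a[i - j - 1];
        for (size_t j = 0; j < i; j++)
            a[j] = tmp[j];
        E *= 1.0f - k * k;               /* error update, eq. A.21 */
    }
    return E;
}
```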
B.1 Speech
Speech is used to communicate information from a speaker to a listener. Human speech production begins with an idea or thought that the speaker wants to convey. The speaker produces an acoustic sound pressure wave through a series of neurological processes and muscular movements, and the wave is received by the listener's auditory system.
Speech signals are non-stationary; at best they can be considered quasi-stationary over short segments, typically 5-30 ms. The statistical and spectral properties of speech are thus defined over short segments. Speech can generally be classified as voiced, unvoiced, or mixed. Voiced speech is quasi-periodic in the time domain and harmonically structured in the frequency domain, while unvoiced speech is random-like and broadband. In addition, the energy of voiced segments is generally higher than the energy of unvoiced segments.
The short-time spectrum of voiced speech is characterized by its fine structure and its formant structure. The fine harmonic structure is a consequence of the quasi-periodicity of speech and is due to the vibrating vocal cords. The frequency of the periodic pulses is referred to as the fundamental frequency, or pitch. The formant structure (spectral envelope) is due to the interaction of the source and the vocal tract. The spectral envelope is characterized by a set of peaks, called formants, which are the resonant modes of the vocal tract. The formants are quite important in both speech synthesis and perception.
B.2 Noise
Noise can be defined as complex sound waves that are aperiodic, i.e., sound waves with irregular vibrations and no definite pitch. In other words, noise is an unwanted signal that interferes with the detection or quality of another signal [16].

Noise is classified into different colors according to its spectral properties. White noise has a power density that is constant over a finite frequency range. The next most commonly used color is pink noise. Its frequency spectrum is not flat, but it has equal power in bands that are proportionally wide. Pink noise is perceptually white; that is, the human auditory system perceives approximately equal magnitude at all frequencies. Its power density decreases by 3 dB per octave (density proportional to 1/f). Brown noise is similar to pink noise, but with a power density decrease of 6 dB per octave with increasing frequency (density proportional to 1/f²) over a frequency range which does not include DC. There are also many "less official" colors of noise, such as red, orange, green, and black.
The C6000 DSP platform offers fast DSPs running at clock speeds up to 1 GHz. The platform consists of the TMS320C64x and TMS320C62x fixed-point generations as well as the TMS320C67x floating-point generation. The C6000 DSP platform's performance ranges from 1200 to 8000 MIPS (million instructions per second) for fixed point and from 600 to 1350 MFLOPS (million floating-point operations per second) for floating point [15].
Basic C6000 CPU Architecture:
• Functional Units
• Register File
• Memory and Peripheral
The CPU contains eight independent functional units, as shown in figure C.1. All eight functional units can receive their own 32-bit instruction on every cycle, i.e., the CPU can execute eight instructions in parallel.
• .D units (.D1, .D2): 32-bit loads and stores; add, subtract, and linear and circular address calculations.

• .M units (.M1, .M2): 16 × 16-bit integer or 32 × 32-bit floating-point multiply operations in hardware.

• .L units (.L1, .L2): 32/40-bit arithmetic and compare, and 32-bit logic operations; also perform conversion operations.
[Figure C.1: C6000 CPU architecture: register files A (A0-A15) and B (B0-B15), functional units .D1/.D2, .M1/.M2, .L1/.L2, memory, and the controller/decoder.]
• .S units (.S1, .S2): 32/40-bit shifts and 32-bit bit-field operations; branches, constant generation, compare, reciprocal and reciprocal square root, absolute value, and conversion operations; register transfers to/from the control register file.

The variables operated upon by the CPU are stored in the register files. There are two register files, register file A (A0-A15/31) and register file B (B0-B15/31), of 16 or 32 registers each, depending on which C6000 CPU is used.
The TMS320C6713 belongs to the floating-point DSP generation of the TMS320C6000 DSP platform. The C6713 DSP features a two-level cache and a VLIW (very long instruction word) architecture. Operating at 225 MHz, the C6713 delivers up to 1350 million floating-point operations per second (MFLOPS), 1800 million instructions per second (MIPS), and, with dual fixed-/floating-point multipliers, up to 450 million multiply-accumulate operations per second (MMACS). The C6713 DSP has sufficient bandwidth to support all 16 serial data pins transmitting a 192 kHz stereo signal.
The C6x family of DSPs has a single large 32-bit address space. The address space is split between on-chip memory, on-chip peripheral registers, and external memory. All memory is byte-addressable, and program code and data can be mixed freely. The C6713 also has 4 kB program and data caches to improve performance when accessing external code and data [15]. Figure C.2 shows the memory map for the C6713 DSK and how the address space is used. The internal memory starts at the beginning of the address space, with most of the remainder either reserved or used for peripheral registers. The EMIF space starts at address 0x80000000 and spans the next 1 GB of the address space. It is divided into four equally sized regions, each with a dedicated chip enable signal (CE0-CE3). The on-board memory, the programmable CPLD (complex programmable logic device) registers, and add-on daughter cards are all connected through the EMIF (external memory interface).
The TMS320C6713 DSP Starter Kit (DSK) was developed for high-precision applications based on the TMS320C6000 floating-point DSP generation, such as audio, medical imaging, and test and instrumentation. The C6713 DSK includes 8 MB of on-board SDRAM, an emulation header, and I2C interfaces.

The DSK includes the Fast Run Time Support libraries and utilities, such as FlashBurn to program the flash, Update Advisor to download tools, utilities, and software, and a power-on self-test and diagnostic utility to ensure the DSK is operating correctly. The hardware features include:
• C6713 DSP Development Board with 512K Flash and 8MB SDRAM
• High-quality 24-bit stereo codec
• Four 3.5mm audio jacks for microphone, line in, speaker and line out
• Expansion port connector for plug-in modules
For DSP product development, the TMS320 DSP family is supported by the eXpressDSP Real-Time Software Technology, which includes the Code Composer Studio integrated development environment, the DSP/BIOS real-time software kernel, and the TMS320 DSP Algorithm Standard.
The TMS320 DSP Algorithm Standard is a single, standard set of coding conventions
and application programming interfaces (APIs) for algorithm creators. The standard in-
cludes algorithm programming rules that enable interoperability between different types
of algorithms.
C.9 Terminology
Below are some of the more common terms that are encountered when discussing the TMS320 DSP Algorithm Standard.
• Algorithm: A module of code that consumes a data stream, processes it, and out-
puts a resultant stream. Examples include vocoders, modems, audio compression,
video decompression, etc.
• Reference Framework: The "glue" code that holds together the drivers, the algorithms, the resource managers, and the DSP kernel. Reference Frameworks start out application-agnostic; upon the addition of application-specific algorithms, the framework takes on an application-specific nature.
• DSP Kernel: A low-level software layer that provides hardware abstraction and manages low-level physical resources. It provides threading, interrupt support, pipes, signals, and several other functions. In addition, DSP/BIOS (Basic Input Output System) offers data logging and statistical accumulation that enable real-time analysis of the system.
• Application: The application depends on the use of some or all of the other components. If a user writes all the code from scratch, including a kernel, algorithms, and a framework, then the entire software system may be described as the application. However, in an environment where DSP/BIOS, a reference framework, and COTS (commercial off-the-shelf) algorithms have been deployed, the application programmer uses the APIs (application program interfaces) of the controlling framework.
[2] L. Ljung, System Identification: Theory for the User, 2nd ed., Prentice Hall, 1999, ISBN 0-13-881640-9.

[3] J. R. Deller, Jr., J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals, 2nd ed., Wiley-Interscience/IEEE, ISBN 0-7803-5386-2.

[4] A. M. Kondoz, Digital Speech: Coding for Low Bit Rate Communication Systems, 2nd ed., Wiley and Sons, 1999, ISBN 0-471-62371-7.

[6] J. Makhoul, "Linear Prediction: A Tutorial Review", Proceedings of the IEEE, vol. 63, no. 4, April 1975.

[8] R. Martin, "Spectral Subtraction Based on Minimum Statistics", Proc. EUSIPCO-94, Edinburgh, Scotland, 13-16 September 1994, pp. 1182-1185.

[9] S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, pp. 113-120, April 1979, ISSN 0096-3518.

[13] A. Jantsch, S. Kumar, and A. Hemani, "The Rugby Model: A Conceptual Frame for the Study of Modelling, Analysis and Synthesis Concepts of Electronic Systems", Proceedings of Design, Automation and Test in Europe (DATE), 1999, pp. 256-262.