Speech Processing Project
ECE 5525
Osama Saraireh
Fall 2005
Dr. Veton Kepuska
Models Implemented In MATLAB
LPC Background
The speech signal is low-pass filtered to no more than one half the
system sampling frequency, and then A/D conversion is performed.
The speech is processed on a frame by frame basis where the
analysis frame length can be variable.
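The frame-by-frame split can be illustrated with a minimal Python sketch. The helper name `split_into_frames`, the non-overlapping layout, and the 8 kHz / 20 ms figures are assumptions made for illustration; the slides only state that the frame length can vary.

```python
def split_into_frames(samples, frame_length):
    """Split a sequence of samples into consecutive non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped
    (an assumption of this sketch, not something the slides specify).
    """
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    n_frames = len(samples) // frame_length
    return [list(samples[i * frame_length:(i + 1) * frame_length])
            for i in range(n_frames)]


# Hypothetical setup: 8 kHz sampling and 20 ms frames (160 samples per frame).
sample_rate_hz = 8000
frame_length = sample_rate_hz * 20 // 1000
frames = split_into_frames([0.0] * 1000, frame_length)
```

Each element of `frames` would then be passed to the pitch, voicing, and LPC analysis steps described next.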
For each frame a pitch period estimation is made along with a
voicing decision.
A linear predictive coefficient analysis is performed to obtain an
inverse model of the speech spectrum, A(z).
In addition, a gain parameter G, representing some function of the
speech energy, is computed.
An encoding procedure is then applied for transforming the
analyzed parameters into an efficient set of transmission
parameters with the goal of minimizing the degradation in the
synthesized speech for a specified number of bits. Knowing the
transmission frame rate and the number of bits used for each
transmission parameter, one can compute a noise-free channel
transmission bit rate.
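As a small worked example of that bit-rate computation, the Python sketch below multiplies the frame rate by the number of bits per frame. The function name `channel_bit_rate`, the 50 frames per second, the single voicing bit, and the ten coefficients at 10 bits each are illustrative assumptions, not values taken from the project.

```python
def channel_bit_rate(frame_rate_hz, bits_per_parameter):
    """Noise-free channel bit rate = frames per second * bits per frame."""
    bits_per_frame = sum(bits_per_parameter.values())
    return frame_rate_hz * bits_per_frame


# Hypothetical per-frame allocation (illustrative values only).
allocation = {
    "pitch": 6,
    "voicing": 1,
    "gain": 5,
    "coefficients": 10 * 10,  # assumed: 10 coefficients at 10 bits each
}
bit_rate = channel_bit_rate(50, allocation)  # assumed 50 frames per second
print(bit_rate)  # 5600 (bits per second)
```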
LPC Theory
At the receiver, the transmitted parameters are decoded into
quantized versions of the coefficient analysis and pitch estimation
parameters.
An excitation signal for synthesis is then constructed from the
transmitted pitch and voicing parameters.
The excitation signal then drives a synthesis filter 1/A(z)
corresponding to the analysis model A(z).
The digital samples ŝ(n) are then passed through a D/A
converter and low-pass filtered to generate the synthetic speech
s(t).
Either before or after synthesis, the gain is used to match the
synthetic speech energy to the actual speech energy.
The digital samples are then converted to an analog signal and
passed through a filter similar to the one at the input of the
system.
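The receiver steps above can be sketched in Python as follows. This is a minimal illustration, assuming a unit impulse train for voiced frames, Gaussian white noise for unvoiced frames, the sign convention A(z) = 1 + sum_k a_p(k) z^-k, and the gain applied before the synthesis filter. The names `make_excitation` and `synthesize` are hypothetical, and the slides do not say how the MATLAB models build the excitation.

```python
import random


def make_excitation(n_samples, pitch_period, voiced, seed=0):
    """Impulse train for voiced frames, white noise otherwise (assumed shapes)."""
    if voiced and pitch_period > 0:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def synthesize(excitation, a, gain):
    """Filter the excitation through G / A(z), A(z) = 1 + sum_k a[k-1] z^-k.

    Difference equation: s(n) = G * x(n) - sum_k a[k-1] * s(n - k).
    """
    out = []
    for n, x in enumerate(excitation):
        y = gain * x
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                y -= a_k * out[n - k]
        out.append(y)
    return out


# Hypothetical frame: voiced, pitch period of 40 samples, one-pole model.
excitation = make_excitation(160, 40, voiced=True)
synthetic = synthesize(excitation, a=[-0.5], gain=0.8)
```

Applying the gain to the excitation rather than to the filter output is only one of the two options the text mentions; for a linear filter the two give the same samples.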
Speech production
[Figure: speech production model block diagram, with signals e(n), s(n), and ŝ(n)]
By minimizing the sum of the squared error, we can determine the
pole parameters of the model. Differentiating this sum with respect
to each of the parameters and equating the result to zero yields a
set of p linear equations:
$$ \sum_{k=1}^{p} a_p(k) \, r_{ss}(m-k) = -r_{ss}(m), \qquad m = 1, 2, \ldots, p $$
or, in matrix form,
$$ R_{ss} \, a = -r_{ss} $$
Auto-correlation
where R_ss is a p×p autocorrelation matrix, r_ss is a p×1
autocorrelation vector, and a is a p×1 vector of model parameters.
The signs here follow the convention A(z) = 1 + \sum_{k=1}^{p} a_p(k) z^{-k}.
If the input excitation is normalized to unit energy by design, then
$$ G^2 \sum_{n=0}^{N-1} x^2(n) = \sum_{n=0}^{N-1} e^2(n) = r_{ss}(0) + \sum_{k=1}^{p} a_p(k) \, r_{ss}(k) $$
where G^2 is set equal to the residual energy resulting from the
least-squares optimization.
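A minimal Python sketch of solving these p linear equations and obtaining the gain is given below. It assumes the sign convention A(z) = 1 + sum_k a_p(k) z^-k, so the system reads sum_k a_p(k) r_ss(m-k) = -r_ss(m), and it uses the Levinson-Durbin recursion, which is one standard solver for this kind of Toeplitz system. The slides do not say which solver the MATLAB models use, and the function names and sample values are hypothetical.

```python
import math


def autocorrelation(frame, max_lag):
    """r_ss(m) = sum_n s(n) s(n + m) for m = 0..max_lag (autocorrelation method)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + m] for i in range(n - m))
            for m in range(max_lag + 1)]


def levinson_durbin(r, p):
    """Solve sum_k a(k) r(|m - k|) = -r(m), m = 1..p.

    Returns (a, residual_energy), where a = [a(1), ..., a(p)] and
    residual_energy = r(0) + sum_k a(k) r(k), i.e. G^2 for unit-energy excitation.
    """
    if r[0] == 0.0:
        return [0.0] * p, 0.0  # silent frame: nothing to predict
    a = [1.0] + [0.0] * p
    err = r[0]
    for m in range(1, p + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err  # reflection coefficient of stage m
        new_a = a[:]
        for k in range(1, m):
            new_a[k] = a[k] + k_m * a[m - k]
        new_a[m] = k_m
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:], err


# Hypothetical 10-sample frame and model order p = 2.
frame = [0.0, 0.9, 0.4, -0.6, -0.8, 0.1, 0.7, 0.3, -0.5, -0.6]
p = 2
r_ss = autocorrelation(frame, p)
a, residual_energy = levinson_durbin(r_ss, p)
gain = math.sqrt(max(residual_energy, 0.0))
```

The recursion also produces the reflection coefficients `k_m` as a by-product, which is convenient when the coefficients are later quantized.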
Once the LPC coefficients are computed, we can determine
whether the input speech frame is voiced and, if it is indeed
voiced, what the pitch is. We can determine the pitch
by computing the following sequence in MATLAB:
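$$ r_e(n) = \sum_{k=-p}^{p} r_a(k) \, r_{ss}(n-k), \qquad r_a(k) = \sum_{i=0}^{p-|k|} a_p(i) \, a_p(i+|k|), \quad a_p(0) = 1 $$
Here r_a(k) is the autocorrelation sequence of the prediction coefficients.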
If the peak value of the normalized sequence r_e(n)/r_e(0) is less
than 0.25, the speech frame is considered unvoiced and the pitch is
set to zero.
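The voicing and pitch decision can be sketched in Python as follows. The sketch assumes that r_a is the autocorrelation of the coefficient sequence [1, a_p(1), ..., a_p(p)], that the sum for r_e(n) runs over k = -p..p, and that the peak is searched over a caller-supplied lag range. The function names and the numbers in the example are hypothetical; only the 0.25 threshold comes from the text.

```python
def coefficient_autocorrelation(a):
    """r_a(k) for k = 0..p, using a_p(0) = 1 followed by a = [a_p(1), ..., a_p(p)]."""
    full = [1.0] + list(a)
    p = len(a)
    return [sum(full[i] * full[i + k] for i in range(p - k + 1))
            for k in range(p + 1)]


def residual_autocorrelation(a, r_ss, n):
    """r_e(n) = sum_{k=-p..p} r_a(|k|) * r_ss(|n - k|)."""
    r_a = coefficient_autocorrelation(a)
    p = len(a)
    return sum(r_a[abs(k)] * r_ss[abs(n - k)] for k in range(-p, p + 1))


def detect_pitch(a, r_ss, min_lag, max_lag, threshold=0.25):
    """Return the lag of the peak of r_e(n) / r_e(0), or 0 if the frame is unvoiced."""
    if len(r_ss) <= max_lag + len(a):
        raise ValueError("r_ss must cover lags up to max_lag + p")
    r_e0 = residual_autocorrelation(a, r_ss, 0)
    if r_e0 <= 0.0:
        return 0
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = residual_autocorrelation(a, r_ss, lag) / r_e0
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag if best_val >= threshold else 0


# Hypothetical autocorrelation values and a one-coefficient model.
r_ss = [1.0, 0.5, 0.25, 0.6, 0.3, 0.1]
pitch = detect_pitch([-0.5], r_ss, min_lag=2, max_lag=4)
```

A return value of 0 marks the frame as unvoiced, matching the convention that the pitch is zero in that case.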
The value of the LPC coefficients, the pitch period, and the type of
excitation are then transmitted to the receiver.
The decoder synthesizes the speech signal by passing the proper
excitation through the all-pole filter model of the vocal tract.
Typically, the pitch period requires 6 bits, the gain parameter is
represented in 5 bits after its dynamic range is compressed, and
the prediction coefficients normally require 8-10 bits per
coefficient for accuracy reasons.
This accuracy is very important in LPC because any small change in
the prediction coefficients can result in a large change in the pole
positions of the filter model, which can cause instability in the model.
This is overcome by using the PARCOR (partial correlation) method.
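To illustrate why reflection (PARCOR-type) coefficients help, the Python sketch below converts a set of prediction coefficients to reflection coefficients with the step-down recursion and checks stability. It assumes the sign convention A(z) = 1 + sum_k a_p(k) z^-k; the function names are hypothetical, and PARCOR coefficients are sometimes defined with the opposite sign. The all-pole filter 1/A(z) is stable exactly when every reflection coefficient has magnitude below 1, so quantizing these values inside (-1, 1) cannot produce an unstable model.

```python
def lpc_to_reflection(a):
    """Step-down recursion: [a(1), ..., a(p)] -> reflection coefficients [k_1, ..., k_p].

    Raises ValueError if some |k_m| >= 1, in which case 1/A(z) is not stable.
    """
    current = list(a)
    reflection = []
    while current:
        m = len(current)
        k_m = current[-1]  # k_m equals the last coefficient of the order-m model
        if abs(k_m) >= 1.0:
            raise ValueError("reflection coefficient outside (-1, 1)")
        reflection.append(k_m)
        # Order m-1 model: a'(i) = (a(i) - k_m * a(m - i)) / (1 - k_m^2)
        current = [(current[i] - k_m * current[m - 2 - i]) / (1.0 - k_m * k_m)
                   for i in range(m - 1)]
    reflection.reverse()
    return reflection


def is_stable(a):
    """True if all poles of 1/A(z) lie strictly inside the unit circle."""
    try:
        lpc_to_reflection(a)
    except ValueError:
        return False
    return True


# A(z) = (1 - 0.5 z^-1)(1 - 0.25 z^-1): poles at 0.5 and 0.25.
stable_model = [-0.75, 0.125]
# A(z) = (1 - 2 z^-1)(1 - 0.5 z^-1): one pole outside the unit circle.
unstable_model = [-2.5, 1.0]
print(is_stable(stable_model), is_stable(unstable_model))  # True False
```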
Model LPC Vocoder (pitch detector)
[Figure: two plots for the LPC vocoder (pitch detector) model; both horizontal axes run from 0 to 5 × 10^4]
Voice excited LPC Vocoder
[Figure: two panels, "Original speech signal" and "Reconstructed signal using voice-excited LPC vocoder"; both horizontal axes run from 0 to 5 × 10^4]