Speech Signal Processing Lab Work Book
21EC3081
A.Y. 2023-24 LAB/SKILL CONTINUOUS EVALUATION
Columns: S.No | Date | Experiment Name | Pre-Lab (10M) | In-Lab (25M): Program/Procedure (5M), Data and Results (10M), Analysis & Inference (10M) | Post-Lab (10M) | Viva Voce (5M) | Total (50M) | Faculty Signature
1. Introduction to MATLAB
2. Speech acquisition and recording
3. Non-Stationary Nature of Speech Signal
4. Identification of Voiced/Unvoiced/Silence regions of Speech
5. Different Sounds (Phonemes) in Language
6. Short Term Time Domain Processing of Speech
7. Fundamental frequency estimation in Speech signal
8. Format synthesis; MFCC extraction from Speech signal
9. Linear Prediction Analysis
10. Cepstral Analysis of Speech Signal
11. LPCC extraction from Speech signal
12. Speech Enhancement
13. Speaker Recognition
Experiment # <TO BE FILLED BY STUDENT> Student ID <TO BE FILLED BY STUDENT>
Date <TO BE FILLED BY STUDENT> Student Name <TO BE FILLED BY STUDENT>
Description:
• MATLAB has a variety of modern data structures and data types, including complex numbers.
Pre-Lab:
1. What are the basic elements of the MATLAB environment, such as the command window,
workspace, and editor?
2. How can you perform basic arithmetic operations and mathematical calculations using
MATLAB?
3. What are variables and how are they defined and used in MATLAB?
4. How can you create and manipulate arrays (vectors and matrices) in MATLAB?
5. What are some common built-in functions and operators in MATLAB, and how are they
used?
In-Lab:
Procedure:
To open MATLAB, go to Run in the Start Menu, type MATLAB, and press Enter.
To write a MATLAB program, open a new script.
Write the code in the script file and save it with the extension .m. To
execute the program, click the Run button or press F5.
If there are no errors in the program, the output waveform is displayed. Any
errors are reported in the Command Window.
Observe the output waveforms after running the program.
Program:
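disp('Hello, MATLAB!');
% Basic arithmetic (a variable named "sum" would shadow the built-in function)
a = 5;
b = 3;
total = a + b;
difference = a - b;
product = a * b;
quotient = a / b;
% Sine wave plot; 5 Hz matches the result reported below, fs is an assumed value
fs = 100;              % sampling rate in Hz
t = 0:1/fs:1-1/fs;     % one second of samples
x = sin(2*pi*5*t);
plot(t, x);
title('Sine Wave');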
This Section is meant for the students to collect, record the results generated during the
Program/Experiment execution as shown below. Include instructions on how to present the results,
such as creating tables, graphs, or visualizations.
Here, we can observe the two frequency components of the input sinusoidal wave, at +5 Hz and
-5 Hz.
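The observation above can be reproduced with a short sketch. The sampling rate fs and the length N are assumed values, chosen so that 5 Hz falls exactly on an FFT bin; they are not taken from the original program.

```matlab
% Two-sided spectrum of a 5 Hz sine wave (sketch; fs and N are assumed values)
fs = 100;                      % sampling rate in Hz (assumption)
N  = 200;                      % number of samples, i.e. 2 s of signal
t  = (0:N-1)/fs;               % time axis in seconds
x  = sin(2*pi*5*t);            % 5 Hz sinusoid

X  = fftshift(fft(x))/N;       % centre the zero-frequency bin and normalise
f  = (-N/2:N/2-1)*fs/N;        % matching frequency axis from -fs/2 to fs/2

% The two largest magnitudes sit at -5 Hz and +5 Hz
[~, idx]  = sort(abs(X), 'descend');
peakFreqs = sort(f(idx(1:2)));

stem(f, abs(X));
xlabel('Frequency (Hz)');
ylabel('|X(f)|');
title('Spectrum of a 5 Hz sine wave');
```

A real sinusoid splits its amplitude equally between the positive and the negative frequency, so each of the two peaks has a height of 0.5 after normalisation by N.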
Post-Lab:
1. What is the purpose of the "disp" function in MATLAB, and how is it used in the first
program?
2. Can you explain the order of operations in the basic arithmetic operations program? How
does MATLAB handle mathematical expressions?
3. In the second program, what does the "^" operator do, and how does it affect the vector?
4. How are the sum, mean, maximum, and minimum values calculated in the third program?
Are there any built-in functions used for these calculations?
5. Describe the steps involved in creating a simple line graph using the plot function in
MATLAB, as demonstrated in the fourth program.
Evaluator MUST ask Viva-voce prior to signing and posting marks for each experiment.
Pre-Requisites:
Pre-Lab:
1. What is speech acquisition, and why is it important in the field of speech and audio
processing?
2. What are the primary components of a speech acquisition system, and how do they work
together to capture and record speech?
3. What are the different types of microphones commonly used for speech acquisition, and
what factors should be considered when selecting a microphone for a specific application?
4. What is the sampling rate, and why is it essential in speech recording? How does the
sampling rate affect the quality and fidelity of recorded speech?
5. How does the bit depth or resolution of a recording device impact the quality of recorded
speech? What is the relationship between bit depth and dynamic range?
In-Lab:
Procedure:
The following code shows how to acquire and record a speech signal in MATLAB:
Code snippet:-
A: Record the Audio
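close all;
% Record voice from the microphone into MATLAB
n = 5;          % record time in seconds (80000 samples at 16 kHz, as in the output below)
fs = 16000;     % sample rate in Hz
channel = 1;
recObj = audiorecorder(fs, 16, channel); % audiorecorder(Fs, nBits, nChannels)
recordblocking(recObj, n);               % record for n seconds
y = getaudiodata(recObj);                % recorded samples as a column vector
sound(y, fs);                            % play back the recording
%figure;
%plot(y);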
A:
SampleRate: 16000
BitsPerSample: 16
NumberOfChannels: 1
DeviceID: -1
CurrentSample: 513
TotalSamples: 80000
Running: 'on'
StartFcn: []
StopFcn: []
TimerFcn: []
TimerPeriod: 0.0500
B:
Post-Lab:
1. What is the role of a microphone in speech acquisition, and how does it convert sound
waves into electrical signals?
2. Can you explain the concept of sampling rate in the context of speech recording? How does
the choice of sampling rate impact the quality and fidelity of the recorded speech?
3. What is the significance of the bit depth or resolution in speech recording? How does it
affect the dynamic range and accuracy of the recorded speech?
4. Describe some common challenges or factors that can affect the quality of speech
recordings, such as background noise, microphone placement, and room acoustics.
5. What are some techniques or approaches that can be employed to minimize or mitigate
background noise during speech recording?
Pre-Requisites:
Pre-Lab:
1. What is meant by the non-stationary nature of signals, and why is it important to study?
2. What are some key characteristics of non-stationary signals, and how do they differ from
stationary signals?
3. How is speech produced by the human vocal tract, and what are the main components of
the speech production mechanism?
4. What are some factors that contribute to the non-stationarity of speech signals, such as
phonetic content, prosody, and speaker variability?
5. What are some techniques or methods used to analyze and model the non-stationary nature
of speech signals?
In-Lab:
Procedure:
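[x,fs] = audioread('speech.wav');          % load the speech signal
subplot(3,1,1);
plot((0:length(x)-1)/fs, x);               % waveform against time in seconds
Y = fft(x.*hamming(length(x)));            % spectrum of the Hamming-windowed signal
f = (0:floor(length(Y)/2))*fs/length(Y);   % frequency axis up to fs/2
subplot(3,1,2);
plot(f,20*log10(abs(Y(1:length(f)))+eps));
legend('Spectrum');
xlabel('Frequency (Hz)');
ylabel('Magnitude (dB)');
This code loads the speech signal speech.wav, plots its waveform, and plots its magnitude
spectrum in dB, computed with fft() on the Hamming-windowed signal. A spectrum taken over the
whole signal hides how the frequency content changes with time; spectrogram() shows the spectrum
of successive short segments, and pwelch() gives an averaged power spectrum estimate.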
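The non-stationary behaviour can be illustrated without a recording by a minimal sketch on a synthetic signal whose frequency changes halfway through. The signal, frame length, and hop size are assumptions chosen for illustration; the Hamming window is written out explicitly so that no toolbox is needed.

```matlab
% Short-time spectral analysis of a non-stationary test signal (sketch)
fs = 8000;                              % sampling rate in Hz (assumption)
t  = (0:fs-1)/fs;                       % 1 s of signal
x  = [sin(2*pi*500*t(1:fs/2)), ...      % first half: 500 Hz tone
      sin(2*pi*1500*t(fs/2+1:end))];    % second half: 1500 Hz tone

frameLen = 256;                         % 32 ms analysis frame
hop      = 128;                         % 50 % overlap between frames
n        = 0:frameLen-1;
w        = 0.54 - 0.46*cos(2*pi*n/(frameLen-1));   % Hamming window

numFrames = floor((length(x) - frameLen)/hop) + 1;
domFreq   = zeros(1, numFrames);
for k = 1:numFrames
    seg = x((k-1)*hop + (1:frameLen)) .* w;        % windowed frame
    mag = abs(fft(seg));
    [~, bin]   = max(mag(1:frameLen/2));           % strongest positive-frequency bin
    domFreq(k) = (bin - 1)*fs/frameLen;            % convert bin index to Hz
end

fprintf('Dominant frequency: first frame %.1f Hz, last frame %.1f Hz\n', ...
        domFreq(1), domFreq(end));
```

A single FFT over the whole signal would show both tones at once, whereas the frame-wise analysis shows when each one is present. This is the reason speech is analysed in short frames.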
Data and Results:
Post-Lab:
1. Can you explain the difference between stationary and non-stationary signals, and how does
this distinction apply to speech signals?
2. What are some factors that contribute to the non-stationarity of speech signals, and how do
they impact the characteristics of the signal?
3. Describe the concept of short-term and long-term variability in speech signals. How do these
variations affect the analysis and processing of speech signals?
4. What are some common techniques or methods used to analyze and model the non-
stationary nature of speech signals?
Pre-Requisites:
Pre-Lab:
1. What are the voiced, unvoiced, and silence regions in a speech signal, and why is it important
to identify and distinguish them?
2. Can you explain the characteristics of a voiced speech signal and how it differs from an
unvoiced or silence region?
3. What are some techniques or methods commonly used for the identification of
voiced/unvoiced/silence regions in speech signals?
4. How does the fundamental frequency (pitch) play a role in determining the voiced regions of
a speech signal? What are some algorithms used for pitch estimation?
5. Describe the concept of energy or power in a speech signal. How can energy-based
measures be utilized to identify unvoiced or silence regions?
In-Lab:
Procedure:
The following code shows how to identify the different regions of a speech signal in
MATLAB:
Code snippet:-
clear all;
close all;
[x,fs]= audioread('e_sound.wav');
figure(1)
ms1=fs/1000; % lag of 1 ms: shortest pitch period considered (1000 Hz)
ms20=fs/50; % lag of 20 ms: longest pitch period considered (50 Hz)
% plot waveform
t=(0:length(x)-1)/fs; % times of sampling instants
plot(t,x);
legend('Waveform');
xlabel('Time (s)');
ylabel('Amplitude');
This code loads the speech signal e_sound.wav and plots its waveform. The lags ms1 and ms20
correspond to pitch frequencies of 1000 Hz and 50 Hz and bound the range in which a pitch period is
expected. The voiced and unvoiced regions of the signal are then identified from the
autocorrelation of individual segments, as described under Data and Results.
Data and Results:
Here, a recording of the vowel sound “e” is taken and divided into segments. The autocorrelation
function is computed for the different signal segments and shown in figure 2. From these plots we
compare the autocorrelation functions of the voiced and unvoiced segments.
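The difference between the two autocorrelation functions can be turned into a simple decision rule. The following is a minimal sketch on synthetic data: a harmonic signal stands in for a voiced segment and white noise for an unvoiced one. The segment length, search range, and threshold are illustrative assumptions, not values from the experiment.

```matlab
% Voiced/unvoiced decision from the normalised autocorrelation peak (sketch)
fs = 8000;                       % sampling rate in Hz (assumption)
N  = 400;                        % 50 ms analysis segment
t  = (0:N-1)/fs;

voiced   = sin(2*pi*100*t) + 0.5*sin(2*pi*200*t) + 0.25*sin(2*pi*300*t);
unvoiced = randn(1, N);          % white noise stands in for a fricative
segments = {voiced, unvoiced};

lagMin    = round(fs/400);       % shortest pitch period searched (400 Hz)
lagMax    = round(fs/50);        % longest pitch period searched (50 Hz)
threshold = 0.4;                 % illustrative decision threshold

peakVal  = zeros(1, 2);
isVoiced = false(1, 2);
for s = 1:2
    x = segments{s};
    r = zeros(1, lagMax + 1);
    for lag = 0:lagMax           % biased autocorrelation estimate
        r(lag + 1) = sum(x(1:N-lag) .* x(1+lag:N));
    end
    r = r / r(1);                % normalise so that r(0) = 1
    peakVal(s)  = max(r(lagMin+1:lagMax+1));
    isVoiced(s) = peakVal(s) > threshold;
end

fprintf('Peak value: voiced %.2f, unvoiced %.2f\n', peakVal(1), peakVal(2));
```

The periodic segment gives a strong peak at its pitch period (80 samples here), while the noise segment has no comparable peak away from lag zero. Silence is usually separated first with an energy threshold, because a normalised autocorrelation is not meaningful for a near-zero signal.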
Post-Lab:
1. Can you explain the significance of identifying voiced, unvoiced, and silence regions in speech
signals? What kind of information or insights can be derived from this identification?
2. Describe the characteristics of a voiced speech signal and an unvoiced speech signal. What
are the key differences between them in terms of spectral content and waveform shape?
3. What are some common techniques or algorithms used for the identification of
voiced/unvoiced/silence regions in speech signals? How do these methods work?
4. How does the concept of fundamental frequency (pitch) play a role in determining the
voiced regions of a speech signal? Can you explain the process of pitch estimation and its
importance in this context?
5. Discuss the role of energy or power in identifying unvoiced or silence regions of a speech
signal. How are energy-based measures employed in this process?
Pre-Requisites:
Pre-Lab:
1. What are phonemes, and why are they important in the study of language and linguistics?
3. Can you explain the concept of minimal pairs and how they relate to identifying and
distinguishing different phonemes?
4. What are some techniques or methods used to analyze and classify phonemes in different
languages?
5. How does the International Phonetic Alphabet (IPA) facilitate the representation and
transcription of phonemes?
In-Lab:
Procedure:
1. Record your voice for a few seconds, pronouncing different vowels and
consonants.
2. Plot the waveform of the recorded speech signal.
3. Calculate the power spectrum of the speech signal.
4. Identify the different phonemes in the speech signal.
Program:
The following code shows how to represent phonemes in digital form in MATLAB:
Code snippet
clear all;
close all;
[SS, Fs] = audioread('male2.wav'); % samples and the sampling rate stored in the file
% Hamming window
winLen = 301;
winOverlap = 300;
wHamm = hamming(winLen);
% Framing and windowing.
sigFramed = buffer(SS, winLen, winOverlap, 'nodelay');
sigWindowed = diag(sparse(wHamm)) * sigFramed;
% Short-Time Energy calculation
energyST = sum(sigWindowed.^2,1);
% Time in seconds, for the graphs
t = [0:length(SS)-1]/Fs;
subplot(1,1,1);
plot(t, SS);
title('speech: Male Voice');
xlims = get(gca,'Xlim');
hold on;
% Delayed Short-Time energy due to lowpass filtering
delay = (winLen - 1)/2;
plot(t(delay+1:end - delay), energyST, 'r');
xlim(xlims);
xlabel('Time (sec)');
legend({'Speech','Short-Time Energy'});
hold off;
This code loads the speech signal male2.wav, plots its waveform, and overlays the short-time
energy, computed with a 301-sample Hamming window that advances one sample at a time.
In this experiment, we have calculated the short-time energy of the recorded phonemes. It
is shown in the figure above.
Post-Lab:
1. Can you explain the concept of phonemes and their role in language? How do phonemes
differ from individual sounds or letters?
2. What is the importance of studying phonemes in linguistics and language analysis? How do
they contribute to our understanding of language structure and variation?
3. How are minimal pairs used to identify and distinguish different phonemes? Can you provide
an example of a minimal pair?
4. Describe the process of analyzing and classifying phonemes in different languages. What are
some common methods or techniques used in phonetics and phonology?
5. What is the International Phonetic Alphabet (IPA), and how does it assist in representing and
transcribing phonemes across languages? Can you provide an example of using IPA symbols?
Pre-Requisites:
Pre-Lab:
1. What is short-term time domain processing, and why is it commonly used in speech signal
analysis?
2. Can you explain the concept of windowing and its role in short-term time domain processing
of speech?
3. What are some common window functions used in speech signal analysis, such as
rectangular, Hamming, or Hanning windows? How do they affect the analysis results?
4. How is the concept of frame or window size related to short-term time domain processing?
What factors should be considered when choosing an appropriate frame size for speech
analysis?
5. Describe the process of computing the short-term energy of a speech signal using
windowing. What information does the energy provide about the signal?
In-Lab:
Procedure:
The following code shows how to perform short-term time domain processing on a speech signal in
MATLAB:
Code snippet
[SS, Fs] = audioread('e_sound.wav'); % samples and sampling rate (Fs is needed for the time axis)
% Hamming window
winLen = 301;
winOverlap = 300;
wHamm = hamming(winLen);
% Framing and windowing.
sigFramed = buffer(SS, winLen, winOverlap, 'nodelay');
sigWindowed = diag(sparse(wHamm)) * sigFramed;
% Short-Time Energy calculation
energyST = sum(sigWindowed.^2,1);
% Time in seconds, for the graphs
t = [0:length(SS)-1]/Fs;
subplot(1,1,1);
plot(t, SS);
title('speech: He took me by surprise');
xlims = get(gca,'Xlim');
hold on;
% Delayed Short-Time energy due to lowpass filtering
delay = (winLen - 1)/2;
plot(t(delay+1:end - delay), energyST, 'r');
xlim(xlims);
xlabel('Time (sec)');
legend({'Speech','Short-Time Energy'});
hold off;
This code loads the speech signal e_sound.wav and plots its waveform. The short-time energy is
computed by framing the signal with a 301-sample Hamming window that advances one sample at a
time and summing the squared samples of each frame. The energy curve is overlaid on the waveform,
shifted by half a window length so that it lines up with the samples it describes.
In this experiment, we have calculated the short-time energy of a recorded voice containing one
phoneme. It is shown in the figure above.
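Short-time energy is often computed together with the zero-crossing rate, using frames that advance by several milliseconds instead of one sample. The following is a minimal sketch on a synthetic three-part signal (silence, a low-frequency tone, and a weak high-frequency tone); the signal and the 20 ms / 10 ms framing are assumptions for illustration.

```matlab
% Frame-wise short-time energy and zero-crossing rate (sketch)
fs        = 8000;                         % sampling rate in Hz (assumption)
regionLen = round(0.2*fs);                % 0.2 s per region
t         = (0:regionLen-1)/fs;
x = [zeros(1, regionLen), ...             % silence
     sin(2*pi*150*t), ...                 % strong low-frequency tone (voiced-like)
     0.1*sin(2*pi*3000*t)];               % weak high-frequency tone (unvoiced-like)

frameLen = round(0.020*fs);               % 20 ms frames
hop      = round(0.010*fs);               % 10 ms frame shift
n        = 0:frameLen-1;
w        = 0.54 - 0.46*cos(2*pi*n/(frameLen-1));   % Hamming window

numFrames = floor((length(x) - frameLen)/hop) + 1;
energy    = zeros(1, numFrames);
zcr       = zeros(1, numFrames);
for k = 1:numFrames
    seg       = x((k-1)*hop + (1:frameLen));
    energy(k) = sum((seg .* w).^2);                      % short-time energy
    zcr(k)    = sum(abs(diff(seg >= 0)))/(frameLen - 1); % sign changes per sample
end

fprintf('Frame 10 (silence):  E = %.3f, ZCR = %.3f\n', energy(10), zcr(10));
fprintf('Frame 30 (low tone): E = %.3f, ZCR = %.3f\n', energy(30), zcr(30));
fprintf('Frame 50 (high tone): E = %.3f, ZCR = %.3f\n', energy(50), zcr(50));
```

High energy with a low zero-crossing rate is typical of voiced speech, and low energy with a high zero-crossing rate is typical of unvoiced speech, which is why the two measures are commonly used together.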
Post-Lab:
1. What is the purpose of short-term time domain processing in speech analysis? How does it
enable us to extract useful information from speech signals?
2. Can you explain the concept of windowing and why it is employed in short-term time
domain processing of speech? What are some common window functions used in this
process?
3. Discuss the significance of frame size and frame overlap in short-term time domain
processing. How do these parameters impact the analysis results?
4. Describe the process of computing short-term energy using windowing. How does the
energy information help in understanding the characteristics of a speech signal?
5. What is the autocorrelation function, and how is it used in short-term time domain
processing of speech? How does the autocorrelation function provide insights into the
periodicity or pitch of the speech signal?
Pre-Requisites:
Pre-Lab:
1. What is the fundamental frequency in speech signals, and what role does it play in speech
perception and analysis?
2. Can you explain the concept of pitch and how it relates to the fundamental frequency of a
speech signal?
3. What are some common methods or algorithms used for fundamental frequency estimation
in speech signals?
5. How does the concept of harmonics relate to the fundamental frequency? Can you explain
the relationship between harmonics and the periodic nature of speech signals?
In-Lab:
Procedure:
The following code shows how to estimate the fundamental frequency of a speech signal in
MATLAB:
Code snippet
clc
clear all
close all
% Program to find the autocorrelation of a speech segment
[y,Fs]=audioread('aa sound.wav'); % input: speech segment
max_value=max(abs(y));
y=y/max_value; % normalise to unit peak amplitude
t=(1/Fs:1/Fs:(length(y)/Fs))*1000; % time axis in ms
subplot(2,1,1);
plot(t,y);
title('A 30 millisecond segment of speech');
autocor=zeros(1,length(y)); % autocor(l+1) holds the value at lag l
for l=0:(length(y)-1)
    sum1=0;
    for u=1:(length(y)-l)
        s=y(u)*y(u+l);
        sum1=sum1+s;
    end
    autocor(l+1)=sum1;
end
kk=(1/Fs:1/Fs:(length(autocor)/Fs))*1000;
subplot(2,1,2);
plot(kk,autocor);
title('Autocorrelation of the 30 millisecond segment of speech');
auto=autocor(21:160); % lags 20 to 159 samples
max1=0;
for uu=1:140
    if(auto(uu)>max1)
        max1=auto(uu);
        sample_no=uu;
    end
end
pitch_period_To=(19+sample_no)*(1/Fs) % auto(uu) is the value at lag 19+uu
pitch_freq_Fo=1/pitch_period_To
This code loads the speech segment “aa sound.wav” and plots its waveform. The autocorrelation
of the segment is computed directly with nested loops, and the lag of its largest value between
lags of 20 and 159 samples is taken as the pitch period. The estimated pitch period and
fundamental frequency are displayed.
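The same method can be checked on a signal whose pitch is known. The following is a minimal sketch that uses a synthetic 125 Hz signal instead of a recording and derives the lag search range from the sampling rate; the signal and the 50 Hz to 400 Hz range are assumptions for illustration.

```matlab
% Autocorrelation pitch estimate on a signal with known pitch (sketch)
fs = 8000;                           % sampling rate in Hz (assumption)
f0 = 125;                            % true fundamental frequency in Hz
N  = 256;                            % 32 ms segment (four pitch periods)
t  = (0:N-1)/fs;
x  = sin(2*pi*f0*t) + 0.5*sin(2*pi*2*f0*t);   % fundamental plus one harmonic

% Autocorrelation: r(lag+1) holds the value at the given lag
r = zeros(1, N);
for lag = 0:N-1
    r(lag + 1) = sum(x(1:N-lag) .* x(1+lag:N));
end

% Search only lags that correspond to plausible pitch values
lagMin = round(fs/400);              % 400 Hz upper pitch limit
lagMax = round(fs/50);               % 50 Hz lower pitch limit
[~, idx] = max(r(lagMin+1:lagMax+1));
pitchLag = lagMin + idx - 1;         % convert index in the slice back to a lag
f0Est    = fs/pitchLag;

fprintf('Pitch period: %d samples, F0 estimate: %.1f Hz\n', pitchLag, f0Est);
```

Note the index bookkeeping: element idx of the slice starting at lagMin+1 corresponds to lag lagMin+idx-1, because MATLAB arrays start at 1 while lags start at 0.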
Post-Lab:
1. Can you explain the concept of fundamental frequency in speech signals and its significance
in speech perception and analysis?
2. What is the relationship between fundamental frequency and the perceived pitch of a
speech signal?
3. Discuss some common methods or algorithms used for fundamental frequency estimation in
speech signals. How do they work, and what are their limitations?
5. What are the challenges associated with accurate fundamental frequency estimation in
speech signals, such as variations in pitch, background noise, or voice quality?
Pre-Requisites:
Pre-Lab:
1. What is speech format synthesis, and why is it important in speech signal processing?
2. Can you explain the concept of speech formats, such as formants, pitch, and duration, and
how they contribute to the perception of speech?
3. Describe the main steps involved in speech format synthesis, such as text-to-speech
conversion or speech reconstruction from acoustic features.
4. What are some common techniques or methods used in speech format synthesis, such as
rule-based synthesis, concatenative synthesis, or statistical parametric synthesis?
5. How does the choice of speech synthesis method impact the quality and naturalness of the
synthesized speech?
In-Lab:
Procedure:
The following code shows how to implement a simple speech synthesis algorithm in MATLAB:
Code snippet*
% Program to do text to speech.
% Get user's sentence
userPrompt = 'What do you want the computer to say?';
titleBar = 'Text to Speech';
defaultString = 'Hello KL! Goodmorning!';
caUserInput = inputdlg(userPrompt, titleBar, 1, {defaultString});
if isempty(caUserInput)
return;
end; % Bail out if they clicked Cancel.
caUserInput = char(caUserInput); % Convert from cell to string.
NET.addAssembly('System.Speech');
obj = System.Speech.Synthesis.SpeechSynthesizer;
obj.Volume = 100;
Speak(obj, caUserInput);
*https://in.mathworks.com/matlabcentral/answers/159113-text-to-speech-synthesis-matlab-code
This code asks the user for a line of text and passes it to the System.Speech synthesizer through
MATLAB's .NET interface, which speaks the text aloud. Because it relies on the .NET
System.Speech assembly, it runs only on Windows.
obj =
State: Ready
Rate: 0
Volume: 100
We observed that a line of text can be converted into speech using a TTS system with the help of
MATLAB.
Post-Lab:
1. Can you explain the concept of speech format synthesis and its significance in speech signal
processing?
2. Describe the main components or parameters involved in speech format synthesis, such as
formants, pitch, and duration.
3. Discuss the difference between rule-based synthesis, concatenative synthesis, and statistical
parametric synthesis in speech format synthesis. How do these methods generate
synthesized speech?
4. How does the choice of speech synthesis method affect the quality, naturalness, and
intelligibility of the synthesized speech?
5. Explain the concept of prosody in speech synthesis and its role in conveying intonation,
rhythm, and emphasis. How is prosody modeled and incorporated into the synthesized
speech?
Pre-Requisites:
Pre-Lab:
1. What is linear prediction analysis, and why is it widely used in speech signal processing?
2. Can you explain the underlying principle of linear prediction analysis and how it models
speech signals?
3. What are the main steps involved in performing linear prediction analysis on a speech
signal?
4. How does the prediction error or residual signal provide information about the
characteristics of the speech signal?
5. What are the parameters of linear prediction analysis, such as the order of the prediction
filter, and how do they impact the accuracy and quality of the analysis?
In-Lab:
Procedure:
The following code shows how to estimate the LPA coefficients of a speech signal in MATLAB:
Code snippet
clear all;
close all;
% Voiced sound
phons = audioread('e_sound.wav');
x = phons(36095:36700);
len_x = length(x);
% The signal is windowed
w = hamming(len_x);
wx = w.*x;
% Lpc autocorrelation method
order = 30;
% LPC function of MATLAB is used
[lpcoefs, errorPow] = lpc(wx, order);
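% The estimated signal is calculated as the output of linearly filtering
% the speech signal with the coefficients estimated above
estx = filter([0 -lpcoefs(2:end)], 1, [wx; zeros(order,1)]);
% Quantities plotted below
er = [wx; zeros(order,1)] - estx;             % prediction error (residual)
[H, W] = freqz(sqrt(errorPow), lpcoefs, 513); % frequency response of the LP model
S = abs(fft(wx, 1024));                       % magnitude spectrum of the speech segment
eS = abs(fft(er, 1024));                      % magnitude spectrum of the prediction error
[acs, lags] = xcorr(er);                      % autocorrelation of the prediction error
% Display results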
subplot(5,1,1);
plot([wx; zeros(order,1)],'g');
title('Phoneme /aa/ - Linear Predictive Analysis, Autocorrelation Method');
hold on;
plot(estx);
hold off;
xlim([0 length(er)])
legend('Speech Signal','Estimated Signal');
subplot(5,1,2);
plot(er);
xlim([0 length(er)])
legend('Error Signal');
subplot(5,1,3);
plot(linspace(0,0.5,513), 20*log10(abs(H)));
hold on;
plot(linspace(0,0.5,513), 20*log10(S(1:513)), 'g');
legend('Model Frequency Response','Speech Spectrum')
hold off;
subplot(5,1,4);
plot(lags, acs);
legend('Prediction Error Autocorrelation')
subplot(5,1,5);
plot(linspace(0,0.5,513), 20*log10(eS(1:513)));
legend('Prediction Error Spectrum')
This code loads the speech signal e_sound.wav and estimates the LPA coefficients of a
Hamming-windowed voiced segment using a predictor of order 30. It plots the windowed segment with
its estimate, the prediction error, the model frequency response against the speech spectrum, and
the autocorrelation and spectrum of the prediction error.
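The autocorrelation method behind lpc() can be sketched with the Levinson-Durbin recursion. The example below is a sketch under stated assumptions: the input is the impulse response of a known second-order all-pole filter (aTrue is a made-up test filter, not speech), so the recovered coefficients can be compared with the true ones.

```matlab
% Autocorrelation method of linear prediction via Levinson-Durbin (sketch)
aTrue = [1, -1.2, 0.8];                 % known all-pole filter (assumed test case)
N     = 400;
x     = filter(1, aTrue, [1, zeros(1, N-1)]);   % its impulse response

p = 2;                                  % prediction order
r = zeros(1, p + 1);                    % r(lag+1) = autocorrelation at lag
for lag = 0:p
    r(lag + 1) = sum(x(1:N-lag) .* x(1+lag:N));
end

a = 1;                                  % A(z) coefficients, starting at order 0
E = r(1);                               % prediction error energy at order 0
for m = 1:p
    k = -(a * r(m+1:-1:2).') / E;       % reflection coefficient for order m
    a = [a, 0] + k*[0, fliplr(a)];      % update the predictor polynomial
    E = E*(1 - k^2);                    % update the error energy
end

disp(a);                                % close to aTrue
disp(E);                                % close to 1, the excitation energy
```

The vector a follows the same convention as the first output of lpc(): it starts with 1 and contains the coefficients of the prediction error filter A(z). The recursion solves the normal equations in the order of p squared operations instead of inverting the autocorrelation matrix directly.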
Post-Lab:
1. Can you explain the concept of linear prediction analysis and its significance in speech signal
processing?
2. What is the underlying principle of linear prediction analysis? How does it model speech
signals?
3. Describe the main steps involved in performing linear prediction analysis on a speech signal.
4. How does the prediction error or residual signal obtained in linear prediction analysis
provide information about the characteristics of the speech signal?
5. What are the parameters involved in linear prediction analysis, such as the order of the
prediction filter? How do these parameters affect the accuracy and quality of the analysis?
Pre-Requisites:
Pre-Lab:
1. What is cepstral analysis, and how does it differ from traditional spectral analysis of speech
signals?
2. Can you explain the concept of quefrency and how it is related to the time domain
representation of speech signals in cepstral analysis?
3. What are the main steps involved in performing cepstral analysis on a speech signal?
4. How is the cepstrum calculated from the speech signal, and what information does it
provide about the underlying source and filter characteristics of the speech?
5. Describe the process of liftering in cepstral analysis and its impact on the resulting cepstral
coefficients.
In-Lab:
Procedure:
The following code shows how to calculate the Cepstrum of a speech signal in MATLAB:
Code snippet
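clear all;
close all;
[x,fs]=audioread('speech.wav'); % load the speech signal
ms1=round(fs/1000); % lag of 1 ms (1000 Hz)
ms20=round(fs/50); % lag of 20 ms (50 Hz)
Y=fft(x.*hamming(length(x)));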
% plot spectrum of bottom 5000Hz
hz5000=5000*length(Y)/fs;
f=(0:hz5000)*fs/length(Y);
subplot(3,1,2);
plot(f,20*log10(abs(Y(1:length(f)))+eps));
legend('Spectrum');
xlabel('Frequency (Hz)');
ylabel('Magnitude (dB)');
% cepstrum is DFT of log spectrum
C=fft(log(abs(Y)+eps));
% plot between 1ms (=1000Hz) and 20ms (=50Hz)
q=(ms1:ms20)/fs;
subplot(3,1,3);
plot(q,abs(C(ms1:ms20)));
legend('Cepstrum');
xlabel('Quefrency (s)');
ylabel('Amplitude');
This code loads the speech signal speech.wav and calculates its cepstrum as the DFT of the log
magnitude spectrum. The spectrum below 5000 Hz and the cepstrum for quefrencies between 1 ms and
20 ms are plotted.
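A peak in the cepstrum at the pitch period is what makes it useful for voiced speech. The following is a minimal sketch on a synthetic voiced sound: an impulse train with a period of 80 samples passed through one assumed resonance. It uses the inverse FFT of the log magnitude spectrum (the usual real cepstrum), whereas the listing above takes a forward FFT and plots its magnitude; for a real signal both give the same magnitudes up to a scale factor.

```matlab
% Pitch period from the real cepstrum of a synthetic voiced sound (sketch)
fs     = 8000;                          % sampling rate in Hz (assumption)
N      = 1024;                          % analysis length in samples
period = 80;                            % true pitch period (100 Hz)

excitation = zeros(1, N);
excitation(1:period:end) = 1;           % impulse train as glottal excitation
a = [1, -2*0.9*cos(2*pi*700/fs), 0.81]; % one resonance near 700 Hz (assumed)
x = filter(1, a, excitation);           % "vocal tract" filtering

n = 0:N-1;
w = 0.54 - 0.46*cos(2*pi*n/(N-1));      % Hamming window
logMag = log(abs(fft(x .* w)) + eps);   % log magnitude spectrum
c = real(ifft(logMag));                 % real cepstrum; c(q+1) is quefrency q

lagMin = round(0.0025*fs);              % search from 2.5 ms
lagMax = round(0.020*fs);               % up to 20 ms
[~, idx] = max(c(lagMin+1:lagMax+1));
pitchLag = lagMin + idx - 1;            % quefrency of the peak in samples
f0Est    = fs/pitchLag;

fprintf('Cepstral peak at %d samples, F0 estimate: %.1f Hz\n', pitchLag, f0Est);
```

The slowly varying filter response ends up at low quefrencies, while the regularly spaced harmonics of the excitation produce a peak at the pitch period. Starting the search at 2.5 ms keeps the filter part out of the pitch search.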
Post-Lab:
1. Can you explain the concept of cepstral analysis and how it differs from traditional spectral
analysis in speech signal processing?
2. Describe the main steps involved in performing cepstral analysis on a speech signal.
3. What is the quefrency domain in cepstral analysis, and how does it relate to the time
domain representation of speech signals?
4. How is the cepstrum calculated from the speech signal, and what information does it
provide about the underlying source and filter characteristics of the speech?
5. Explain the concept of liftering in cepstral analysis and its purpose in enhancing or modifying
the cepstral coefficients.
Pre-Requisites:
Pre-Lab:
1. What are Mel Frequency Cepstral Coefficients (MFCC), and why are they widely used in
speech signal processing?
2. Can you explain the process of extracting MFCC from a speech signal? What are the main
steps involved?
3. What is the Mel scale, and how does it relate to the human perception of pitch and
frequency?
4. Describe the process of converting the linear frequency scale to the Mel scale in MFCC
extraction.
5. How are filterbanks used in MFCC extraction? What is their purpose, and how are they
designed?
In-Lab:
Procedure:
The following code shows how to extract the MFCCs of a speech signal in MATLAB:
Code snippet
clc
clear all;
close all;
[s,fs]=audioread('aa sound.wav');
t=(1/fs:1/fs:(length(s)/fs))*1000; % time axis in ms
s1= s;
figure(1)
max_value=max(abs(s1));
s1=s1/max_value; % normalise for plotting
plot(t,s1);
Ws=1024; % frame length in samples
Ol=512; % frame shift in samples (50% overlap)
L=floor((length(s)-Ol)/Ol); % number of frames
N=12; % cepstral coefficients per frame
ccs=zeros(N,L);
for n=1:L
    seg=s(1+(n-1)*Ol:Ws+(n-1)*Ol);
    % mfcc_model is a user-defined helper that must be on the MATLAB path
    ccs(:,n)=mfcc_model(seg.*hamming(Ws),40,N,fs);
end
figure(2)
waterfall([1:L]*length(s)/(L*fs),[1:N],ccs)
xlabel('Time, s')
ylabel('Band')
zlabel('Amplitude')
Required Functions :
function band=spread_mel(hz_points,hz_c,hz_size,hz_max)
%hz_points is an array of band centre frequencies in Hz
%hz_c is the index of the current band in hz_points
band=zeros(1, hz_size);
hz1=hz_points(max(1,hz_c-1)); %start
hz2=hz_points(hz_c); %middle
hz3=hz_points(min(length(hz_points),hz_c+1)); %end
for hi=1:hz_size
hz=hi*hz_max/hz_size;
if hz > hz3
band(hi)=0;
elseif hz>=hz2
band(hi)=(hz3-hz)/(hz3-hz2);
elseif hz>=hz1
band(hi)=(hz-hz1)/(hz2-hz1);
else
band(hi)=0;
end
end
This code loads the speech signal “aa sound.wav”, splits it into 1024-sample frames with 50%
overlap, and calculates 12 cepstral coefficients per frame with mfcc_model(). The MFCCs are
displayed as a waterfall plot over time.
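The mel-spaced triangular filters at the heart of MFCC extraction can be sketched as follows. This is an illustration under stated assumptions, not the mfcc_model() or spread_mel() implementation: it uses the common mapping mel = 2595*log10(1 + f/700), and the values of fs, nfft, and numFilters are assumed.

```matlab
% Mel scale conversion and a triangular mel filterbank (sketch)
hz2mel = @(f) 2595*log10(1 + f/700);        % Hz to mel (common formula)
mel2hz = @(m) 700*(10.^(m/2595) - 1);       % mel to Hz (inverse mapping)

fs         = 16000;                         % sampling rate in Hz (assumption)
nfft       = 512;                           % FFT length (assumption)
numFilters = 26;                            % number of mel filters (assumption)

% Filter edges are equally spaced in mel, hence increasingly wide in Hz
melEdges = linspace(hz2mel(0), hz2mel(fs/2), numFilters + 2);
hzEdges  = mel2hz(melEdges);
centres  = hzEdges(2:end-1);                % centre frequency of each filter

fBins = (0:nfft/2)*fs/nfft;                 % frequency of each FFT bin
H = zeros(numFilters, numel(fBins));        % one filter per row
for m = 1:numFilters
    lo  = hzEdges(m);                       % lower edge
    mid = hzEdges(m + 1);                   % centre (peak of the triangle)
    hi  = hzEdges(m + 2);                   % upper edge
    rising  = (fBins - lo)/(mid - lo);
    falling = (hi - fBins)/(hi - mid);
    H(m, :) = max(0, min(rising, falling)); % triangular weight for each bin
end

fprintf('First centre: %.1f Hz, last centre: %.1f Hz\n', centres(1), centres(end));
```

Multiplying H by the power spectrum of a frame gives the mel band energies; the MFCCs are then obtained by taking the logarithm of these energies and applying the DCT.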
Post-Lab:
1. Can you explain the concept of Mel Frequency Cepstral Coefficients (MFCC) and their
significance in speech signal processing?
2. Describe the main steps involved in extracting MFCC from a speech signal.
3. How is the Mel scale used to convert the linear frequency scale to the Mel scale in MFCC
extraction? What is the motivation behind using the Mel scale?
4. Explain the concept of filterbanks in MFCC extraction. How are the filterbanks designed, and
what role do they play in capturing the spectral information of the speech signal?
5. Discuss the role of the Discrete Cosine Transform (DCT) in MFCC extraction. How does the
DCT compress the cepstral coefficients obtained
Pre-Requisites:
Pre-Lab:
2. Can you explain the difference between additive noise and non-additive noise in the context
of speech signals?
3. What are the main challenges in speech enhancement, such as noise reduction while
preserving speech intelligibility and quality?
5. How does the choice of a noise model influence the speech enhancement algorithm? What
factors should be considered when selecting an appropriate noise model?
In-Lab:
Procedure:
The following code shows how to implement a simple speech enhancement algorithm in MATLAB:
Code snippet
clear
close all
clc
[clean, fs] = audioread('jarvus.wav');
[noise] = audioread('jarvus_pub.wav');
output = noiseReduction_YW(noise, fs);
subplot(3,2,1)
plotWave_YW(0,clean,fs,'time',1);
title('Clean speech')
subplot(3,2,2)
plotWave_YW(0,clean,fs,'freq');
subplot(3,2,3)
plotWave_YW(0,noise,fs,'time',1);
title('Noisy speech')
subplot(3,2,4)
plotWave_YW(0,noise,fs,'freq');
subplot(3,2,5)
plotWave_YW(0,output,fs,'time',1);
title('Enhanced speech')
subplot(3,2,6)
plotWave_YW(0,output,fs,'freq');
Data and Results:
* https://medium.com/audio-processing-by-matlab/noise-reduction-by-wiener-filter-by-matlab-44438af83f96
We have performed the experiment for de-noising the speech signal, as shown in the figure.
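The idea behind Wiener-type noise reduction can be sketched without the external noiseReduction_YW function. The example below is a simplified illustration, not that implementation: a 440 Hz tone stands in for speech, the noise power spectrum is estimated from assumed noise-only frames at the start, and each frequency bin is scaled by a gain between 0 and 1. Non-overlapping rectangular frames are used only to keep the sketch short.

```matlab
% Frame-wise Wiener-type gain with noise estimated from leading frames (sketch)
fs              = 8000;                     % sampling rate in Hz (assumption)
frameLen        = 256;                      % samples per frame
numNoiseFrames  = 8;                        % leading noise-only frames (assumption)
numSpeechFrames = 24;
N = (numNoiseFrames + numSpeechFrames)*frameLen;

n     = 0:N-1;
clean = sin(2*pi*440*n/fs);                 % tone standing in for speech
clean(1:numNoiseFrames*frameLen) = 0;       % no signal in the leading frames
noisy = clean + 0.5*randn(1, N);            % additive white noise

frames = reshape(noisy, frameLen, []);      % one frame per column
Y      = fft(frames);                       % spectrum of every frame
noisePsd = mean(abs(Y(:, 1:numNoiseFrames)).^2, 2);   % noise power per bin
noisePsd = repmat(noisePsd, 1, size(Y, 2)); % same estimate for every frame

% Gain close to 1 where the signal dominates, 0 where only noise is present
gain     = max(1 - noisePsd ./ (abs(Y).^2 + eps), 0);
enhanced = real(ifft(gain .* Y));
enhanced = enhanced(:).';                   % back to one row vector

idx    = numNoiseFrames*frameLen + 1 : N;   % evaluate on the signal region only
snrIn  = 10*log10(sum(clean(idx).^2) / sum((noisy(idx) - clean(idx)).^2));
snrOut = 10*log10(sum(clean(idx).^2) / sum((enhanced(idx) - clean(idx)).^2));
fprintf('SNR before: %.1f dB, after: %.1f dB\n', snrIn, snrOut);
```

Practical systems use overlapping windowed frames and smooth the gain over time to reduce the "musical noise" that this simple per-frame gain leaves behind.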
Post-Lab:
1. Can you explain the concept of speech enhancement and its importance in speech signal
processing?
2. What are some common types of noise that affect speech signals, and how do they degrade
the quality and intelligibility of speech?
4. Discuss the challenges associated with speech enhancement, such as preserving speech
intelligibility while reducing noise artifacts or distortion.
5. How do the characteristics of the noise impact the choice of speech enhancement
algorithms? What factors should be considered when selecting an appropriate algorithm for
a given noise scenario?
Pre-Requisites:
Pre-Lab:
2. Can you explain the types of noise commonly encountered in speech signals, such as
additive noise, background noise, or reverberation?
3. What are the main challenges in speech enhancement, such as reducing noise while
preserving speech intelligibility and quality?
5. How does the choice of a noise model influence the speech enhancement algorithm? What
factors should be considered when selecting an appropriate noise model?
In-Lab:
Procedure:
The following code shows how to estimate the linear prediction spectral envelope of a speech
segment, a feature that can be used for speaker recognition, in MATLAB:
Code snippet
clear all;
close all;
[x,fs]=audioread('male2.wav',[24120 25930]);
% resample to 10,000Hz (optional)
x=resample(x,10000,fs);
fs=10000;
%
% plot waveform
t=(0:length(x)-1)/fs; % times of sampling instants
subplot(2,1,1);
plot(t,x);
legend('Waveform');
xlabel('Time (s)');
ylabel('Amplitude');
%
% get Linear prediction filter
ncoeff=2+fs/1000; % rule of thumb for formant estimation
a=lpc(x,ncoeff);
%
% plot frequency response
[h,f]=freqz(1,a,512,fs);
subplot(2,1,2);
plot(f,20*log10(abs(h)+eps));
legend('LP Filter');
xlabel('Frequency (Hz)');
ylabel('Gain (dB)');
Post-Lab:
1. Can you explain the concept of speech enhancement and its significance in speech signal
processing?
2. What are some common types of noise that affect speech signals, and how do they impact
the quality and intelligibility of speech?
4. Discuss the challenges involved in speech enhancement, such as the trade-off between noise
reduction and preserving speech quality.
5. Explain the concept of signal-to-noise ratio (SNR) and its role in evaluating the effectiveness
of speech enhancement algorithms. Are there any other objective or subjective metrics used
for assessing speech enhancement quality?