Audio Signal Processing
BY:-
1. VIPLAW KUMAR (BE/10077/17)
2. AHMAD FAROOQUE (BE/10086/17)
3. FARAZ AHMAD KHAN (BE/10110/17)
Submitted to:-
DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING
Page 1 of 31
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
INTRODUCTION
In most devices (such as smartphones and computers), the speech signal captured by the microphones of a speech communication device is often distorted by interfering noise sources as well as reverberation from the surrounding room. Such degradation may reduce listening comfort, speech intelligibility, and the quality of further processing. This degradation can be reduced by filtering methods such as IIR and FIR filters. The target of a speech enhancement algorithm should be the reduction of unwanted background noise and room reverberation. Unwanted noise consumes energy and deteriorates the audibility of the signal. There are a variety of methods, both analog and digital, to reduce noise; each method has its own advantages, and its efficiency often depends on the type of noise. Noise suppression has applications in virtually all fields of communication (channel equalization, radar signal processing, etc.). In this paper we describe our experiments on two filtering techniques: the IIR filter and the FIR filter. The output signal from each filter is compared with the desired signal, and the best filter for noise reduction is identified. Feature extraction is then performed on the filtered signal and, using cross-correlation, similarities are found between the filtered signal and the database signals.
The basic principle of a Speech Recognition System (SRS) is to recognize spoken words irrespective of the speaker and to convert them to text. The growth of wireless communication and mobile devices has given us access to a large pool of information resources and services, and speech recognition is a key component of that access. Present mobile devices have limited memory and processing capacity, which adds several challenges to an SRS; as a result, recognition systems executing on mobile devices support only low-complexity recognition tasks such as simple name dialing. In this paper we present a simple, low-complexity algorithm for recognizing spoken words that uses signal features of the speech sample. The proposed algorithm uses the principle of correlation to recognize the spoken word. In communication and signal processing, cross-correlation is a measure also known as a sliding dot product or sliding inner product. It is commonly used for searching a long signal for a shorter, known feature, and it also has applications in pattern recognition, single-particle analysis, electron tomographic averaging, cryptanalysis, and neurophysiology. For continuous functions f and g, the cross-correlation is defined as
(f ⋆ g)(t) = ∫_{-∞}^{∞} f*(τ) g(t + τ) dτ
where f*(τ) denotes the complex conjugate of f(τ). Keeping this principle in mind, we implemented our pattern-recognition algorithm to recognize spoken words efficiently but in a simple manner. The basic block model of the proposed algorithm is given below.
Input Speech Signal → Denoising Using Filter → Feature Extraction → Output as Text
OVERVIEW OF FILTERS
IIR and FIR filters are both used for noise reduction, and results for both have been reported in the research literature. The filters are as follows:
FIR Filter
A finite impulse response (FIR) filter is one whose impulse response h(t) becomes exactly zero at times t > T for some finite T, and is thus of finite duration. FIR filters can have linear phase characteristics. They can be discrete-time or continuous-time, and digital or analog. An FIR filter requires no feedback and has several advantages over an IIR filter.
IIR Filter
Infinite impulse response (IIR) is a property of many linear time-invariant systems; most electronic and digital filters are common examples. An IIR filter has an impulse response that does not become exactly zero past a certain point but continues indefinitely. IIR filters are digital filters with an infinite impulse response; for a given order they can achieve a much better magnitude response than an FIR filter, but their phase characteristic is not linear, which causes a problem for systems that need phase linearity.
METHODOLOGY
1. Speech Recognition:
The quality of a speech recognition system is evaluated by two elements: its accuracy (the error rate in converting spoken words to digital data) and its speed (how well the software can keep up with a human speaker). Speech recognition technology has nearly unlimited applications. Typically, such software is used for automatic transcription, correspondence, hands-free computing, medical transcription, robotics, automated customer service, and a great deal more. If you have ever paid a bill over the telephone using an automated system, you have likely benefited from speech recognition software. The figure shows a voice signal representing an utterance of the word "seven". There are different modes available for a Speech Recognition System:
words or sentences that will be analyzed, and the results will be stored.
v) Vocabulary Size:
The larger the vocabulary, the more errors the system can make, so vocabulary size matters. The speech recognition process can be divided into several components, shown in the diagram below: speech input, feature extraction, probability estimation, decoding against a language model, and finally the recognized sentences.
xcorr returns the cross-correlation of two discrete-time sequences, x and y. Cross-correlation measures the similarity between x and shifted (lagged) copies of y as a function of the lag. If x and y have different lengths, the function appends zeros to the end of the shorter vector so that it has the same length, N, as the other.
2. Correlation Technique:
The mel-frequency cepstral coefficient (MFCC) representation describes the short-term power spectrum of a sound, based on a linear transform of a log power spectrum; the coefficients are derived from a type of cepstral representation of the audio clip (read here from .wav files in MATLAB). MFCCs are commonly used as features in speech recognition systems that automatically recognize spoken words from an audio file, and they are also found in audio information retrieval applications such as genre classification and audio similarity measures. Their values are not very robust in the presence of additive noise, so it is common to normalize them in speech recognition systems to reduce the influence of noise.
2: z1 = xcorr(x, y1)
m1 = max(z1)
l1 = length(z1)
t1 = -((l1-1)/2):1:((l1-1)/2)
3: plot(t1, z1)
5: Consider a=[m1 m2 m3 m4 m5 m6] where m6=300
6: Compute m=max(a)
7: If m<=m1
read 1st file
elseif m<=m2
read 2nd file
elseif m<=m3
read 3rd file
elseif m<=m4
read 4th file
elseif m<=m5
read 5th file
else
read denied file
8: End
function speechrecognition(filename)
%Speech Recognition Using Correlation Method
%Write Following Command On Command Window
%speechrecognition('test.wav')
voice=wavread(filename);   % read the test utterance
x=voice;
x=x';
x=x(1,:);                  % keep only the first channel
x=x';
y1=wavread('one.wav');     % reference recording for the word "one"
y1=y1';
y1=y1(1,:);
y1=y1';
z1=xcorr(x,y1);
m1=max(z1);
l1=length(z1);
t1=-((l1-1)/2):1:((l1-1)/2);
t1=t1';
%subplot(3,2,1);
plot(t1,z1);
y2=wavread('two.wav');
y2=y2';
y2=y2(1,:);
y2=y2';
z2=xcorr(x,y2);
m2=max(z2);
l2=length(z2);
t2=-((l2-1)/2):1:((l2-1)/2);
t2=t2';
%subplot(3,2,2);
figure
plot(t2,z2);
y3=wavread('three.wav');
y3=y3';
y3=y3(1,:);
y3=y3';
z3=xcorr(x,y3);
m3=max(z3);
l3=length(z3);
t3=-((l3-1)/2):1:((l3-1)/2);
t3=t3';
%subplot(3,2,3);
figure
plot(t3,z3);
y4=wavread('four.wav');
y4=y4';
y4=y4(1,:);
y4=y4';
z4=xcorr(x,y4);
m4=max(z4);
l4=length(z4);
t4=-((l4-1)/2):1:((l4-1)/2);
t4=t4';
%subplot(3,2,4);
figure
plot(t4,z4);
y5=wavread('five.wav');
y5=y5';
y5=y5(1,:);
y5=y5';
z5=xcorr(x,y5);
m5=max(z5);
l5=length(z5);
t5=-((l5-1)/2):1:((l5-1)/2);
t5=t5';
%subplot(3,2,5);
figure
plot(t5,z5);
m6=300;
a=[m1 m2 m3 m4 m5 m6];
m=max(a);
h=wavread('allow.wav');
if m<=m1
soundsc(wavread('one.wav'),50000)
soundsc(h,50000)
elseif m<=m2
soundsc(wavread('two.wav'),50000)
soundsc(h,50000)
elseif m<=m3
soundsc(wavread('three.wav'),50000)
soundsc(h,50000)
elseif m<=m4
soundsc(wavread('four.wav'),50000)
soundsc(h,50000)
elseif m<=m5
soundsc(wavread('five.wav'),50000)
soundsc(h,50000)
else
soundsc(wavread('denied.wav'),50000)
end
PLOT
MATLAB CODE FOR CHEBYSHEV FILTER
clc;
clear all;
close all;
myvoice=audiorecorder;
disp('Start speaking');
recordblocking(myvoice,10);
disp('Stop speaking');
x=getaudiodata(myvoice);
plot(x);
fs=8000;
t=(0:length(x)-1)/fs;
subplot(2,1,1)
plot(t,x);
F = fs/2;
wp= 400/F;
ws = 2000/F;
[b,a] = cheby2(10,20,[wp,ws],'bandpass');
%[b,a] = butter(6,[wp,ws],'bandpass');
filteredSignal = filter(b, a, x);
player = audioplayer(filteredSignal, fs);
play(player);
t=(0:length(filteredSignal)-1)/fs;
subplot(2,1,2)
plot(t,filteredSignal) ;
figure(2)
freqz(b,a)
PLOTS
MATLAB CODE FOR FIR FILTER
myvoice=audiorecorder;
disp('Start speaking');
recordblocking(myvoice,5);
disp('Stop speaking');
x=getaudiodata(myvoice);
y=fft(x);
t=1:1280;
f=input('Enter sampling freq in Hz:');
fp1=input('Enter passband freq1 in Hz:');
fs1=input('Enter stopband freq1 in Hz:');
fp2=input('Enter passband freq2 in Hz:');
fs2=input('Enter stopband freq2 in Hz:');
wp1=2*(fp1/f);
ws1=2*(fs1/f);
wp2=2*(fp2/f);
ws2=2*(fs2/f);
%del_p=(10^(Ap/20)-1)/(10^(Ap/20)+1);
%del_s=10^(-As/20);
%delta=abs((ws-wp)/2*pi);
del_w=min((wp1-ws1),(ws2-wp2));
wc1=wp1-(del_w/2);
wc2=wp2+(del_w/2);
c=1.8*pi;
%kaiser technique for approx. order
%n=(-20*log((del_p*del_s)^0.5)-13)/(14.6*delta);
n1=1.8*pi/del_w;
n=ceil(n1);
window=boxcar(n+1); %rectangular window
%{
n1=6.1*pi/del_w;
n=ceil(n1);
window=triang(n+1); %triangular window
%}
%{
n1=6.2*pi/del_w;
n=ceil(n1);
window=hanning(n+1); % Hanning window
%}
%{
n1=6.6*pi/del_w;
n=ceil(n1);
window=hamming(n+1); %Hamming window
%}
%{
n1=11*pi/del_w;
n=ceil(n1);
window=blackman(n+1); %blackman window
%}
b=fir1(n,[wc1,wc2],window);
[h,w]=freqz(b,1);
z = filter(b,1,x);
g=fft(z);
sound(x);
pause(6);
sound(z);
subplot(3,2,1),plot(x);
title('Noisy Signal');
xlabel('Time (s)');
ylabel('Amplitude');
subplot(3,2,2),plot(z);
title('Filtered Signal');
xlabel ('t, sec'), ylabel ('Sample');
subplot(3,2,3),plot(w/pi,20*log(abs(h)));
title('Magnitude response');
xlabel('nf');
ylabel('magnitude');
subplot(3,2,4),plot(w/pi,angle(h));
title('Phase response');
xlabel('nf');
ylabel('Angle');
subplot(3,2,5),plot(t,y(1:1280));
title('FFT of noisy signal');
xlabel ('t, sec'), ylabel ('Sample');
subplot(3,2,6),plot(t,g(1:1280));
title('FFT of filtered signal');
PLOT
For triangular window
For Hamming window
COMBINED COMPARISON OF BUTTERWORTH, CHEBYSHEV, AND ELLIPTIC FILTERS
clc;
clear all;
close all;
fsamp=input('Enter sampling freq in Hz:');
fp=input('Enter passband freq in Hz:');
fs=input('Enter stopband freq in Hz:');
Ap=1;
As=20;
wp=2*(fp/fsamp);
ws=2*(fs/fsamp);
n1=0.5*(log((10^(0.1*Ap)-1)/(10^(0.1*As)-1))/log(wp/ws));
n=ceil(n1);
myvoice=audiorecorder;
disp('start speaking');
recordblocking(myvoice,10);
disp('stop speaking');
x=getaudiodata(myvoice);
plot(x)
%x=audioread('E:\car6.wav');
[n,wn]=buttord(wp,ws,Ap,As);
%[b,a]=butter(n,wn,'low');
[b a] = butter(2,[0.6 0.7],'bandpass');
[h,w]=freqz(b,a);
z=filter(b,a,x);
sound(z);
subplot(2,2,1),plot(x);
title('Noisy Signal');
xlabel('Time (s)');
ylabel('Amplitude');
subplot(2,2,2),plot(z);
title('Filtered Signal');
xlabel ('t, sec'), ylabel ('Sample');
subplot(2,2,3),plot(w/pi,20*log(abs(h)));
title('Magnitude response');
xlabel('nf');
ylabel('magnitude');
subplot(2,2,4),plot(w/pi,angle(h));
title('Phase response');
xlabel('nf');
ylabel('Angle');
%
[b_butter, a_butter] = butter(4, 0.9, 'low');
H_butter = freqz(b_butter, a_butter);
% Chebyshev Type I and elliptic designs for comparison; the original
% listing omits them, so the order and ripple values are assumed here.
[b_cheby, a_cheby] = cheby1(4, 1, 0.9, 'low');
H_cheby = freqz(b_cheby, a_cheby);
[b_ellip, a_ellip] = ellip(4, 1, 40, 0.9, 'low');
H_ellip = freqz(b_ellip, a_ellip);
%
figure(3)
norm_freq_axis = [0:1/(512 -1):1];
plot(norm_freq_axis, abs(H_butter))
hold on
plot(norm_freq_axis, abs(H_cheby),'r')
plot(norm_freq_axis, abs(H_ellip),'g')
legend('Butterworth', 'Chebyshev', 'Elliptical')
xlabel('Normalised Frequency');
ylabel('Magnitude')
%
figure(4);
plot(norm_freq_axis, 20*log10(abs(H_butter)))
hold on
plot(norm_freq_axis, 20*log10(abs(H_cheby)),'r')
plot(norm_freq_axis, 20*log10(abs(H_ellip)),'g')
legend('Butterworth filter ', 'Chebyshev filter', 'Elliptical filter');
xlabel('Normalised Frequency (along x)');
ylabel('Magnitude (in dB)')
PLOTS
RESULTS
Using MATLAB, graphs comparing the test and sample audio files are derived. Two test files and five sample files, containing the spoken words "one" to "five", are considered. One test file matches one of the five sample files, and the other is the denied file, which does not match any sample file. When a test file is given as input, the loop starts: the spoken word in the test file is correlated with each sample file in turn, and MATLAB displays the graph of the resulting cross-correlation. Consider the test.wav file, which matches the second sample. When speechrecognition('test.wav') is entered in MATLAB, the comparison starts. The graphs are shown below:
Now consider the denied.wav file, which does not match any of the given samples. When speechrecognition('denied.wav') is entered at the MATLAB command prompt, the comparison starts and the system reports "denied", meaning the file did not match any of the sample files. The graphs are shown below:
In the successful case, the second sample is the match, so the cross-correlation peak for that sample occurs at zero lag, as seen in the graph, indicating that the audio files match.
CONCLUSION
This paper has described the various features, behaviors, and characteristics of speech signals and has dealt with the concept of cross-correlation. An algorithm was created with the help of MATLAB programming that takes speech input signals in .wav format and compares them with a test sound file using the correlation technique. The paper concludes that, in order to remove the remaining limitation to specific audio formats, a study of the various formats of speech signals is required; these can then be used for communication with machines that involve real hardware rather than only the simulator.
REFERENCES