AUDIO SIGNAL PROCESSING

FINAL REPORT SUBMITTED FOR


DIGITAL SIGNAL PROCESSING

BY:-
1. VIPLAW KUMAR (BE/10077/17)
2. AHMAD FAROOQUE (BE/10086/17)
3. FARAZ AHMAD KHAN (BE/10110/17)

Submitted to:-

Mr. G.S. Gupta


Assistant Professor

DEPARTMENT OF ELECTRICAL AND ELECTRONICS
ENGINEERING

BIRLA INSTITUTE OF TECHNOLOGY


MESRA, RANCHI – 835215 (INDIA)

CERTIFICATE

This is to certify that the contents of the report entitled “AUDIO
SIGNAL PROCESSING” are a bonafide work carried out by these
students. The contents of the report have not been submitted earlier for
the award of any other degree or certificate, and we hereby commend
the work done by them in this connection.

Mr. G.S. Gupta


Assistant Professor,
Dept. of Electrical & Electronics Engineering,
Birla Institute of Technology, Mesra, Ranchi.

ACKNOWLEDGEMENT

We are very grateful to Mr. G.S. Gupta, Assistant Professor, Dept.
of Electrical & Electronics Engineering, Birla Institute of
Technology, for providing us with a platform to learn and apply
concepts to build something new through this project. We are also
very thankful that this project provided us with an understanding of
sound processing in the real world.

ABSTRACT

Delivering messages by voice is the most important, effective and
common method of exchanging information for mankind. Language
is a human-specific feature, and the human voice is a commonly used
tool and an important way to pass information to each other. The
voice has a large information capacity, so we can use modern methods
to study voice-processing technology, so that people can easily
transmit, store, access and apply the voice. In this report, we
designed a collection system that can record a voice, use different
filters to remove the noise, and then convert the filtered voice to text.
After the noise is filtered out, the voice is of higher quality in mobile
communication, radio, TV and so on. In this report we use a computer
microphone to record a voice, and then analyze the time domain,
the frequency spectrum and the characteristics of the voice signal.
We used MATLAB functions to remove the noise which had been
added to the voice. After that we compare the time-domain and
frequency-domain representations of the original voice and the noisy
voice, play back the noisy voice and the de-noised voice, and then
compare the application of signal processing in FIR and IIR filters,
especially from the perspectives of their de-noising characteristics
and applications. According to this comparison, we can determine
which filter is the best.
In signal processing, the function of a filter is to remove unwanted
parts of the signal. This report presents a comparison of digital FIR
and IIR filter complexity and their performance in removing baseline
noise from the speech signal, and then converts the filtered speech
signal to text using the concept of cross-correlation.

INTRODUCTION
In most devices (such as smartphones and computers), the speech
signal captured by the microphones of a speech communication
device is often distorted by interfering noise sources as well as by
reverberation from the surrounding room. Such degradation may
reduce listening comfort and speech intelligibility, and harm further
processing. This degradation can be reduced by using IIR or FIR
filtering methods. The target of a speech enhancement algorithm
should be a reduction of unwanted background noise and room
reverberation. Unwanted noise consumes energy and deteriorates the
audibility of the signal. There are a variety of noise-reduction
methods, both analog and digital; each method has its own advantages,
and its efficiency often depends on the type of noise. Noise
suppression has applications in virtually all fields of communication
(channel equalization, radar signal processing, etc.) and beyond. In
this report we experiment with two filtering technologies: the IIR
filter and the FIR filter. The output signal from each filter is compared
with the desired signal which we want to obtain, in order to find the
best filter for noise reduction. Feature extraction is then performed on
the filtered signal, and the concept of cross-correlation is used to find
similarities between the filtered signal and the database signals.
The basic principle of a Speech Recognition System (SRS) is to
recognize spoken words irrespective of the speaker and to convert
them to text. The growth of wireless communication and mobile
devices has given us access to a large pool of information resources
and services, and speech recognition is a key component of that
access. Present mobile devices have limited memory and processing
capacity, which adds several challenges for an SRS; as a result,
recognition systems executing on mobile devices support only
low-complexity recognition tasks such as simple name dialing. In this
report we show a simple, low-complexity algorithm for recognizing
spoken words, using the signal features of the speech sample. The
proposed algorithm uses the principle of correlation to recognize the
spoken word. In communication and signal processing, cross-
correlation is a measure also known as a sliding dot product or sliding
inner product. It is commonly used for searching a long signal for a
shorter, known feature. It also has applications in pattern recognition,
single-particle analysis, electron tomographic averaging, cryptanalysis,
and neurophysiology. For continuous functions f and g, the cross-
correlation is defined as

(f ⋆ g)(t) = ∫₋∞^∞ f*(τ) g(t + τ) dτ,

where f* denotes the complex conjugate of f. Keeping this principle
in mind, we implemented our pattern-recognition algorithm to
recognize spoken words efficiently but in a simple manner. The basic
block model of the proposed algorithm is given below.

Input Speech Signal → Denoising using Filter → Feature Extraction
→ Comparison with Database using Correlation → Output as TEXT
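The sliding-dot-product idea can be illustrated with a short sketch. The following is not the report's MATLAB code but a minimal pure-Python illustration (the signal and template values are made up) of how the correlation peak locates a known feature inside a longer signal:

```python
# Cross-correlation as a sliding dot product: slide the short template g
# along the long signal f and record the dot product at each alignment.

def cross_correlation(f, g):
    """Return (lag, dot-product) pairs for every alignment of g inside f."""
    n, m = len(f), len(g)
    return [(lag, sum(f[lag + k] * g[k] for k in range(m)))
            for lag in range(n - m + 1)]

def best_match(f, g):
    """Lag at which the template g lines up best with the signal f."""
    return max(cross_correlation(f, g), key=lambda p: p[1])[0]

# The template [1, 2, 1] is embedded in the signal starting at index 3,
# and that is exactly where the correlation peaks.
signal = [0, 0, 1, 1, 2, 1, 0, 0]
template = [1, 2, 1]
print(best_match(signal, template))  # prints 3
```

In the report's system the same idea is applied to whole recorded words via MATLAB's xcorr.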

OVERVIEW OF FILTERS
The IIR filter and the FIR filter are both used for noise reduction, and
results for each have been reported in the literature. The filters are:

 FIR Filter
A finite impulse response (FIR) filter is one whose impulse response
h(t) becomes exactly zero for t > T, for some finite T, and is thus of
finite duration. FIR filters can have exactly linear phase characteristics.
FIR filters can be discrete-time or continuous-time and digital or
analog. An FIR filter requires no feedback and has several advantages
over an IIR filter.

 IIR Filter
Infinite impulse response (IIR) is a property of many linear time-
invariant systems; most electronic and digital filters are common
examples. An IIR filter has an impulse response that does not become
exactly zero past a certain point, but continues indefinitely. For a
given order, an IIR filter achieves a much sharper frequency response
than an FIR filter, but its phase characteristic is not linear, which is a
problem for systems that need phase linearity.
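The finite/infinite distinction is easiest to see on an impulse input. This is a minimal pure-Python sketch, not part of the report's MATLAB code; the 3-tap moving average (FIR) and the one-pole recursion (IIR) are illustrative choices:

```python
# Feed a unit impulse to an FIR and an IIR filter and compare the tails.

def fir_moving_average(x):
    """3-tap FIR: y[n] = (x[n] + x[n-1] + x[n-2]) / 3. No feedback."""
    pad = [0.0, 0.0] + list(x)
    return [(pad[i] + pad[i + 1] + pad[i + 2]) / 3 for i in range(len(x))]

def iir_one_pole(x, a=0.5):
    """First-order IIR: y[n] = a*y[n-1] + x[n]. Output feeds back."""
    y, prev = [], 0.0
    for s in x:
        prev = a * prev + s
        y.append(prev)
    return y

impulse = [1.0] + [0.0] * 9
h_fir = fir_moving_average(impulse)   # exactly zero from sample 3 onward
h_iir = iir_one_pole(impulse)         # geometric tail 1, 0.5, 0.25, ...
print(h_fir)
print(h_iir)
```

Decaying forever without ever reaching exactly zero is what "infinite impulse response" means; the feedback term `a*prev` is also why an IIR filter can be unstable, while an FIR filter never is.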

METHODOLOGY

Speech recognition is widely used in security projects, where a
machine can recognize a person's voice as a password to unlock it.
For example, in daily life, if a user wants to turn a geyser on or off
using voice commands, speech recognition plays a vital role: the
application should understand the system and recognize the user's
ON or OFF commands. Given the limitations of other models, the
cross-correlation technique for speech recognition is used and
simulated in MATLAB. Correlation compares two signals; taking the
five samples and comparing them with the test sample gives us the
result. Every sound sample (the test file and the five samples) is in
.wav format. To recognize words from the sound, the concept of Mel
frequency cepstral coefficients (MFCCs) is used. There are three
main concepts to understand first:

1. Speech Recognition:
The quality of a speech recognition system is evaluated by
two factors: its accuracy (the error rate in converting spoken words to
digital data) and its speed (how well the software can keep up with a
human speaker). Speech recognition technology has almost unlimited
applications. Typically, such software is used for automatic
transcription, correspondence, hands-free computing, medical
transcription, robotics, automated customer service, and a great deal
more. If you have ever paid a bill over the telephone using an
automated system, you have likely benefited from speech recognition
software. (Figure: a voice signal representing an utterance of the word
"seven".) There are different modes available for a speech recognition
system:

i) Speaker Dependent / Independent System:

A speaker-dependent system must be trained in order to recognize
accurately what has been said. To train the system, the speaker is
asked to record predefined words or sentences, which are then
analyzed and the results stored.

ii) Isolated Word Recognition:

This is the simplest mode and the least demanding in terms of CPU
requirements. Each word is surrounded by silence, so its boundaries
are well known.

iii) Continuous Speech Recognition:

This mode assumes that the system is able to recognize a sequence of
words in a sentence.

iv) Keyword Spotting:

This mode is able to identify, within a sentence, a word corresponding
to a particular command. It was created to cover the gap between
isolated-word and continuous systems.

v) Vocabulary Size:
The larger the vocabulary, the more errors the system can make, so
vocabulary size matters. The speech recognition process can be
divided into several components, shown in sequence below:
Speech → Feature Extraction → Probability Estimation → Decoding
(Language Model) → Recognized Sentences.

Correlation Technique: Cross-correlation is a measure of the
similarity of two series as a function of the displacement of one
relative to the other. It is also known as a sliding dot product or
sliding inner product. It is commonly used for searching a long signal
for a shorter, known feature, and has applications in pattern
recognition and single-particle analysis. The term cross-correlation
refers to the correlations between the entries of two random vectors
X and Y, whereas the correlations of a single random vector X are the
correlations between the entries of X themselves, which form the
correlation matrix of X. MATLAB's xcorr function computes the
cross-correlation of two sequences of a random process (which
includes autocorrelation as the special case y = x). Its syntax is
r = xcorr(x,y), which returns the cross-correlation of two
discrete-time sequences, x and y. Cross-correlation measures the
similarity between x and shifted (lagged) copies of y as a function of
the lag. If x and y have different lengths, the function appends zeros
at the end of the shorter vector so that it has the same length, N, as
the other.
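That padding behavior can be sketched in a few lines. This is an illustrative pure-Python re-implementation of what xcorr returns (2N−1 values covering lags −(N−1) to N−1), not MATLAB's actual code:

```python
# xcorr-style full cross-correlation: r[lag] = sum_k x[k+lag] * y[k],
# with the shorter input zero-padded to the common length N.

def xcorr_full(x, y):
    n = max(len(x), len(y))
    x = list(x) + [0.0] * (n - len(x))   # pad the shorter vector with
    y = list(y) + [0.0] * (n - len(y))   # zeros, as xcorr does
    lags = list(range(-(n - 1), n))
    r = [sum(x[k + lag] * y[k] for k in range(n) if 0 <= k + lag < n)
         for lag in lags]
    return lags, r

lags, r = xcorr_full([1, 2, 3], [0, 1])  # y is padded to [0, 1, 0]
print(len(r))                  # prints 5: 2N-1 lag values
print(lags[r.index(max(r))])   # lag of the correlation peak
```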

2. Mel Frequency Cepstral Coefficients (MFCCs):
MFCCs are a representation of the short-term power spectrum of a
sound, based on a linear cosine transform of a log power spectrum on
a nonlinear mel scale of frequency. They are derived from a type of
cepstral representation of the audio clip; in this project the audio is
read from .wav files in MATLAB. MFCCs are commonly used as
features in speech recognition systems that automatically recognize
spoken words from an audio file, and they are also found in audio
information retrieval applications such as genre classification and
audio similarity measures. MFCC values are not very robust in the
presence of additive noise, so it is common to normalize their values
in speech recognition systems to reduce the influence of noise.
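For reference, the mel scale on which MFCCs are computed is commonly defined by the formula below; this is a general illustration, not code from the project:

```python
import math

# The mel scale is roughly linear below 1 kHz and logarithmic above,
# mirroring how listeners perceive pitch differences. The widely used
# formula is mel = 2595 * log10(1 + f/700).

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

print(round(hz_to_mel(1000)))  # ~1000 mel at 1 kHz, by construction
# Doubling the frequency adds less than double the mels up high:
print(hz_to_mel(8000) < 2 * hz_to_mel(2000))  # prints True
```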

3. Algorithm for Speech Recognition using Correlation

Algorithm: function speechrecognition(filename)


Input: Five sample files and the test file.
Output: The peak correlations m1…m5 of the test file with each
sample.

1: Consider the test sample as the voice, where x = voice.

Read the stored sample file and store it in y1.

2: z1=xcorr(x,y1)
m1=max(z1)
l1=length(z1)
t1=-((l1-1)/2):1:((l1-1)/2)

3: plot(t1,z1)

4: Repeat steps 1,2,3 for all 5 samples.

5: Consider a=[m1 m2 m3 m4 m5 m6], where m6=300 is a fixed rejection threshold

6: Compute m=max(a)

7: If m<=m1
read 1st file
elseif m<=m2
read 2nd file
elseif m<=m3
read 3rd file
elseif m<=m4
read 4th file
elseif m<=m5
read 5th file
else
read denied file

8: End

CODE FOR SPEECH RECOGNITION

function speechrecognition(filename)
%Speech Recognition Using Correlation Method
%Write Following Command On Command Window
%speechrecognition('test.wav')
voice=audioread(filename); % audioread replaces the removed wavread
x=voice(:,1); % keep only the first channel
y1=audioread('one.wav');
y1=y1(:,1);

z1=xcorr(x,y1);
m1=max(z1);
l1=length(z1);
t1=-((l1-1)/2):1:((l1-1)/2);
t1=t1';
%subplot(3,2,1);
plot(t1,z1);
y2=audioread('two.wav');
y2=y2(:,1);
z2=xcorr(x,y2);
m2=max(z2);
l2=length(z2);
t2=-((l2-1)/2):1:((l2-1)/2);
t2=t2';
%subplot(3,2,2);
figure
plot(t2,z2);
y3=audioread('three.wav');
y3=y3(:,1);
z3=xcorr(x,y3);
m3=max(z3);
l3=length(z3);
t3=-((l3-1)/2):1:((l3-1)/2);
t3=t3';
%subplot(3,2,3);
figure
plot(t3,z3);
y4=audioread('four.wav');
y4=y4(:,1);
z4=xcorr(x,y4);
m4=max(z4);

l4=length(z4);
t4=-((l4-1)/2):1:((l4-1)/2);
t4=t4';
%subplot(3,2,4);
figure
plot(t4,z4);
y5=audioread('five.wav');
y5=y5(:,1);
z5=xcorr(x,y5);
m5=max(z5);
l5=length(z5);
t5=-((l5-1)/2):1:((l5-1)/2);
t5=t5';
%subplot(3,2,5);
figure
plot(t5,z5);
m6=300;
a=[m1 m2 m3 m4 m5 m6];
m=max(a);
h=audioread('allow.wav');
if m<=m1
soundsc(audioread('one.wav'),50000)
soundsc(h,50000)
elseif m<=m2
soundsc(audioread('two.wav'),50000)
soundsc(h,50000)
elseif m<=m3
soundsc(audioread('three.wav'),50000)
soundsc(h,50000)
elseif m<=m4
soundsc(audioread('four.wav'),50000)
soundsc(h,50000)
elseif m<=m5
soundsc(audioread('five.wav'),50000)
soundsc(h,50000)

else
soundsc(audioread('denied.wav'),50000)

end
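One way to read the intent of the if/elseif chain above (steps 5–7 of the algorithm): choose the sample whose correlation peak with the test file is largest, and reject the utterance when no peak exceeds the fixed threshold m6 = 300. A pure-Python sketch of that decision rule, with made-up peak values:

```python
# Decision rule: best-scoring sample wins, unless every score is below
# the rejection threshold (m6 = 300 in the MATLAB code).

def recognise(peaks, threshold=300):
    """peaks: correlation maxima [m1..m5] against the five samples.
    Returns the 1-based sample index, or None when rejected."""
    best = max(peaks)
    if best <= threshold:
        return None              # corresponds to playing denied.wav
    return peaks.index(best) + 1

print(recognise([120, 950, 80, 60, 40]))  # prints 2: sample two wins
print(recognise([120, 150, 80, 60, 40]))  # prints None: all below 300
```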

MATLAB CODE FOR BUTTERWORTH FILTER


clc;
clear all;
close all;
fsamp=input('Enter sampling freq in Hz:');
fp=input('Enter passband freq in Hz:');
fs=input('Enter stopband freq in Hz:');
Ap=input('Enter passband attenuation in dB:');
As=input('Enter stopband attenuation in dB:');
wp=2*(fp/fsamp);
ws=2*(fs/fsamp);
n1=0.5*(log((10^(0.1*Ap)-1)/(10^(0.1*As)-1))/log(wp/ws));
n=ceil(n1);
x=audioread('C:\Users\HP\Downloads\car6.wav');
[n,wn]=buttord(wp,ws,Ap,As);
[b,a]=butter(n,wn,'low');
[h,w]=freqz(b,a);
z=filter(b,a,x);
sound(z);
subplot(2,2,1),plot(x);
title('Noisy Signal');
xlabel('Sample index');
ylabel('Amplitude');
subplot(2,2,2),plot(z);
title('Filtered Signal');
xlabel('Sample index'), ylabel('Amplitude');
subplot(2,2,3),plot(w/pi,20*log10(abs(h))); % log10 for dB
title('Magnitude response');
xlabel('Normalized frequency (\times\pi rad/sample)');
ylabel('Magnitude (dB)');
subplot(2,2,4),plot(w/pi,angle(h));
title('Phase response');
xlabel('Normalized frequency (\times\pi rad/sample)');
ylabel('Phase (rad)');
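The n1 expression at the top of this script is the standard analytic Butterworth order estimate, which buttord then recomputes (together with the cutoff wn). A pure-Python sketch of the same calculation, with illustrative band edges:

```python
import math

# Minimum Butterworth order n satisfying
#   n >= log10((10^(0.1*As) - 1) / (10^(0.1*Ap) - 1)) / (2 * log10(ws/wp)),
# algebraically the same as the n1 line in the MATLAB script.

def butter_order(wp, ws, Ap, As):
    """wp, ws: normalized passband/stopband edges (0..1); Ap, As in dB."""
    ratio = (10 ** (0.1 * As) - 1) / (10 ** (0.1 * Ap) - 1)
    return math.ceil(math.log10(ratio) / (2 * math.log10(ws / wp)))

# 1 dB ripple up to 0.2, at least 20 dB attenuation beyond 0.4:
print(butter_order(0.2, 0.4, 1, 20))  # prints 5
```

Tightening the stopband spec raises the order, which is why the IIR/FIR order trade-off discussed earlier matters in practice.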

PLOT

MATLAB CODE FOR CHEBYSHEV FILTER

clc;
clear all;
close all;
myvoice=audiorecorder;
disp('Start speaking');
recordblocking(myvoice,10);
disp('Stop speaking');
x=getaudiodata(myvoice);
plot(x);
fs=8000;
t=(0:length(x)-1)/fs;
subplot(2,1,1)
plot(t,x);
F = fs/2;
wp= 400/F;
ws = 2000/F;
[b,a] = cheby2(10,20,[wp,ws],'bandpass');
%[b,a] = butter(6,[wp,ws],'bandpass');
filteredSignal = filter(b, a, x);
player = audioplayer(filteredSignal, fs);
play(player);
t=(0:length(filteredSignal)-1)/fs;
subplot(2,1,2)
plot(t,filteredSignal) ;
figure(2)
freqz(b,a)

PLOTS

MATLAB CODE FOR FIR FILTER

myvoice=audiorecorder;
disp('Start speaking');
recordblocking(myvoice,5);
disp('Stop speaking');
x=getaudiodata(myvoice);
y=fft(x);
t=1:1280;
f=input('Enter sampling freq in Hz:');
fp1=input('Enter passband freq1 in Hz:');
fs1=input('Enter stopband freq1 in Hz:');
fp2=input('Enter passband freq2 in Hz:');
fs2=input('Enter stopband freq2 in Hz:');
wp1=2*(fp1/f);
ws1=2*(fs1/f);
wp2=2*(fp2/f);
ws2=2*(fs2/f);
%del_p=(10^(Ap/20)-1)/(10^(Ap/20)+1);
%del_s=10^(-As/20);
%delta=abs((ws-wp)/(2*pi));
del_w=min((wp1-ws1),(ws2-wp2));
wc1=wp1-(del_w/2);
wc2=wp2+(del_w/2);
c=1.8*pi;
%kaiser technique for approx. order
%n=(-20*log((del_p*del_s)^0.5)-13)/(14.6*delta);

n1=1.8*pi/del_w;
n=ceil(n1);
window=boxcar(n+1); %rectangular window

%{
n1=6.1*pi/del_w;
n=ceil(n1);
window=triang(n+1); %triangular window

%}
%{
n1=6.2*pi/del_w;
n=ceil(n1);
window=hanning(n+1); % Hanning window
%}
%{
n1=6.6*pi/del_w;
n=ceil(n1);
window=hamming(n+1); %Hamming window
%}
%{
n1=11*pi/del_w;
n=ceil(n1);
window=blackman(n+1); %blackman window
%}
b=fir1(n,[wc1,wc2],window);
[h,w]=freqz(b,1);
z = filter(b,1,x);
g=fft(z);
sound(x);
pause(6);
sound(z);
subplot(3,2,1),plot(x);
title('Noisy Signal');
xlabel('Sample index');
ylabel('Amplitude');
subplot(3,2,2),plot(z);
title('Filtered Signal');
xlabel('Sample index'), ylabel('Amplitude');
subplot(3,2,3),plot(w/pi,20*log10(abs(h))); % log10 for dB
title('Magnitude response');
xlabel('Normalized frequency (\times\pi rad/sample)');
ylabel('Magnitude (dB)');
subplot(3,2,4),plot(w/pi,angle(h));
title('Phase response');
xlabel('Normalized frequency (\times\pi rad/sample)');
ylabel('Phase (rad)');
% y is the FFT of the noisy input x; g is the FFT of the filtered z
subplot(3,2,5),plot(t,abs(y(1:1280)));
title('FFT of noisy signal');
xlabel('Frequency bin'), ylabel('Magnitude');
subplot(3,2,6),plot(t,abs(g(1:1280)));
title('FFT of filtered signal');
xlabel('Frequency bin'), ylabel('Magnitude');
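The constants 1.8π, 6.1π, 6.2π, 6.6π and 11π used above are approximate transition-width factors for the respective windows: better stopband attenuation costs a longer filter. A pure-Python sketch of the resulting order estimates (the 0.3 rad/sample transition width is an illustrative value):

```python
import math

# Window-method FIR order estimate: n = ceil(k * pi / del_w), where k is
# the transition-width factor of the window (same values as the script).

WINDOW_K = {
    'rectangular': 1.8,
    'triangular': 6.1,
    'hanning': 6.2,
    'hamming': 6.6,
    'blackman': 11.0,
}

def fir_order(window, del_w):
    """Approximate order for a transition band of del_w rad/sample."""
    return math.ceil(WINDOW_K[window] * math.pi / del_w)

for name in WINDOW_K:
    print(name, fir_order(name, 0.3))
# rectangular needs only 19 taps here; blackman needs 116.
```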

PLOT

For rectangular window

For triangular window

For Hanning window

For Hamming window

For blackman window

COMBINED CODE FOR BUTTERWORTH,
CHEBYSHEV AND ELLIPTICAL FILTERS
clc;
clear all;
close all;
fsamp=input('Enter sampling freq in Hz:');
fp=input('Enter passband freq in Hz:');
fs=input('Enter stopband freq in Hz:');
Ap=1;
As=20;
wp=2*(fp/fsamp);
ws=2*(fs/fsamp);
n1=0.5*(log((10^(0.1*Ap)-1)/(10^(0.1*As)-1))/log(wp/ws));
n=ceil(n1);
myvoice=audiorecorder;
disp('start speaking');
recordblocking(myvoice,10);
disp('stop speaking');
x=getaudiodata(myvoice);
plot(x)
%x=audioread('E:\car6.wav');
[n,wn]=buttord(wp,ws,Ap,As);
%[b,a]=butter(n,wn,'low');
[b a] = butter(2,[0.6 0.7],'bandpass');
[h,w]=freqz(b,a);
z=filter(b,a,x);
sound(z);
subplot(2,2,1),plot(x);
title('Noisy Signal');
xlabel('Sample index');
ylabel('Amplitude');
subplot(2,2,2),plot(z);
title('Filtered Signal');
xlabel('Sample index'), ylabel('Amplitude');
subplot(2,2,3),plot(w/pi,20*log10(abs(h))); % log10 for dB
title('Magnitude response');
xlabel('Normalized frequency (\times\pi rad/sample)');
ylabel('Magnitude (dB)');
subplot(2,2,4),plot(w/pi,angle(h));
title('Phase response');
xlabel('Normalized frequency (\times\pi rad/sample)');
ylabel('Phase (rad)');
%
[b_butter,a_butter] = butter(4, 0.9, 'low');
H_butter = freqz(b_butter, a_butter);

[b_cheby,a_cheby] = cheby1(4, 0.9, 0.8, 'low');
H_cheby = freqz(b_cheby, a_cheby);

[b_ellip,a_ellip] = ellip(4, 0.9, 40, 0.8, 'low');
H_ellip = freqz(b_ellip, a_ellip);
%
figure(3)
norm_freq_axis = [0:1/(512 -1):1];
plot(norm_freq_axis, abs(H_butter))
hold on
plot(norm_freq_axis, abs(H_cheby),'r')
plot(norm_freq_axis, abs(H_ellip),'g')
legend('Butterworth', 'Chebyshev', 'Elliptical')
xlabel('Normalised Frequency');
ylabel('Magnitude')
%
figure(4);
plot(norm_freq_axis, 20*log10(abs(H_butter)))
hold on
plot(norm_freq_axis, 20*log10(abs(H_cheby)),'r')
plot(norm_freq_axis, 20*log10(abs(H_ellip)),'g')
legend('Butterworth filter ', 'Chebyshev filter', 'Elliptical filter');

xlabel('Normalised Frequency (along x)');
ylabel('Magnitude (in dB)')

PLOTS

RESULTS

Using MATLAB, the graphs comparing the test and sample audio
files are derived. Two test files and five sample files containing the
spoken words one to five are considered. One test file matches one of
the five sample files, and the other test file is the denied file, which
does not match any sample. When a test file is given as input, the
loop starts: the spoken word in each audio file is read and correlated
with the test file, and MATLAB displays the graph of the correlation.
Consider the test.wav file, which is a match for the second sample.
When speechrecognition('test.wav') is entered in MATLAB, the
comparison starts. The graphs are below:

Now consider the denied.wav file, which does not match any of the
samples. When speechrecognition('denied.wav') is entered at the
MATLAB command prompt, the comparison starts and the system
reports "denied", meaning the file did not match any of the sample
files. The graphs are below:

In the successful case, the second sample is the match, so the cross-
correlation with that sample peaks near zero lag, which can be seen
in the graph.

CONCLUSION
This report has successfully described various features,
behaviors and characteristics of speech signals, and has
dealt with the concept of cross-correlation. An algorithm
was created with the help of MATLAB programming which
takes .wav-format speech input signals and compares them
with the test sound file using the correlation technique. The
report concludes that, in order to remove the remaining
limitations on audio formats, a study of the various formats
of speech signals is required; these will further be used for
communication with machines that involve real hardware
rather than a simulator.

REFERENCES

 S. Sangeetha and P. Kannan, "Design and Analysis of Digital
Filters for Speech Signals Using Multirate Signal Processing,"
DOI: 10.21917/ijme.2018.0086.
 International Journal of Engineering Research & Technology
(IJERT), ISSN: 2278-0181, Vol. 3, Issue 2, February 2014.
 V. K. Ingle and J. G. Proakis, Digital Signal Processing Using
MATLAB V4.
