MATLAB-Based Speech Processing
Related works
In 1995, Joe Tebelskis proposed speech recognition using neural networks, examining
how artificial neural networks can benefit a large-vocabulary, speaker-independent,
continuous speech recognition system. He explored two different ways to use neural
networks for acoustic modeling: prediction and classification. He found that
predictive networks yield poor results because of a lack of discrimination, but
classification networks gave excellent results. He also verified that, in accordance
with theory, the output activations of a classification network form highly accurate
estimates of the posterior probabilities P(class|input), and he showed how these can
easily be converted to likelihoods P(input|class) for standard HMM recognition
algorithms.
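The conversion from posteriors to likelihoods mentioned above follows from Bayes'
rule. The following is a minimal sketch, not taken from Tebelskis's work: the
variable names and the numeric values are hypothetical, and it assumes the class
priors are known (for example, from class frequencies in the training data).

```matlab
% Hypothetical example values, chosen only for illustration.
posteriors = [0.70; 0.20; 0.10];   % network outputs P(class|input), summing to 1
priors     = [0.50; 0.30; 0.20];   % assumed class priors P(class)

% Bayes' rule: P(input|class) = P(class|input) * P(input) / P(class).
% P(input) is the same for every class, so it can be dropped when the values
% are only compared against each other inside an HMM decoder.
scaledLikelihoods = posteriors ./ priors;

disp(scaledLikelihoods');
```

The result is a likelihood scaled by the unknown constant P(input), which is enough
when only the relative scores of the classes matter.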
In 2003, Chulhee Lee, Donghoon Hyun, Euisun Choi, Jinwook Go, and Chungyong Lee, in
their paper "Optimizing feature extraction for speech recognition", proposed a method
to minimize the loss of information during the feature extraction stage of speech
recognition by optimizing the parameters of the mel-cepstrum transformation, a
transform that is widely used in speech recognition. The mel-cepstrum is obtained
through critical-band filters, whose characteristics play an important role in
converting a speech signal into a sequence of vectors. First, they analyzed the
performance of the mel-cepstrum while changing the filter parameters, such as shape,
center frequency, and bandwidth. Then they proposed an algorithm to optimize the
filter parameters using the simplex method.
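To make the filter parameters concrete, the sketch below places filter center
frequencies at equal spacing on the mel scale. It is only an illustration and not the
optimization from the paper: the mel formula is one common convention, and the
frequency range and number of filters are hypothetical choices.

```matlab
% One common mel-scale convention (an assumption, not taken from the paper).
hz2mel = @(f) 2595*log10(1 + f/700);
mel2hz = @(m) 700*(10.^(m/2595) - 1);

fLow = 0; fHigh = 4000; nFilters = 10;     % hypothetical range (Hz) and filter count

% Equally spaced points on the mel scale, mapped back to Hz.
melEdges = linspace(hz2mel(fLow), hz2mel(fHigh), nFilters + 2);
hzEdges  = mel2hz(melEdges);

centers   = hzEdges(2:end-1);                  % center frequency of each filter
bandwidth = hzEdges(3:end) - hzEdges(1:end-2); % width between neighbouring centers

disp([centers' bandwidth']);
```

With this spacing the filters are narrow at low frequencies and become wider as the
frequency increases.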
Speech recognition is a technique that converts pulse code modulation (PCM) digital
audio from a sound card into recognized speech. Plotted against time, PCM audio is a
wavy line that looks like the output of an oscilloscope. Transforming the PCM audio
into the frequency domain identifies the frequency components of a sound. The main
objective of a speech recognition system is to recognize what the user has said, so
it must identify the phonemes of the spoken word. Unfortunately, this is difficult
for the following reasons. A word sounds different every time the user speaks it,
and users may not produce exactly the same sound for the same phoneme. Also,
background noise from the microphone and the user's room sometimes causes the
recognizer to hear a different sound than it would have if the user were in a quiet
room with a high-quality microphone. Methods used for speech recognition include
the fast Fourier transform, training with neural networks, and various statistical
techniques. Here we suggest a different approach to speech recognition, which is
based on image processing.
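The frequency-domain step described above can be sketched with the built-in fft
function. This is an illustration under stated assumptions: a synthetic 440 Hz tone
stands in for recorded PCM samples, and the sampling rate of 8000 Hz is a
hypothetical choice.

```matlab
fs = 8000;                    % assumed sampling rate in Hz
t  = (0:fs-1)'/fs;            % one second of sample times
x  = sin(2*pi*440*t);         % synthetic tone standing in for PCM samples

N = length(x);
X = abs(fft(x));              % magnitude spectrum
f = (0:N-1)'*fs/N;            % frequency of each FFT bin in Hz

half = 1:floor(N/2);          % a real signal only needs the first half
[~, k] = max(X(half));        % strongest frequency component
dominantHz = f(k);

fprintf('Dominant frequency: %g Hz\n', dominantHz);
```

For a real recording, x would come from audioread and the spectrum would contain
many components rather than a single peak.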
A digital image is composed of a grid of pixels and is stored as an array. A single
pixel represents a value of either light intensity or color. Images are processed to
obtain information beyond what is visible in the image's initial pixel values.
A binary image consists of only two values, 0 and 1. This type of image is commonly
used as a multiplier to mask regions within another image.
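Using a binary image as a multiplier can be shown with a small example. The pixel
values and the mask shape below are hypothetical.

```matlab
img  = [10 20 30; 40 50 60; 70 80 90];   % hypothetical 3x3 grey scale patch
mask = logical([0 1 0; 1 1 1; 0 1 0]);   % binary image: 1 keeps a pixel, 0 removes it

% Element-wise multiplication sets every masked-out pixel to zero.
masked = img .* mask;

disp(masked);
```

Only the pixels where the mask is 1 keep their original values.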
A grey scale digital image is an image in which the value of each pixel is a single
component carrying only intensity information. Images of this type, also known as
black and white, are composed exclusively of shades of grey, varying from black at
the lowest intensity to white at the highest. Grey scale images are distinct from
one-bit bi-tonal black and white images, which have only two colors, black and
white; grey scale images have many shades of grey in between. Grey scale images are
also called monochromatic, denoting the presence of only one color.
4.3 RGB Image
An RGB image has three dimensions, two of which specify the location of a pixel
within the image. The third dimension specifies the color of each pixel and consists
of three components: the red, green, and blue color bands. In the RGB color model, a
color image can be represented by the intensity function I_RGB = (F_R, F_G, F_B),
where F_R(x,y) is the intensity of the pixel (x,y) in the red channel, F_G(x,y) is
the intensity of the pixel (x,y) in the green channel, and F_B(x,y) is the intensity
of the pixel (x,y) in the blue channel.
During RGB to grey scale conversion, the luminance of the grey scale image is
matched to the luminance of the color image. One method is to obtain the values of
the red, green, and blue primaries in linear intensity encoding by gamma expansion.
Then 30% of the red value, 59% of the green value, and 11% of the blue value are
added together.
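The weighted sum above can be written directly on the three color planes. The sketch
below uses a hypothetical 1x3 image holding one pure red, one pure green, and one
pure blue pixel, and assumes the values are already in linear intensity encoding
(the gamma expansion step is not shown).

```matlab
% Hypothetical 1x3 image: pure red, pure green, pure blue (values in [0, 1]).
R = [1 0 0];
G = [0 1 0];
B = [0 0 1];
rgb = cat(3, R, G, B);        % third dimension holds the color bands

% 30% red + 59% green + 11% blue, as described in the text.
grey = 0.30*rgb(:,:,1) + 0.59*rgb(:,:,2) + 0.11*rgb(:,:,3);

disp(grey);
```

Each pure-color pixel maps to its own weight, so green contributes the most to the
grey level and blue the least.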
4.4 Histogram
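The correlation coefficient computed from the sample data measures the strength and
direction of a relationship between two variables. It is a number between -1 and 1:
its magnitude gives the strength of the relationship and its sign gives the
direction. If there is no relationship between the predicted values and the actual
values, the correlation coefficient is 0 or very low. As the strength of the
relationship between the predicted values and the actual values increases, so does
the correlation coefficient. A perfect fit gives a coefficient of 1.0. Thus the
higher the correlation coefficient, the better [9, 11]. corr2 computes the
correlation coefficient using
r = \frac{\sum_m \sum_n (A_{mn} - \bar{A})(B_{mn} - \bar{B})}{\sqrt{\left(\sum_m \sum_n (A_{mn} - \bar{A})^2\right)\left(\sum_m \sum_n (B_{mn} - \bar{B})^2\right)}}
where \bar{A} and \bar{B} are the means of the elements of A and B.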
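The correlation coefficient of two equally sized matrices can also be computed
directly from its definition, without corr2. The helper name corrCoef and the test
matrices below are hypothetical and serve only as a sketch.

```matlab
A = [1 2; 3 4];
B = [2 4; 6 8];     % B = 2*A: perfect positive linear relationship
C = [4 3; 2 1];     % A reversed: perfect negative linear relationship

% Subtract each matrix's mean, then divide the summed products of the
% deviations by the square root of the product of the summed squares.
corrCoef = @(P, Q) sum((P(:) - mean(P(:))) .* (Q(:) - mean(Q(:)))) / ...
    sqrt(sum((P(:) - mean(P(:))).^2) * sum((Q(:) - mean(Q(:))).^2));

rAB = corrCoef(A, B);
rAC = corrCoef(A, C);

fprintf('rAB = %g, rAC = %g\n', rAB, rAC);
```

The first pair gives a coefficient of 1 and the second a coefficient of -1, which
shows that the sign carries the direction of the relationship.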
MATLAB PROGRAM
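% x (the test recording) and z1, t1, z2, t2 (inputs 1 and 2) are assumed to be
% computed earlier in the program, in the same way as inputs 3 and 4 below.

% input 3
y3 = audioread('hey_there.wav');
y3 = y3(:,1);                        % keep the first channel as a column vector
z3 = xcorr(x, y3);                   % cross-correlate the test signal with input 3
m3 = max(z3);                        % peak correlation value
l3 = length(z3);
t3 = (-(l3-1)/2 : 1 : (l3-1)/2)';    % lag axis centred on zero

% input 4
y4 = audioread('hello.wav');
y4 = y4(:,1);                        % keep the first channel as a column vector
z4 = xcorr(x, y4);                   % cross-correlate the test signal with input 4
m4 = max(z4);                        % peak correlation value
l4 = length(z4);
t4 = (-(l4-1)/2 : 1 : (l4-1)/2)';    % lag axis centred on zero

% common vertical limits so the four plots can be compared directly
zmax = max([max(z1), max(z2), max(z3), max(z4)]);
zmin = min([min(z1), min(z2), min(z3), min(z4)]);

% test 1
subplot(2,2,1); plot(t1, z1); grid;
title('OK GOOGLE');
axis([min(t1) max(t1) zmin zmax]);

% test 2
subplot(2,2,2); plot(t2, z2); grid;
title('WHATs UP');
axis([min(t2) max(t2) zmin zmax]);

% test 3
subplot(2,2,3); plot(t3, z3); grid;
title('HEY THERE');
axis([min(t3) max(t3) zmin zmax]);

% test 4
subplot(2,2,4); plot(t4, z4); grid;
title('HELLO');
axis([min(t4) max(t4) zmin zmax]);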
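The program above only plots the cross-correlations. One possible way to turn them
into a decision is to pick the stored word with the highest peak, as in the sketch
below. It makes several assumptions that are not in the original program: synthetic
tones stand in for the recorded words, the peak is normalised by the signal energies
so that a louder recording does not win on volume alone, and all variable names are
hypothetical. Like the program above, it needs xcorr from the Signal Processing
Toolbox.

```matlab
fs = 8000;                          % assumed sampling rate in Hz
t  = (0:fs/4-1)'/fs;                % a quarter of a second of sample times
templateA = sin(2*pi*300*t);        % hypothetical stored word 1
templateB = sin(2*pi*1200*t);       % hypothetical stored word 2
x = templateB;                      % test input, identical to stored word 2

templates = {templateA, templateB};
peaks = zeros(1, numel(templates));
for k = 1:numel(templates)
    y = templates{k};
    z = xcorr(x, y);                % cross-correlation over all lags
    % Normalised peak: equals 1 when x and y are identical.
    peaks(k) = max(abs(z)) / sqrt(sum(x.^2) * sum(y.^2));
end

[~, best] = max(peaks);             % index of the best-matching stored word
fprintf('Best match: template %d\n', best);
```

Here the second template is selected because its normalised peak reaches 1, while
the mismatched tone produces a much smaller peak.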
RESULT
References
https://www.google.co.in/
https://en.wikipedia.org/
https://www.ieee.org/
https://in.mathworks.com/