International Journal of Current Trends in Engineering & Research (IJCTER)

e-ISSN 2455–1392 Volume 3 Issue 6, June 2017 pp. 82 – 89


Scientific Journal Impact Factor : 3.468
http://www.ijcter.com

Speech Recognition Using Correlation Technique


Anjalika Gupta¹, Prof. Pankaj Raibagkar², Dr. Anup Palsokar³
¹Student SEM VI, ²Assistant Professor, ³Associate Professor,
Department of Computer Applications,
¹ ² ³SIES College of Management Studies,
Shri Chandrasekarendra Saraswathi Vidyapuram,
Plot 1-E, Sector V, Nerul, Navi Mumbai – 400706
Phone Number: 022-27708376/77/85

Abstract: The development of wireless communication and mobile devices has bolstered the
improvement of speech recognition systems. The two most significant concepts in a speech recognition
system are pattern matching and feature extraction. This paper presents a simple algorithm,
implemented in MATLAB, that matches patterns to recognize speech using the cross-correlation
technique. Correlation is a statistical measure in which two or more signals are compared to
discover the similarity between them. Speech recognition, which is a part of biometrics, has become
one of the major ways to provide security for devices and applications. In speech recognition, the
spoken words are extracted and matched against a previously provided sample.
Keywords: MATLAB Programming, Speech Recognition, Biometrics, Isolated Word Recognition,
Mel frequency cepstral coefficients (MFCC), Correlation
I. INTRODUCTION
Speech recognition is the process of capturing spoken words with a device and converting them into
a digitally stored set of words. It is used in many security projects, where the user speaks a
password to the computer, and it is also used for automation. There is a steadily growing need to
verify and recognize the voices of individuals automatically, and protecting personal details from
theft is a high priority for every individual. This paper uses Mel frequency cepstral coefficients
(MFCCs) as the feature for the recorded speech [1].
Speech recognition is widely used to provide security for applications, and security has become a
major concern for anyone using a smart device. Speech recognition is one part of biometrics.
Biometrics, the physical traits and behavioral characteristics that make each of us unique, are a
natural choice for identity verification. It is an emerging technology that promises an effective
solution to our security needs. We can use a biometric to access our home or our account, or to
invoke a customized setting for any secure area or application. This section looks at the different
kinds of biometric verification systems and their deployment potential.
1.1. Biometrics
The term biometrics is now generally understood as "the art of measuring physical characteristics to
verify a person's identity". It is derived from the Greek words bio (life) and metric (to measure),
and it includes speech recognition, iris and face scans, and fingerprint recognition.
Biometric characteristics can be further divided into two main classes:

• Physiological: These biometrics are used for identification or verification purposes.
Identification refers to determining who a person is. This method is commonly used in criminal
investigations.

• Behavioral: These biometrics are used for verification purposes. Verification is determining
whether a person is who they say they are. This method looks at patterns of how certain activities
are performed by a person.

Figure 1. Biometrics Characteristics

1.2. History
In 1994, IBM was the first to introduce and commercialize a dictation feature based on speech
recognition. Since then, speech recognition has been introduced in many different applications,
including telephony applications, embedded systems (telephone voice dialing systems, car kits, and
PDAs), and multimedia applications such as language learning tools.
In the 1960s and '70s, signature biometric techniques were developed, but the biometric field
remained largely static until military and security agencies researched and developed biometric
technology beyond fingerprinting.
Gunnar Fant developed the source-filter model of speech production and published it in 1960; it
proved to be a valuable model of speech production. Funding at Bell Labs then dried up for several
years when, in 1969, the influential John Pierce wrote an open letter that was highly critical of
speech recognition research. Pierce defunded speech recognition research at Bell Labs, where no
research on recognition was done until Pierce retired and James L. Flanagan took over.
Raj Reddy was the first person to take on continuous speech recognition, as a graduate student at
Stanford University in the late 1960s. Reddy's system was designed to issue spoken commands for the
game of chess.

II. LITERATURE REVIEW


2.1. Proposed System
The proposed system consists of two modules, namely Speaker Identification followed by Speech
Recognition [2].

2.1.1. Speaker Identification


Feature extraction is a process that extracts information from the voice signal that is unique to
each speaker. The Mel Frequency Cepstral Coefficient (MFCC) technique is often used to create a
fingerprint of the sound files. MFCCs are based on the known variation of the human ear's critical
bandwidths with frequency: filters spaced linearly at low frequencies and logarithmically at high
frequencies are used to capture the important characteristics of speech.
The extracted features are then vector quantized using a Vector Quantization (VQ) algorithm. VQ is
used for feature extraction in both the training and testing phases. It is an extremely efficient
representation of the spectral information in the speech signal, mapping the vectors from a large
vector space to a finite number of regions in that space called clusters [3]. After feature
extraction, feature matching is the actual procedure that identifies the unknown speaker by
comparing the extracted features with the database using the DISTMIN algorithm.

2.1.2. Speaker Recognition


Hidden Markov processes are statistical models that characterize the statistical properties of a
signal under the assumption that the signal can be described as a random parametric signal whose
parameters can be estimated in a precise and well-defined manner. To develop an isolated word
recognition system using HMMs, the following steps must be taken.
A. For each spoken word, a Markov model must be built using parameters that optimize the
observations of the word.
B. The maximum likelihood model is calculated for the spoken word [4].
2.1.3. Limitations of Related Works
• The methods above do not discuss some general features of speech signals, and the characteristic
behavior of speech signals is not described properly.
• Some of the speech recognition tasks discussed above can easily be defeated by highly variable
input speech signals.
• In the presence of noise, the device-handling process discussed above can be overridden.
• The methods above do not discuss the use of filters for noise removal [5].
III. METHODOLOGY
Speech recognition is widely used in security projects where a machine recognizes a person's voice
as the password to unlock it. It is equally useful in daily life: if a user wants to turn a geyser
on or off using voice commands, the application must recognize the user's commands ON and OFF.
Given the limitations of the other models, cross-correlation is used for speech recognition and
simulated in MATLAB. Correlation compares two signals; here the test sample is compared with each of
five reference samples, and the comparison gives the result. Every sound sample (the test sample and
the five reference samples) is in .wav format. To recognize the words from the sound, Mel frequency
cepstral coefficients (MFCCs) are used.
There are three main concepts to understand before going further:
3.1. Speech Recognition
The quality of a speech recognition system is evaluated by two factors: its accuracy (the error rate
in converting spoken words to digital data) and its speed (how well the software can keep up with a
human speaker). Speech recognition technology has a wide range of applications. Typically, such
software is used for automatic translation, correspondence, hands-free computing, medical
transcription, robotics, automated customer service, and a great deal more. If you have ever paid a
bill over the telephone using an automated system, you have probably benefited from speech
recognition software.

Figure 2. Voice signal representing an utterance of the word "seven"

There are different modes available for a speech recognition system:

1. Speaker Dependent / Independent System: A speaker-dependent system must be trained in order to
recognize accurately what has been said. To train the system, the speaker is asked to record
predefined words or sentences, which are analyzed and whose results are stored.
2. Isolated Word Recognition: This is the simplest mode and the least demanding in terms of CPU
requirements. Each word is surrounded by silence, so its boundaries are well known.
3. Continuous Speech Recognition: This mode assumes that the system is able to recognize a sequence
of words in a sentence.
4. Keyword Spotting: The system identifies, within a sentence, a word corresponding to a particular
command. This mode was created to cover the gap between isolated and continuous systems.
5. Vocabulary Size: The larger the vocabulary, the more errors the system can make, so vocabulary
size matters.
The speech recognition process can be divided into several components, as shown in the diagram
below:
[Block diagram with the stages: Speech, Feature Extraction, Probability Estimation, Decoding,
Language Models, Recognized Sentences]
Figure 3. Speech Recognition Process

3.2. Correlation Technique


Cross-correlation is a measure of the similarity of two series as a function of the displacement of
one relative to the other. It is also known as a sliding dot product or sliding inner product. It is
commonly used for searching a long signal for a shorter, known feature, and it has applications in
pattern recognition and single particle analysis. The term cross-correlation refers to the
correlations between the entries of two random vectors X and Y, while the correlations of a single
random vector X are the correlations between the entries of X itself, which form the correlation
matrix of X.
The xcorr function of MATLAB computes the cross-correlation of sequences from a random process;
autocorrelation is included as a special case.
The syntax for correlation in MATLAB is r = xcorr(x,y).
r = xcorr(x,y) returns the cross-correlation of two discrete-time sequences, x and y.
Cross-correlation measures the similarity between x and shifted (lagged) copies of y as a function
of the lag. If x and y have different lengths, the function appends zeros to the end of the shorter
vector so that it has the same length, N, as the other.
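As a small illustration of the lag behaviour described above, the following MATLAB sketch cross-correlates a short sequence with a delayed copy of itself. The signal values and the delay of 3 samples are made-up assumptions for illustration, not data from this paper, and xcorr requires the Signal Processing Toolbox.

```matlab
% Hypothetical toy signals, not taken from the paper's recordings.
y = [1 2 3 2 1];            % reference pattern
x = [0 0 0 y];              % same pattern delayed by 3 samples (length 8)

% xcorr zero-pads the shorter input (y) to the length of x, N = 8,
% and returns 2*N-1 values together with the matching lag axis.
[r, lags] = xcorr(x, y);

[peak, idx] = max(r);       % strongest similarity
bestLag = lags(idx);        % lag at which x and y line up

fprintf('peak = %g at lag = %d\n', peak, bestLag);
```

Because x is y delayed by 3 samples, the peak should appear at lag 3, and its height is the energy of the pattern (1 + 4 + 9 + 4 + 1 = 19).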
3.3. Mel Frequency Cepstral Coefficients (MFCC)
MFCCs are a representation of the short-term power spectrum of a sound, based on a linear cosine
transform of a log power spectrum on a nonlinear mel scale of frequency. They are derived from a
type of cepstral representation of the audio clip; in this work the clips are .wav files read in
MATLAB. MFCCs are commonly used as features in speech recognition systems, which can automatically
recognize the spoken words from an audio file. They are also used in audio information retrieval
applications such as genre classification and audio similarity measures. MFCC values are not very
robust in the presence of additive noise, so it is common to normalize them in speech recognition
systems to reduce the influence of noise.
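The normalization mentioned above can take several forms; the paper does not say which one it uses. The MATLAB sketch below assumes cepstral mean normalization, one common choice, and applies it to MFCCs computed with the mfcc function from the Audio Toolbox. The synthetic tone, the 16 kHz sample rate and the variable names are illustrative assumptions, not part of the original work.

```matlab
% Synthetic one-second tone standing in for a recorded word (assumption).
rng(0);                                   % reproducible noise
fs = 16000;                               % sample rate in Hz (assumed)
t = (0:fs-1)' / fs;
audioIn = sin(2*pi*440*t) + 0.01*randn(fs, 1);

% One row of cepstral coefficients per analysis frame (Audio Toolbox).
coeffs = mfcc(audioIn, fs);

% Cepstral mean normalization: subtract each coefficient's mean over time.
normalized = coeffs - mean(coeffs, 1);

fprintf('%d frames x %d coefficients\n', size(normalized, 1), size(normalized, 2));
```

After this step every coefficient has zero mean across frames, which removes a constant offset such as a fixed channel or microphone coloration. With a real recording, audioIn and fs would instead come from audioread.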
3.4. Algorithm for Speech Recognition using Correlation
Algorithm: function speechrecognition(filename)
Input: five sample files (one per reference word) and the test file.
Output: the sample file that best matches the test file, or "denied".
1: Read the test file into x (the voice to be recognized).
   Read the first sample file into y1.
2: z1 = xcorr(x, y1)
   m1 = max(z1)
   l1 = length(z1)
   t1 = -((l1-1)/2) : 1 : ((l1-1)/2);
3: plot(t1, z1)
4: Repeat steps 1, 2, 3 for all 5 samples to obtain m1, m2, m3, m4, m5.
5: Consider a = [m1 m2 m3 m4 m5 m6], where m6 = 300 acts as a fixed threshold.
6: Compute m = max(a)
7: if m <= m1        % m is the maximum, so this holds only when m1 is the largest value
       read 1st file
   elseif m <= m2
       read 2nd file
   elseif m <= m3
       read 3rd file
   elseif m <= m4
       read 4th file
   elseif m <= m5
       read 5th file
   else              % no correlation peak exceeded m6
       read denied file
8: End
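The decision rule in steps 5 to 7 can be tried without any recordings. The following MATLAB sketch is an illustration under stated assumptions rather than the authors' implementation: random vectors stand in for the five reference words, the test signal is a noisy copy of the second one, and the variable names refs, test and peaks are hypothetical. In the paper the signals would come from .wav files (for example via audioread). xcorr requires the Signal Processing Toolbox.

```matlab
% Hypothetical stand-ins for the five reference recordings and the test
% recording; these are synthetic signals, not real speech.
rng(0);                                  % reproducible synthetic data
refs = cell(1, 5);
for k = 1:5
    refs{k} = randn(2000, 1);            % placeholder "word" k
end
test = refs{2} + 0.05 * randn(2000, 1);  % noisy copy of reference 2

threshold = 300;                         % m6 in the algorithm above
peaks = zeros(1, 5);
for k = 1:5
    z = xcorr(test, refs{k});            % cross-correlate test with sample k
    peaks(k) = max(z);                   % m1 ... m5
end

[m, best] = max([peaks, threshold]);     % a = [m1 ... m5 m6], m = max(a)
if best <= 5
    fprintf('Matched sample %d (peak %.1f)\n', best, m);
else
    disp('Denied: no sample exceeded the threshold');
end
```

Taking the index of the maximum replaces the if/elseif chain. Note that the raw peak scales with signal length and loudness, so a fixed threshold such as 300 is only meaningful for recordings made under similar conditions.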
IV. RESULTS
Using MATLAB, graphs comparing the test and sample audio files are produced. Two test files and five
sample files, which contain the spoken words one to five, are considered. One test file matches one
of the five sample files, and the other test file is the denied file, which does not match any
sample file. When a test file is given as input, the loop starts: the spoken word from the test file
is correlated with each sample file in turn, and MATLAB plots the resulting graph for each
comparison.
Consider the test.wav file, which is a match for the second sample. When the input
speechrecognition('test.wav') is given in MATLAB, the comparison starts.
The graphs are shown below:

Figure 4. Success Results (panels: Test.wav vs one.wav, Test.wav vs second.wav, Test.wav vs
third.wav, Test.wav vs fourth.wav, Test.wav vs fifth.wav)

Now consider the denied.wav file, which does not match any of the given samples. When the input
speechrecognition('denied.wav') is given at the MATLAB command prompt, the comparison starts and
the result is "denied", which means the file does not match any of the sample files.
The graphs are shown below:

Figure 5. Denied Results (panels: Denied.wav vs one.wav, Denied.wav vs second.wav, Denied.wav vs
third.wav, Denied.wav vs fourth.wav, Denied.wav vs fifth.wav)

In the success results, the second sample is the successful match, so in its graph the match between
the words of the two audio files is visible at the coordinates (0, 0).
V. CONCLUSION
This paper describes various features, behaviors and characteristics of speech signals and deals
with the concept of cross-correlation. An algorithm has been developed in MATLAB that takes speech
input signals in .wav format and compares the test sound file with the sample files using the
correlation technique. To remove the limitation on audio formats, various formats of speech signals
need to be studied so that they can be used for communication with machines, including real hardware
rather than only a simulator.

REFERENCES
[1] Rajorshee Raha and Amab Pramanik, "Automatic Speech Recognition using correlation analysis".
[2] Suma Shankaranand, Manasa S, Mani Sharma, Nithya A.S, Roopa K.S. and K.V. Ramakrishnan, "An
Enhanced Speech Recognition System", International Journal of Recent Development in Engineering and
Technology, Volume 2, Issue 3, March 2014.
[3] Mahdi Shaneh and Azizollah Taheri, "Voice Command Recognition System based on MFCC and VQ
Algorithms", World Academy of Science, Engineering and Technology Journal, 2009.
[4] Nikolai Shokhirev, "Hidden Markov Models", 2010.
[5] Aseem Saxena, Amit Kumar Sinha, Shashank Chakrawarti and Surabhi Charu, "Speech Recognition
using MATLAB", International Journal of Advances in Computer Science and Cloud Computing, ISSN:
2321-4058, Volume 1, Issue 2, Nov. 2013.

@IJCTER-2017, All rights Reserved