Speech Recognition As Emerging Revolutionary Technology
Abstract— Speech recognition is an emerging technology in the fields of computing and artificial intelligence. It has changed the way we communicate with computers and with other intelligent devices of similar capability, such as smartphones, and it remains a major area of research interest within artificial intelligence. This paper gives an overview of the technology and introduces its current implementations.
A. IWR: Isolated word recognition - Isolated word recognizers usually require each utterance to have quiet (a lack of audio signal) on both sides of the sample window. This does not mean the system accepts only single words, but it does require a single utterance at a time. Often these systems have "Listen/Not-Listen" states, in which they require the speaker to wait between utterances (usually doing processing during the pauses). "Isolated utterance" might be a better name for this class.
B. CWR: Connected word recognition - Connected word systems (or, more correctly, connected utterance systems) are similar to isolated word systems, but allow separate utterances to be run together with only a minimal pause between them.
C. CSR: Continuous speech recognition - Continuous recognition is the next step. Recognizers with continuous speech capabilities are among the most difficult to create, because they must use special methods to determine utterance boundaries. Continuous speech recognizers allow users to speak almost naturally while the computer determines the content; in essence, it is computer dictation.
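The quiet regions an isolated word recognizer looks for can be found with a simple short-time energy test. The following is a minimal sketch of such endpoint detection; the frame length, energy threshold, and test signal are all illustrative assumptions, not values from any particular system.

```python
# Sketch: energy-based endpoint detection, as an isolated-word
# recognizer might use to find the quiet on both sides of an utterance.
# Frame size and threshold are illustrative assumptions.

def find_utterance(samples, frame_len=160, energy_threshold=0.01):
    """Return (start, end) sample indices of the region whose per-frame
    energy exceeds the threshold, or None if the whole signal is quiet."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    energies = [sum(s * s for s in f) / frame_len for f in frames]
    active = [i for i, e in enumerate(energies) if e > energy_threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len

# A quiet-speech-quiet signal: the detector brackets the loud middle.
signal = [0.0] * 320 + [0.5, -0.5] * 160 + [0.0] * 320
print(find_utterance(signal))  # (320, 640)
```

A real recognizer would use adaptive thresholds and zero-crossing rates rather than a fixed energy cutoff, but the principle is the same: speech is located by where the signal stops being quiet.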
The microphone converts the voice into an analog signal, which the sound card in the computer then converts to digital form. The input from the user is also known as an utterance (spoken input from the user of a speech application; an utterance may be a single word, an entire phrase, a sentence, or even several sentences) [3]. The result is the binary form of "1s" and "0s" that computer programs operate on; computers do not "hear" sounds in any other way.
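The analog-to-digital step described above can be sketched in a few lines: sample the continuous waveform at a fixed rate and quantize each sample to a 16-bit integer. The 8 kHz sample rate and the sine-wave stand-in for a voice signal are illustrative assumptions.

```python
# Sketch of the analog-to-digital conversion the sound card performs:
# sample a continuous waveform and quantize each sample to a signed
# 16-bit integer -- the binary "1s and 0s" the recognizer consumes.
import math

SAMPLE_RATE = 8000  # samples per second (telephone-quality assumption)

def digitize(analog, duration, rate=SAMPLE_RATE):
    """Sample analog(t) over [0, duration) and quantize to 16 bits."""
    n = int(duration * rate)
    return [max(-32768, min(32767, round(analog(i / rate) * 32767)))
            for i in range(n)]

# A 440 Hz tone standing in for a voice signal, digitized for 10 ms.
samples = digitize(lambda t: math.sin(2 * math.pi * 440 * t), duration=0.01)
print(len(samples))  # 80 samples: 10 ms at 8 kHz
```

Each 16-bit sample is exactly the kind of raw digital data the acoustic-model stage described next takes as input.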
Speech-recognition software uses acoustic models (an acoustic model is created by taking audio recordings of speech and their text transcriptions, and using software to create statistical representations of the sounds that make up each word; it is used by a speech recognition engine to recognize speech) [1] to convert the voice sounds into one of about four dozen basic speech elements, called phonemes. The latest versions of speech technology have been refined to eliminate noise and information the computer does not need. The words we speak are thus transformed into digital forms of these basic speech elements (phonemes).
Once this is complete, a second part of the software begins to work. The recognized sounds are compared against a digital "dictionary" stored in computer memory, a large collection of words, usually of more than 100,000 entries. When the software finds a match based on the digital form, it displays the word on the screen. This is the basic process for all speech recognition systems and software. [2]
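The dictionary-matching stage can be pictured as a lookup from a phoneme sequence to a word. The tiny vocabulary and phoneme symbols below are illustrative assumptions; a real lexicon holds the 100,000+ entries mentioned above and scores competing matches probabilistically rather than requiring an exact hit.

```python
# Sketch of the second stage: matching a recognized phoneme sequence
# against a stored word "dictionary". Vocabulary and phoneme symbols
# here are illustrative assumptions, not a real lexicon.

LEXICON = {
    ("hh", "ah", "l", "ow"): "hello",
    ("w", "er", "l", "d"): "world",
    ("s", "p", "iy", "ch"): "speech",
}

def lookup(phonemes):
    """Return the word whose pronunciation matches exactly, or None."""
    return LEXICON.get(tuple(phonemes))

print(lookup(["s", "p", "iy", "ch"]))  # speech
print(lookup(["x", "y"]))              # None
```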
Both acoustic modelling and language modelling are important parts of modern statistically based speech recognition algorithms. Hidden Markov models (HMMs) are widely used in many systems. Language modelling also has many other applications, such as smart keyboards and document classification.
Hidden Markov models - Modern general-purpose speech recognition systems are based on hidden Markov models. These are statistical models that output a sequence of symbols or quantities. HMMs are used in speech recognition because a speech signal can be viewed as a piecewise stationary or short-time stationary signal: on a short time scale (e.g., 10 milliseconds), speech can be approximated as a stationary process, so speech can be treated as a Markov model for many stochastic purposes. Another reason HMMs are popular is that they can be trained automatically and are simple and computationally feasible to use. In speech recognition, the hidden Markov model outputs a sequence of n-dimensional real-valued vectors (with n being a small integer, such as 10), emitting one of these every 10 milliseconds.
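Decoding an HMM, i.e., recovering the most likely hidden state sequence (such as phonemes) from the observed feature vectors, is usually done with the Viterbi algorithm. The following is a minimal sketch; the two-state model and discrete "low"/"high" observations are illustrative assumptions, whereas real recognizers emit continuous vectors and use far larger models.

```python
# Minimal sketch of HMM decoding with the Viterbi algorithm.
# The two states and discrete observations are illustrative assumptions.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, best state path) for an observation sequence."""
    # Initialize: probability of starting in each state and emitting obs[0].
    paths = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        # For each state, keep the best predecessor path extended by it.
        paths = {
            s: max(
                ((p * trans_p[prev][s] * emit_p[s][o], path + [s])
                 for prev, (p, path) in paths.items()),
                key=lambda t: t[0])
            for s in states
        }
    return max(paths.values(), key=lambda t: t[0])

states = ["silence", "speech"]
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {"silence": {"silence": 0.7, "speech": 0.3},
           "speech": {"silence": 0.2, "speech": 0.8}}
emit_p = {"silence": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.3, "high": 0.7}}

prob, path = viterbi(["low", "high", "high"], states, start_p, trans_p, emit_p)
print(path)  # ['silence', 'speech', 'speech']
```

In a full recognizer, the states would be phoneme (or sub-phoneme) states, the observations would be the 10-dimensional feature vectors emitted every 10 ms, and the transition structure would be chained together from word pronunciations and the language model.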
VII. CONCLUSIONS
This paper has introduced the basics of speech recognition technology and highlighted the differences between speech recognition systems. The most common algorithms used for speech recognition have also been discussed, along with their current and future uses.
ACKNOWLEDGMENT
The author expresses appreciation to Er. Bhupinder Singh for his extensive support.
REFERENCES
[1] en.wikipedia.org/wiki/Acoustic_model
[2] www.thegeminigeek.com/how-speech-recognition-works
[3] www.lumenvox.com/resources/tips/tipsGlossary.aspx
[4] L.R. Rabiner, "Applications of Speech Recognition in the Area of Telecommunications", AT&T Labs, Florham Park, New Jersey, IEEE, 1997.
[5] B.Z. Halimah and P. Behrang (Dept. of Information Science, UKM, Selangor), A. Azlina (Dept. of Industrial Computing, UKM, Selangor) and W.O. Choo (UTAR, Kampar, Perak), "Voice Recognition System for the Visually Impaired: Virtual Cognitive Approach", IEEE, 2008.
[6] Xinxin Wang, Feiran Wu and Zhiqian Ye, College of Biomedical Engineering & Instrument Science, Zhejiang University, Hangzhou, China, "The Application of Speech Recognition in Radiology Information System", IEEE, 2010.
[7] Huu-Cong Nguyen, Shim-Byoung, Chang-Hak Kang, Dong-Jun Park and Sung-Hyun Han, Division of Mechanical System Eng., Graduate School, Kyungnam University, Masan, Korea, "Integration of Robust Voice Recognition and Navigation System on Mobile Robot", ICROS-SICE International Joint Conference, 2009.
[8] X. Huang, A. Acero and H.W. Hon, "Spoken Language Processing: A Guide to Theory, Algorithm, and System Development", Prentice Hall, Upper Saddle River, NJ, USA, 2001.
[9] M. Ursin, "Triphone Clustering in Finnish Continuous Speech Recognition", Master's Thesis, Department of Computer Science, Helsinki University of Technology, Finland, 2002.
[10] O. Khalifa, S. Khan, M.R. Islam, M. Faizal and D. Dol, "Text Independent Automatic Speaker Recognition", 3rd International Conference on Electrical & Computer Engineering, Dhaka, Bangladesh, 28-30 December 2004, pp. 561-564.
[11] C.R. Buchanan, "Informatics Research Proposal - Modeling the Semantics of Sound", School of Informatics, University of Edinburgh, United Kingdom, March 2005.
[12] http://ozanmut.sitemynet.com/asr.htm, retrieved November 2005.
[13] X. Huang, A. Acero and H.W. Hon, "Spoken Language Processing: A Guide to Theory, Algorithm, and System Development", Prentice Hall, Upper Saddle River, NJ, USA, 2001.
[14] Y. Linde, A. Buzo and R.M. Gray, "An Algorithm for Vector Quantizer Design", IEEE Transactions on Communications, Vol. COM-28, No. 1, pp. 84-95, January 1980.
[15] D. Jurafsky, "Speech Recognition and Synthesis: Acoustic Modeling", Winter 2005.
[16] S.K. Podder, "Segment-based Stochastic Modelings for Speech Recognition", PhD Thesis, Department of Electrical and Electronic Engineering, Ehime University, Matsuyama 790-77, Japan, 1997.
[17] S.M. Ahadi, H. Sheikhzadeh, R.L. Brennan and G.H. Freeman, "An Efficient Front-End for Automatic Speech Recognition", IEEE International Conference on Electronics, Circuits and Systems (ICECS 2003), Sharjah, United Arab Emirates, 2003.
[18] M. Jackson, "Automatic Speech Recognition: Human Computer Interface for Kinyarwanda Language", Master's Thesis, Faculty of Computing and Information Technology, Makerere University, 2005.
[19] M.R. Hasan, M. Jamil and M.G. Saifur Rahman, "Speaker Identification Using Mel Frequency Cepstral Coefficients", 3rd International Conference on Electrical and Computer Engineering, Dhaka, Bangladesh, 2004, pp. 565-568.
[20] M.Z. Bhotto and M.R. Amin, "Bangla Text Dependent Speaker Identification Using Mel Frequency Cepstrum Coefficient and Vector Quantization", 3rd International Conference on Electrical and Computer Engineering, Dhaka, Bangladesh, 2004, pp. 569-572.