Corresponding Author:
Tobiloba Emmanuel Somefun
Department of Electrical and Information Engineering
Covenant University
Canaan Land, KM 10, Idiroko Road, P. M. B. 1023, Ota, Ogun State, Nigeria
Email: [email protected]
1. INTRODUCTION
Speech is the primary means of human communication, and its production involves many processes. Several body parts aid in the production of speech beyond the commonly known ones such as the tongue, mouth and lips: the lungs, trachea, larynx, vocal cords, oral cavity and nasal cavity are all highly involved [1, 2]. Human speech is produced by the flow of air from the lungs through the larynx and out through the nasal and oral cavities. Vowel sounds are produced when air flowing from the lungs through the vocal cords makes them vibrate [3]. Consonants are produced when air is pressed through a constriction in the vocal tract, resulting in turbulent airflow. Sounds are thus produced by the vibration of the vocal cords [4], and each sound, word or speech vibrates differently; the frequency of this vibration is called pitch. Reference [5] introduced the source-filter theory of speech production, which explains how speech is produced. According to [5], speech production occurs in two stages. In the first stage, air flows through the vocal cords to produce a basic signal. This basic signal is known as the source signal.
Speaker recognition is the process of recognising a speaker from the unique information present in the speech waveform. This technique uses the speaker's voice to verify the identity of the speaker and to control access to services such as voice dialling, security, information services, remote access to a computer, purchases and so on. Many physically challenged (e.g., blind or lame) and aged persons in society have a limited capacity to perform certain tasks due to their physical and environmental conditions [6, 7]. Most often they require human help in several of their activities, which usually costs a huge sum if the helper is not a family member, and persons who render such services are very few [8, 9]. This work seeks to help physically challenged or disabled individuals to perform the most basic tasks, such as opening doors, turning electrical devices on/off, calling a mobile line, automating activities and much more, through the use of voice. It acts like a telecommunication service that attends to the needs of the disabled via automation [10, 11]. With the recent trend of automation as a means of control in different areas [12-16], this work deems it fit to integrate automation to meet some of the needs of disabled individuals. The proposed model in this study is limited to the sound or speech recognition mode of authentication. Although other authentication modes exist for gaining access, such as RFID [17-19], biometrics [20-22], PIN [23, 24] or a combination of these [25-27], this study focuses on voice recognition.
3. DESIGN SPECIFICATIONS
The design specification of the speech recognition module for access control deals with the conditions necessary for the module to function optimally. For this work, two types of design specifications are considered, namely hardware and software specifications.
where the index n refers to time nT, which means that X_n(ω) = X(nT, ω). By the inverse Fourier transform, the speech signal x(t) is recovered as shown in (2):

x_n = \frac{1}{2\pi w_0} \int_{-\pi}^{\pi} X_n(\omega) \, e^{j\omega n} \, d\omega \qquad (2)

where w_0 is the value of the analysis window at the origin.
Since X_n(ω) is a function of time and changes as time changes, it is sampled at a rate that allows the speech signal x(t) to be reconstructed. With the bandwidth B_x of the speech signal x(t) being approximately equal to 5 kHz, the sampling frequency F_s is therefore 10 kHz. For the Hamming window w_n of length N = 100, using (3), the bandwidth B is found to be:

B = \frac{2 F_s}{N} \qquad (3)

B = \frac{20{,}000}{100} = 200 \text{ Hz}
The Nyquist rate for the short-time Fourier transform is twice the bandwidth B; therefore, the Nyquist rate equals 400 Hz. Hence, at F_s = 10,000 Hz, a value of X_n(ω_k) is required every 25 samples. Since N = 100, the windows should overlap by 75% [28].
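To make these framing parameters concrete, the short Python/NumPy sketch below (our illustration, not part of the original design) windows a 10 kHz signal with a length-100 Hamming window and a hop of 25 samples, i.e., the 75% overlap derived above.

```python
import numpy as np

FS = 10_000   # sampling frequency (Hz), from B_x ≈ 5 kHz
N = 100       # Hamming window length, so B = 2*FS/N = 200 Hz
HOP = 25      # Nyquist rate 2B = 400 Hz, so FS/400 = 25 samples (75% overlap)

def stft_frames(x):
    """Return the short-time spectra X_n(w): one windowed spectrum per frame."""
    window = np.hamming(N)
    n_frames = 1 + (len(x) - N) // HOP
    frames = np.stack([x[i * HOP : i * HOP + N] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

# Example: one second of a 440 Hz tone sampled at 10 kHz
t = np.arange(FS) / FS
X = stft_frames(np.sin(2 * np.pi * 440 * t))
print(X.shape)   # (number of frames, N//2 + 1)
```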
Multivariate data are observations made on more than one variable. In (4), P(x) is the probability density function, μ is the mean vector (a d × 1 matrix) and Σ is the covariance matrix (a d × d matrix) of the normally distributed random variable X. The mean vector (expected vector) is as shown in (5):

\mu \triangleq E(X) \triangleq \int_{-\infty}^{\infty} x \, P(x) \, dx \qquad (5)
where N is the number of samples and X_i are the mel-cepstral feature vectors.
The expression for the variance-covariance matrix of a multi-dimensional random variable is described in (7):

\Sigma = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \mu)(X_i - \mu)^T = \frac{1}{N-1}\left[S_{xx} - N\mu\mu^T\right] \qquad (7)
where the sample mean μ is obtained from (5) and the second-order sum matrix S_xx is as shown in (8) [28].
S_{xx} = \sum_{i=1}^{N} X_i X_i^T \qquad (8)
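As a numerical illustration of (5), (7) and (8), the following sketch (our own, assuming the mel-cepstral feature vectors are stacked as rows of a NumPy array; the variable names are illustrative) computes the sample mean, the second-order sum matrix and the covariance matrix, and checks the result against the direct definition.

```python
import numpy as np

def gaussian_stats(X):
    """Sample mean, second-order sum matrix S_xx and covariance of feature vectors.

    X: (N, d) array with one mel-cepstral feature vector per row.
    """
    N = X.shape[0]
    mu = X.mean(axis=0)                              # sample estimate of the mean vector (5)
    S_xx = X.T @ X                                   # second-order sum matrix (8)
    cov = (S_xx - N * np.outer(mu, mu)) / (N - 1)    # covariance matrix (7)
    # Sanity check against the direct (N-1)-normalised covariance definition
    assert np.allclose(cov, np.cov(X, rowvar=False))
    return mu, cov

mu, cov = gaussian_stats(np.random.randn(500, 13))   # e.g. 500 frames of 13 MFCCs
print(mu.shape, cov.shape)                           # (13,) (13, 13)
```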
Once the training data has been processed and the speaker-independent model, saved as the prior statistics, has been assembled, data from many speakers is used to refine the Gaussian parameters and coefficients using standard procedures, for example, maximum likelihood estimation (MLE), maximum a posteriori (MAP) adaptation and maximum likelihood linear regression (MLLR). The framework is then ready for enrollment. Enrollment is completed by taking a sample of the target speaker's voice and adapting the model so that it best fits this sample. This guarantees that the probabilities returned when matching a similar sample against the adapted model are maximised.
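A minimal sketch of this enrollment-and-verification idea is given below, using scikit-learn's GaussianMixture as a stand-in for the speaker model described here; the MFCC extraction with librosa, the file names and the decision threshold are our assumptions rather than details from this work.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path, sr=10_000, n_mfcc=13):
    """Mel-cepstral feature vectors (frames x coefficients) for one utterance."""
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# Enrollment: fit a Gaussian mixture to the target speaker's voice sample(s)
enroll = np.vstack([mfcc_features(p) for p in ["enroll_1.wav", "enroll_2.wav"]])
speaker_model = GaussianMixture(n_components=8, covariance_type="diag").fit(enroll)

# Verification: a test utterance is accepted if its average log-likelihood
# under the enrolled speaker model exceeds a chosen threshold.
test = mfcc_features("test.wav")
score = speaker_model.score(test)                 # mean log-likelihood per frame
print("accept" if score > -45.0 else "reject")    # threshold is illustrative only
```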
Q = q_1 q_2 … q_N, a set of N states
\sum_{j=1}^{N} a_{ij} = 1 \quad \forall i

\sum_{i=1}^{N} \pi_i = 1
The probability that the Markov chain will begin in state i is π_i. The flow chart for the process is given in Figure 5.
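The two stochastic constraints above can be verified directly. The sketch below builds a small toy Markov chain (the values are purely illustrative, not taken from this work), checks that every row of A and the vector π sum to one, and draws a state sequence from it.

```python
import numpy as np

# Toy 3-state chain (illustrative values only)
A = np.array([[0.7, 0.2, 0.1],     # a_ij: transition probabilities, each row sums to 1
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
pi = np.array([0.6, 0.3, 0.1])     # pi_i: initial state probabilities, sums to 1

assert np.allclose(A.sum(axis=1), 1.0)   # sum_j a_ij = 1 for every state i
assert np.isclose(pi.sum(), 1.0)         # sum_i pi_i = 1

def sample_state_sequence(A, pi, length, rng=np.random.default_rng(0)):
    """Draw a state sequence q_1 ... q_T from the Markov chain defined by (A, pi)."""
    states = [rng.choice(len(pi), p=pi)]
    for _ in range(length - 1):
        states.append(rng.choice(len(pi), p=A[states[-1]]))
    return states

print(sample_state_sequence(A, pi, length=10))
```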
Figure 6. Results of tests carried out on the same speaker under different conditions
where
A = A condition of crowded place with background noise;
B = A condition of silent place with little or no background noise;
C = A condition such that the speaker’s voice was low; and
D = A condition such that the speaker’s voice was loud
Table 1 shows the accuracy of samples taken.
ACKNOWLEDGEMENTS
The authors acknowledge Covenant University for her financial support.
REFERENCES
[1] D. Blischak, et al., "Use of speech-generating devices: In support of natural speech," Augmentative and Alternative Communication, vol. 19, no. 1, pp. 29-35, 2003.
[2] M. Mills, "Aid for speech therapy and a method of making same," Google Patents, 1984.
[3] K. N. Stevens and A. S. House, "An acoustical theory of vowel production and some of its implications," Journal of
Speech and Hearing Research, vol. 4, pp. 303-320, 1961.
[4] K. Nishikawa, et al., "Speech planning of an anthropomorphic talking robot for consonant sounds production," in
Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No. 02CH37292), Washington,
DC, USA, vol. 2, 2002, pp. 1830-1835.
[5] G. Fant, "The source filter concept in voice production," STL-QPSR, vol. 1, pp. 21-37, 1981.
[6] A. Ismail, S. Abdlerazek, and I. M. El-Henawy, "Development of Smart Healthcare System Based on Speech
Recognition Using Support Vector Machine and Dynamic Time Warping," Sustainability, vol. 12, no. 6, p. 2403, 2020.
[7] R. Gonzalez, et al., "Voice Recognition System to Support Learning Platforms Oriented to People with Visual Disabilities," in International Conference on Universal Access in Human-Computer Interaction, 2016, pp. 65-72.
[8] T. Gomi and A. Griffith, "Developing intelligent wheelchairs for the handicapped," in Assistive Technology and
Artificial Intelligence, Springer, pp. 150-178, 1998.
[9] R. C. Handel, "The role of the advocate in securing the handicapped child's right to an effective minimal
education," Ohio State University, vol. 36, p. 349, 1975.
[10] T. E. Somefun, C. O. A. Awosope, and C. Sika, "Development of a research project repository," TELKOMNIKA
Telecommunication, Computing, Electronics and Control, vol. 18, no. 1, pp. 156-165, 2020.
[11] A. Ademola, T. Somefun, A. Agbetuyi, and A. Olufayo, "Web based fingerprint roll call attendance management
system," International Journal of Electrical and Computer Engineering (IJECE), vol. 9, no. 5, pp. 4364-4371, 2019.
[12] Y. Yamazaki and J. Maeda, "The SMART system: an integrated application of automation and information
technology in production process," Computers in Industry, vol. 35, no. 1, pp. 87-99, 1998.
[13] L. Kocúrová, I. S. Balogh, and V. Andruch, "Solvent microextraction: a review of recent efforts at automation,"
Microchemical Journal, vol. 110, pp. 599-607, 2013.
[14] S. E. Shladover and C. Systematics, "Recent international activity in cooperative vehicle-highway automation
systems," United States. Federal Highway Administration. Office of Corporate Research, pp. 1-95, 2012.
[15] C. von Altrock and J. Gebhardt, "Recent successful fuzzy logic applications in industrial automation," in
Proceedings of IEEE 5th International Fuzzy Systems, New Orleans, LA, USA, vol. 3, pp. 1845-1851, 1996.
[16] L. Kamelia, S. A. Noorhassan, M. Sanjaya, and W. E. Mulyana, "Door-automation system using bluetooth-based
android for mobile phone," ARPN Journal of Engineering and Applied Sciences, vol. 9, no. 10, pp. 1759-1762, 2014.
[17] A. Abdulkareem, I. U. Dike, and F. Olowononi, "Development of a radio frequency identification based attendance
management application with a pictorial database framework," International Journal of Research in Information
Technology (IJRIT), vol. 2, no. 4, pp. 621-628, 2014.
[18] A. Juels, "RFID security and privacy: A research survey," IEEE Journal on Selected Areas in Communications, vol. 24, no. 2, pp. 381-394, 2006.
[19] A. Abdulkareem, C. Awosope, and A. Tope-Ojo, "Development and implementation of a miniature RFID system in
a shopping mall environment," International Journal of Electrical and Computer Engineering (IJECE), vol. 9,
no. 2, pp. 1374-1378, 2019.
[20] M. Lourde and D. Khosla, "Fingerprint Identification in Biometric Security Systems," International Journal of Computer and Electrical Engineering, vol. 2, no. 5, pp. 852-855, 2010.
[21] D. Bhattacharyya, R. Ranjan, F. Alisherov, and M. Choi, "Biometric authentication: A review," International
Journal of u-and e-Service, Science and Technology, vol. 2, no. 3, pp. 13-28, 2009.
[22] N. L. Clarke, S. M. Furnell, and P. L. Reynolds, "Biometric authentication for mobile devices," IEEE Security &
Privacy, vol. 13, pp. 70-73, 2015.
[23] T. Van Nguyen, N. Sae-Bae, and N. Memon, "DRAW-A-PIN: Authentication using finger-drawn PIN on touch
devices," computers & security, vol. 66, pp. 115-128, 2017.
[24] J. Saville, "Authentication of PIN-Less Transactions," Google Patents, 2008.
[25] W. Shatford, "Biometric based authentication system with random generated PIN," Google Patents, 2006.
[26] F. Okumura, A. Kubota, Y. Hatori, K. Matsuo, M. Hashimoto, and A. Koike, "A study on biometric authentication
based on arm sweep action with acceleration sensor," in 2006 International Symposium on Intelligent Signal
Processing and Communications, Tottori, 2006, pp. 219-222.
[27] Y. Li, J. Yang, M. Xie, D. Carlson, H. G. Jang, and J. Bian, "Comparison of PIN-and pattern-based behavioral
biometric authentication on mobile devices," in MILCOM 2015-2015 IEEE Military Communications Conference,
Tampa, FL, 2015, pp. 1317-1322.
[28] S. E. Levinson, "Mathematical models for speech technology," Wiley Online Library, 2005.
[29] S. K. Patel, J. M. Dhodiya, and D. C. Joshi, "Mathematical Model Based on Human Speech Recognition and Body
Recognition," International Journal of Engineering Research & Technology (IJERT), vol. 1, no. 4, pp. 1-5, 2012.
[30] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings
of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.