
2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS)

Voice Recognition Based Security System Using Convolutional Neural Network

978-1-7281-8529-3/20/$31.00 ©2021 IEEE | DOI: 10.1109/ICCCIS51004.2021.9397151

Pankaj H. Chandankhede
Department of Electronics & Telecommunication Engineering
G H Raisoni College of Engineering, Nagpur, India
[email protected]

Abhijit S. Titarmare
Department of Electronics & Telecommunication Engineering
G H Raisoni College of Engineering, Nagpur, India
[email protected]

Sarang Chauhvan
Department of Electronics & Telecommunication Engineering
G H Raisoni College of Engineering, Nagpur, India
[email protected]

Abstract— This paper presents a distinctive speech recognition technique based on the planned analysis of speech characteristics using a neural network together with the Google Speech-to-Text API. A multifactor security system is proposed for voice-based identification and authentication. The project follows a unique strategy of independent convolution layers and combines two distinct voice representations: spectrograms and Mel-frequency cepstral coefficients. The study covers the statistical analysis of sound using scaled-up and scaled-down spectrograms; in addition, the Google Speech-to-Text API converts speech into a passcode that can be cross-verified for extended security. The incorporated methodology and the results obtained illustrate the direction of research in this area and encouraged us to advance further in this field.

Keywords— MFCC (Mel-Frequency Cepstrum Coefficients), CNN (Convolutional Neural Network), ANN (Artificial Neural Network), ASR (Artificial Speech Recognition), STT (Speech-to-Text), GUI (Graphic User Interface).

I. INTRODUCTION

Preventing security breaches is supremely important. Standard systems rely on safeguards such as passcodes, fingerprint scanning, palm scanning, and identity verification, all of which can be broken fairly easily. Such systems can be compromised by obtaining the password of a particular system or, since every coin has two sides, by force or some other technique that defeats the palm or fingerprint scanner, causing the loss of confidential data. The proposed system is built so that several safeguards work hand in hand, leading to better protection of documents.

Voice recognition is an excellent technique for security purposes compared with the alternatives, since every person has a unique voice with distinguishable features such as frequency, pitch, and amplitude. Voice recognition is a vital task in the life sciences and is split into two groups: text-dependent and text-independent. In text-dependent systems the user must speak identical words in both the recording and the recognition sessions; in text-independent systems the recording and recognition sessions are entirely different [1, 2]. The voices of individuals are easily distinguishable; people even recognize one another over the phone. In voice recognition, gathering the features of the voice is vital.

The system considered here is designed around a set of speech rules and an encoding of the signal in the form of RGB (Red, Green, and Blue) spectrograms. The present strategy works like a substitution scheme applied to voice recognition, supported by the precise encoding used by the system and the speaker's speech characteristics. The system is designed around a Convolutional Neural Network; implementing a CNN is a somewhat demanding task, but it has shown surprising results and rejects unknown frequency components that are not authorized in the system's database.

The Convolutional Neural Network undergoes a rigorous training phase, as it has to be fed multiple voice samples, and each layer has its own designated task. The planned architecture uses two convolution layers and one fully connected layer. The whole software program is written in the Python programming environment and runs on a Raspberry Pi, with a dedicated neural network structure.

II. SPEECH RECOGNITION PROCESS

The method of speech recognition [3] is a complicated and cumbersome job. The figures below show the steps involved. The voice recognition technique consists of two chief arms: feature extraction and storage, and audio classification prediction. Feature extraction extracts a vital quantity of information from a particular voice sample, which leads to identifying the designated authorized speaker; MFCC is employed as the feature extraction technique [4] in this project. Audio classification prediction is the procedure in which machine learning is used to identify the unknown speaker: the system calculates the information loss, and the smaller the loss, the more accurate the system is. Every speaker has a speaker ID and a data set loaded in the backend of the system; the neural network compares the input with this data set using the intelligence developed during training and extracts the precise output.
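As a concrete illustration of the RGB spectrogram encoding described in the Introduction, the sketch below renders a voice sample as a colour spectrogram image of the kind a CNN can consume. This is a minimal sketch, not the authors' exact code; the file name voice_sample.wav, the 16 kHz sample rate, and the output size are assumptions for illustration.

```python
# Minimal sketch: render a voice sample as an RGB spectrogram image
# for CNN input. Assumes librosa and matplotlib are installed;
# "voice_sample.wav" and the output size are illustrative choices.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

signal, sr = librosa.load("voice_sample.wav", sr=16000)

# Short-time Fourier transform -> magnitude spectrogram in dB.
stft = np.abs(librosa.stft(signal, n_fft=512, hop_length=128))
spec_db = librosa.amplitude_to_db(stft, ref=np.max)

# Save as a colour (RGB) image with axes stripped.
fig, ax = plt.subplots(figsize=(2, 2), dpi=64)
librosa.display.specshow(spec_db, sr=sr, hop_length=128, ax=ax)
ax.set_axis_off()
fig.savefig("spectrogram.png", bbox_inches="tight", pad_inches=0)
plt.close(fig)
```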


Fig. 1: Training and testing phases of the system.

III. SPEAKER RECOGNITION

Fig. 2: Overall system design flow.

Speaker recognition is the method of automatically identifying the particular authorized person who is speaking, on the basis of individual details found in audio files, i.e. speaker wave files. This makes it possible to use the speaker's voice to verify their identity and to grant access to services such as voice dialing, mobile phone banking, access to information, mail and voice services, regional security management, and remote connections to machines such as computers. These automatic identification and verification techniques [7] are typically considered the most readily available and inexpensive strategies for keeping unapproved users away from a shared location or machine. Moreover, their availability on everyday devices makes speaker recognition well suited to security applications. The central difficulty of a speaker recognition system is rooted in the study of the speech signal itself: the exceedingly fascinating problem is determining, in the analysis of the speech signal, which characteristics make it distinctive among other signals and what makes one speech signal totally different from another.

A. Mel-Frequency Cepstrum Coefficients

Feature extraction is the extraction of the most effective parametric representation of voice signals in order to produce better recognition performance. The principal objective of feature extraction is to extract characteristics from the speech signal that turn out to be very distinctive to each individual and can then be used to differentiate one speaker from another. It is vital that this stage be efficient, because it affects the behavior of the subsequent stages of the system. The characteristics of the vocal tract are exclusive to every individual speaker, so the impulse response of the vocal tract can be used to differentiate speakers; it can be obtained by applying the Mel-frequency cepstrum coefficients (MFCC) algorithm [5, 6]. MFCC relies on the known variation of the human ear's critical bandwidth with frequency, i.e. it is based on the perception of human hearing, which does not resolve frequencies above 1 kHz linearly. MFCC therefore uses two types of filters: filters spaced linearly at frequencies below 1000 Hz, and filters spaced logarithmically above 1000 Hz. The overall procedure of MFCC is shown in the figure below.

Fig. 3: Block diagram of MFCC.
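The MFCC pipeline of Fig. 3 is available off the shelf in common audio libraries. The following is a minimal sketch, assuming librosa; the file name, 16 kHz sample rate, and 13 coefficients are illustrative choices, not values taken from the paper.

```python
# Minimal sketch of MFCC feature extraction, assuming librosa.
# The file name, sample rate, and coefficient count are illustrative.
import librosa
import numpy as np

signal, sr = librosa.load("voice_sample.wav", sr=16000)

# 13 coefficients per frame is a common choice for speaker features.
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
print(mfcc.shape)  # (13, number_of_frames)

# A simple fixed-length summary: mean and std of each coefficient
# across frames, usable as one feature vector per utterance.
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
```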


B. Speech-to-text (STT)

Fig. 4: Block diagram of speech-to-text.

Speech recognition is a relatively new technology that is mainly used to interpret audio wave files or spoken utterances into the corresponding text. The text can take any form: words or a sequence of words, symbols or characters; the input can be a voice command stored as an audio wave file or a direct speech signal; the output might even be sub-word units or phones, but here we simply translate the speech signal into its corresponding text. There are numerous examples of ASR (Artificial Speech Recognition) systems. One of the best examples is YouTube's closed captioning, which uses an ASR engine for effective transcription of the speech in audio and video clips. Further examples are voicemail services that need transcription; they too run an ASR engine in the backend. An earlier prototype of the ASR system was the dictation system, in which words are spoken, or simply a speech signal is given, and the corresponding transcript of the speech signal is produced. A major use of ASR can be seen in well-known systems such as Google Assistant, Cortana, Siri, and Alexa, which use ASR engines in their front end. ASR technology strictly translates spoken utterances into the corresponding text. There are many benefits to having an adequate ASR system: a lot of time is saved if an individual gives voice input that is converted to text instead of being typed. Today's technology exposes many interfaces, which is why many people struggle to cope with it; there is therefore a growing need for a well-developed and stable ASR system that can be used by both literate and illiterate users, so that they can interact better with the growing technology.

Fig. 5: Speech-to-text API flow chart.
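In this project the transcription step relies on the Google Speech-to-Text API. Below is a minimal sketch of that call, assuming the Python SpeechRecognition package (which wraps Google's free web API); the microphone capture flow and the expected passcode string are illustrative assumptions.

```python
# Minimal sketch of passcode verification via Google Speech-to-Text,
# assuming the SpeechRecognition package (pip install SpeechRecognition).
# The expected passcode is a hypothetical placeholder.
import speech_recognition as sr

EXPECTED_PASSCODE = "open sesame"  # hypothetical enrolment phrase

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)

try:
    text = recognizer.recognize_google(audio)  # sends audio to Google's API
    print("Heard:", text)
    if text.strip().lower() == EXPECTED_PASSCODE:
        print("Passcode verified")
except sr.UnknownValueError:
    print("Speech was unintelligible")
except sr.RequestError as e:
    print("API request failed:", e)
```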
C. Convolutional Neural Network (CNN)

The structure of the visual cortex in the human brain was the pattern for the creation of the Convolutional Neural Network [7, 8]. Because the pixel arrangement in a local area determines the identity, shape, and structure of an object, a CNN analyses the image with the help of tiny local patterns and builds combinations of them, from minimal up to the most complicated shapes. The success of CNNs in object recognition from images is the concept behind solving the speech recognition problem with one, since the CNN's architecture is suited to image input. Thus, encoding the sound as an image is part of our method. Generally, a CNN consists of convolution layers, pooling layers, and fully connected layers. The convolution and pooling layers combine to form the internal structure, while the fully connected layers are responsible for generating the class probabilities.

Fig. 6: Convolutional Neural Network block diagram.
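The paper's stated architecture is two convolution layers followed by one fully connected layer. A minimal Keras sketch of such a network is given below; the 128x128 RGB input size, filter counts, and number of enrolled speakers are assumptions for illustration, not values reported by the authors.

```python
# Minimal sketch of the stated architecture (two convolution layers,
# one fully connected layer), assuming TensorFlow/Keras. Input size,
# filter counts, and the number of enrolled speakers are illustrative.
from tensorflow.keras import layers, models

NUM_SPEAKERS = 3  # hypothetical number of enrolled speakers

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),          # RGB spectrogram image
    layers.Conv2D(16, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(NUM_SPEAKERS, activation="softmax"),  # class probabilities
])
model.summary()
```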


3.1 Convolutional Layer

Convolutional layers (CLs) contain neurons that are connected to a receptive field in the previous layer, and the neurons in the same feature map share equal weights. A CL consists of a group of learnable filters; a specific filter activates when a specific shape or colour blob occurs in its area. Every CL consists of multiple filters, and each filter has a set of learnable weights that correspond to the neurons of the previous layers. A filter is small spatially but extends along the total depth of the previous layer (usual filter sizes are 3x3 and 5x5, rarely 7x7).

The number of learnable filters determines the depth of the CL, i.e. the number of feature maps computed from the image. Each neuron in a CL uses the weights of exactly one filter, so many neurons share equal weights; the filters thereby classify the neurons of a CL into feature maps. Each neuron in a CL specifies a local area of connectivity onto the previous layer, and all neurons in the same CL have receptive fields of identical size. The receptive field of a neuron is formed by all the connections of that neuron; its volume equals the product of the filter size of the specific CL and the depth of the previous layer. The activation is calculated by applying an activation function to the potential; most of the time the ramp function, also called the ReLU unit, is used. A specific feature that matters in one part of the image is likely to be essential in the rest of the image as well. The most important hyper-parameter after the number of filters is the stride. The dimension of the neuron's receptive field and the size of the image are considered when determining the stride. If the parameters are chosen incorrectly, the stride should be changed, or zero padding should be applied, in order to normalize images of various shapes or to keep a specific input size.
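To make the stride and zero-padding remarks concrete, here is a small NumPy sketch of a single 3x3 filter applied with a configurable stride and "same" zero padding. The input image and filter values are made up for the example (strictly speaking, CNN frameworks compute cross-correlation, as done here).

```python
# Small sketch of one 3x3 convolution filter with zero padding that
# preserves the input size. Input and filter values are made up.
import numpy as np

def conv2d_same(image, kernel, stride=1):
    k = kernel.shape[0]
    padded = np.pad(image, k // 2)  # zero padding around the border
    h = (image.shape[0] - 1) // stride + 1
    w = (image.shape[1] - 1) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = padded[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_filter = np.array([[-1, 0, 1]] * 3, dtype=float)  # detects vertical edges
print(conv2d_same(image, edge_filter))           # stride 1: 5x5 output
print(conv2d_same(image, edge_filter, stride=2)) # stride 2: 3x3 output
```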
3.2 Pooling Layer

The pooling layer (PL) is an efficient way of performing nonlinear down-sampling. Like the CL it has a receptive field and a stride, but it does not add any learnable parameters. A PL is usually placed after a CL [13]. The receptive field of a neuron in a PL is two-dimensional. Max pooling is the most frequently used PL: each neuron outputs the maximum of its receptive field. Usually the stride equals the size of the receptive field, so the receptive fields touch but do not overlap; in most cases the stride and receptive-field size are 2x2. Max pooling amplifies the most prominent feature (pattern) in its receptive field and throws away the remainder. The intuition is that, once a feature has been found, its rough location relative to the other features is more important than its exact location. The PL effectively reduces the spatial size of the representation without adding new parameters, reducing the parameter count for later layers and making the computation more feasible. Because of its destructiveness (even a small 2x2 receptive field discards 75% of the input information), the current trend prefers stacks of CLs, possibly with stride, and uses PLs sparingly or discards them altogether.
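The 75% figure can be checked directly: a 2x2 max pool with stride 2 keeps one value out of every four. A tiny NumPy illustration follows; the input matrix is made up for the example.

```python
# Tiny illustration of 2x2, stride-2 max pooling: one value survives
# out of every four, i.e. 75% of the input is discarded.
# The 4x4 input matrix is made up for the example.
import numpy as np

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 3, 2],
              [2, 2, 4, 1]])

# Reshape into 2x2 blocks, then take the max of each block.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[4 5]
               #  [2 4]]
```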
Fig. 7: Assumed network structure.

3.3 Back-Propagation

The forward evaluation is entirely in step with a feed-forward neural network: the activations, or the input data, are passed to the succeeding layers, and the activation function is applied to the computed scalar product. Two or three fully connected layers are assembled at the end of the network. If the gradient descent learning algorithm is to be used, the gradient must be computed first. In our project we have used the typical back-propagation algorithm, which comprises two technical refinements. The traditional back-propagation algorithm calculates the various partial derivatives with respect to weights belonging to neurons inside the same filter; thus the derivatives of the loss function with respect to the corresponding weights of neurons belonging to the same feature map are summed together. With max pooling, the back-propagated error is routed only to those neurons that were not filtered out. To speed up back-propagation, it is common to record the indices of the selected neurons during forward propagation.
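Training the sketched model with gradient descent and back-propagation is a one-liner in Keras, which records the forward-pass pooling indices and routes gradients through the max pool automatically. A minimal continuation of the earlier sketch (it reuses model and NUM_SPEAKERS from that sketch; the training arrays, epochs, and batch size are placeholders):

```python
# Minimal sketch of training with back-propagation, continuing the
# Keras model above. The random arrays stand in for real enrolment
# recordings; epochs and batch size are illustrative guesses.
import numpy as np

X_train = np.random.rand(24, 128, 128, 3).astype("float32")  # placeholder spectrograms
y_train = np.random.randint(0, NUM_SPEAKERS, size=24)        # placeholder speaker IDs

model.compile(optimizer="sgd",  # plain gradient descent with back-propagation
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=30, batch_size=8)
```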

B. Hardware Design

Required hardware components:

1) TIP 41 NPN transistors (3 nos.)
2) Raspberry Pi 3 B+ model
3) HDMI monitor
4) Condenser mic
5) Resistors (3 nos. 100 ohm, 3 nos. 1.8 kohm)
6) LEDs
7) Push button
8) Lamps (3 nos.) and holders (3 nos.)
9) Power supplies (12 V, 1 A) and (5 V, 2 A)
10) Flyback diode (1N4007)
11) Power relays (3 nos.), 12 V
12) Solenoid lock
13) Acrylic sheet
14) Jumper wires
15) SMPS (Switch Mode Power Supply) (12 V, 1 A)

Fig. 8: Schematic of the intermediate circuit.

Fig. 9: PCB of the circuit.

Fig. 10: Complete hardware circuit.

Connections:

This is the driver board, which drives the relays and output devices. The transistor bases are connected to the Raspberry Pi GPIOs through the 100-ohm resistors, and the connectors control the output devices through the code. The TIP 41 emitters are grounded, and the collectors are connected through the flyback diode so as to avoid the reverse-current spike from the uninterrupted power supply; the base is provided with VCC to drive the board. An additional coupling capacitor is used to smooth the current flow, and three diodes are connected in parallel to verify that each transistor is working properly.

Additional connections:

The collector is connected to the relays and the solenoid lock through the connector. The relays are connected to the lamps, which themselves need the AC 230 V power supply. The solenoid lock is driven with a DC voltage according to the comparisons and operations performed.
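On the software side, each relay channel reduces to toggling one GPIO pin once the CNN has confirmed the speaker. A minimal sketch using the standard RPi.GPIO library follows; the BCM pin numbers and the speaker-to-relay mapping are assumptions, not the authors' actual wiring.

```python
# Minimal sketch of driving the relay board from the Raspberry Pi,
# assuming the RPi.GPIO library. The BCM pin numbers and the
# speaker-to-relay mapping are hypothetical.
import RPi.GPIO as GPIO
import time

RELAY_PINS = {0: 17, 1: 27, 2: 22}  # hypothetical speaker ID -> BCM pin

GPIO.setmode(GPIO.BCM)
for pin in RELAY_PINS.values():
    GPIO.setup(pin, GPIO.OUT, initial=GPIO.LOW)

def switch_lamp(speaker_id, on=True):
    """Drive the transistor base for the relay mapped to this speaker."""
    GPIO.output(RELAY_PINS[speaker_id], GPIO.HIGH if on else GPIO.LOW)

switch_lamp(0, on=True)   # e.g. speaker ID 0 turns Lamp 1 on
time.sleep(5)
switch_lamp(0, on=False)
GPIO.cleanup()
```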
IV. CONCLUSION AND FUTURE SCOPE

Our studies found that the combination of Mel-frequency features and a Convolutional Neural Network provides the best accuracy and the most effective performance. They also suggest that, to obtain satisfactory results, the maximum number of epochs needs to be increased as the information in the system grows with the increasing number of speakers. Therefore, to attain higher performance, the speaker training sessions in which voice samples are given to the system need to be repeated, so as to update the speaker-specific codebooks in the database, because psychophysical studies show a likelihood that human speech features may vary over a period of 2-3 years. The current study contains detailed work on MFCC, CNN, and speech-to-text, which can also be used to improve the effectiveness and accuracy of the system in coping with background noise, laughter, and atypical sounds. Improvement can be obtained by increasing the size of the reference database. Furthermore, if voice-activity detection is united with this procedure, speech recognition can be performed on live voices and speech.

V. RESULTS

Fig. 11: Result displayed with GUI (1).

This is the graphic user interface displayed on the HDMI monitor. The GUI shows the unique speaker ID and the output peripheral devices, with the lamps and lock shown in their HIGH or LOW state. The MFCC spectrogram is also displayed along with the MFCC graph: we give live speech through the microphone, and the spectrogram and graph are displayed according to the speech.

Fig. 12: Result displayed with GUI (2).

The unique speaker ID is already declared in the training period to control the output device. In the above figure, live speech is taken, and the CNN predicts the unique ID and controls the device as per the command given by the speaker. Here Lamp 1 is turned on, displaying Speaker ID 0, while the other two remain in the LOW state; with Speaker ID 4 the solenoid lock locks and unlocks.


REFERENCES

[1] H. Jiang, "Discriminative training for automatic speech recognition: A survey," Comput. Speech Lang., vol. 24, no. 4, pp. 589–608, 2010.
[2] F. Akdeniz and Y. Becerikli, "Performance Comparison of Support Vector Machine, K-Nearest-Neighbor, Artificial Neural Networks, and Recurrent Neural Networks in Gender Recognition from Voice Signals," 2019 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Ankara, Turkey, 2019, pp. 1-4, doi: 10.1109/ISMSIT.2019.8932818.
[3] P. H. Chandankhede and M. M. Khanapurkar, "Design of CAN-Based Enhanced Event Data Recorder and Evidence Collecting System," in Proceedings of the International Conference on Recent Cognizance in Wireless Communication & Image Processing, pp. 115-122. Springer, New Delhi, 2016.
[4] A. S. Titarmare, M. M. Khanapurkar, and P. H. Chandankhede, "Analysis of Traffic Flow at Intersection to Avoid Accidents using Nagel-Schreckenberg (NS) Model," in 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 478-484. IEEE, 2020.
[5] L. Deng and X. Li, "Machine learning paradigms for speech recognition: An overview," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 5, pp. 1060–1089, May 2013.
[6] G. E. Dahl, M. Ranzato, A. Mohamed, and G. E. Hinton, "Phone recognition with the mean-covariance restricted Boltzmann machine," Adv. Neural Inf. Process. Syst., 2010.
[7] A. Bajpai, U. Varshney and D. Dubey, "Performance Enhancement of Automatic Speech Recognition System using Euclidean Distance Comparison and Artificial Neural Network," 2018 3rd International Conference on Internet of Things: Smart Innovation and Usages (IoT-SIU), Bhimtal, 2018, pp. 1-5, doi: 10.1109/IoT-SIU.2018.8519839.
[8] R. Jagiasi, S. Ghosalkar, P. Kulal and A. Bharambe, "CNN based speaker recognition in language and text-independent small scale system," 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 2019, pp. 176-179, doi: 10.1109/I-SMAC47947.2019.9032667.
[9] A. Mohamed, T. Sainath, G. Dahl, B. Ramabhadran, G. Hinton, and M. Picheny, "Deep belief networks using discriminative features for phone recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), May 2011, pp. 5060–5063.
[10] A. Azarang, J. Hansen and N. Kehtarnavaz, "Combining Data Augmentations for CNN-Based Voice Command Recognition," 2019 12th International Conference on Human System Interaction (HSI), Richmond, VA, USA, 2019, pp. 17-21, doi: 10.1109/HSI47298.2019.8942638.
[11] D. Yu, L. Deng, and G. Dahl, "Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition," in Proc. NIPS Workshop Deep Learn. Unsupervised Feature Learn., 2010.
[12] A. Sokolov and A. V. Savchenko, "Voice command recognition in intelligent systems using deep neural networks," 2019 IEEE 17th World Symposium on Applied Machine Intelligence and Informatics (SAMI), Herlany, Slovakia, 2019, doi: 10.1109/SAMI.2019.8782755.
[13] S. S. Salankar and B. M. Patre, "SVM based model as an optimal classifier for the classification of sonar signals," International Journal of Computer, Information, and Systems Science, and Engineering, vol. 1, no. 1, pp. 68-76, 2007.
