Voice (Speaker) Recognition Using Neural Networks: Synopsis
VOICE (SPEAKER) RECOGNITION
USING NEURAL NETWORKS
AIM: To implement neural networks and use them for voice recognition
(identifying a speaker from his/her voice sample).
DESCRIPTION:
Voice, or speaker, recognition is a biometric modality that uses an individual’s voice for
recognition purposes. (It is a different technology from “speech recognition”, which
recognizes words as they are articulated and is not a biometric.) The speaker
recognition process relies on features influenced by both the physical structure of an
individual’s vocal tract and the behavioral characteristics of the individual.
The speech signal conveys information about the identity of the speaker. The area of
speaker identification is concerned with extracting the identity of the person speaking
the utterance. Two voices are compared to determine whether they belong to the same
speaker or to different speakers. The decision is made by the neural network. Speaker
recognition covers two tasks:
1. Speaker verification - verifies that a given speaker is who he/she claims to be.
The system prompts the user to provide the ID of the speaker he/she claims to be. The
system verifies the user by comparing the codebook of the given speech utterance with
the codebook associated with the ID given by the user. If the match satisfies the set
threshold, the identity claim of the user is accepted; otherwise it is rejected.
2. Speaker identification - identifies a particular speaker from a known population.
The system prompts the user to provide a speech utterance. The system identifies the
user by comparing the codebook of the speech utterance with those stored in the
database, and lists the speakers most likely to have given that speech utterance.
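The difference between the two modes can be illustrated with a minimal sketch. It
assumes that each codebook is a list of feature vectors and that the match score is the
average Euclidean distance from each utterance vector to its nearest codeword. The
function names, the toy data and the threshold value are hypothetical and chosen only
for illustration.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distortion(features, codebook):
    """Average distance from each feature vector to its nearest codeword."""
    nearest = [min(distance(f, c) for c in codebook) for f in features]
    return sum(nearest) / len(nearest)

def verify(features, claimed_codebook, threshold):
    """Verification: accept the claim if the distortion is within the threshold."""
    return distortion(features, claimed_codebook) <= threshold

def identify(features, database, top_n=1):
    """Identification: rank all enrolled speakers and return the closest ones."""
    ranked = sorted(database, key=lambda name: distortion(features, database[name]))
    return ranked[:top_n]

# Hypothetical enrolled speakers and a test utterance (2-D vectors for readability).
database = {
    "alice": [[0.0, 0.0], [1.0, 1.0]],
    "bob": [[5.0, 5.0], [6.0, 6.0]],
}
utterance = [[0.1, 0.0], [0.9, 1.1]]

accepted_as_alice = verify(utterance, database["alice"], threshold=0.5)
accepted_as_bob = verify(utterance, database["bob"], threshold=0.5)
most_likely = identify(utterance, database, top_n=2)
```

Verification answers a yes/no question about one claimed identity, whereas
identification searches the whole database and returns a ranked list.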
In our project, we aim to study the speaker-specific information contained in sound
waves by applying it to the inputs of a neural network, which identifies patterns
(pattern matching) in order to establish the identity of the speaker. Artificial
Intelligence (AI) is thus used to spot differences in sound waves and either decide
whether the speaker is really who he/she claims to be, or identify the speaker by
comparing his/her sound waves with those stored in a database.
IMPLEMENTATION DETAILS:
Steps:
1. Record Speech
2. Filtering
3. Feature extraction
• This is the process of studying and deriving useful information from the filtered
input patterns. The features used here are Mel-frequency cepstral coefficients (MFCCs).
4. Decision
• The MFCCs would be applied as input to the neural network. The neural
network used would be a multilayer feed-forward neural network.
The neural network would be self-learning: it would identify the common patterns
between samples of the same voice and either return whether the person is really who
he/she claims to be, or identify the person by comparing the voice with the database.
The accuracy of the neural network is expected to improve with the amount of training
involved.
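A multilayer feed-forward network of the kind described above can be sketched as
follows. This is a minimal illustration under stated assumptions, not the project's
implementation: it assumes one hidden layer, sigmoid activations, a squared-error loss
and plain backpropagation, and it uses hypothetical 2-D toy vectors in place of real
MFCC inputs. The class name, layer sizes, learning rate and epoch count are all
illustrative choices.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class FeedForwardNet:
    """One hidden layer, sigmoid activations, trained by plain backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one hidden unit; the last entry is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        output = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden + [1.0])))
        return hidden, output

    def train_step(self, x, target, lr=0.5):
        hidden, output = self.forward(x)
        # Gradient of the squared error through the output sigmoid.
        delta_out = (output - target) * output * (1.0 - output)
        # Hidden deltas are computed before the output weights are updated.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden + [1.0]):
            self.w_out[j] -= lr * delta_out * h
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * delta_hidden[j] * v

    def predict(self, x):
        return self.forward(x)[1]

# Hypothetical toy data: label 1.0 = claimed speaker, 0.0 = another speaker.
samples = [([0.1, 0.2], 0.0), ([0.2, 0.1], 0.0),
           ([0.9, 0.8], 1.0), ([0.8, 0.9], 1.0)]

net = FeedForwardNet(n_in=2, n_hidden=3)
for _ in range(3000):
    for features, label in samples:
        net.train_step(features, label)

is_claimed_speaker = net.predict([0.9, 0.8]) > 0.5
```

In a real system the input layer would have one unit per MFCC, and for identification
the output layer would typically have one unit per enrolled speaker.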
Calculating MFCCs
1. Take the Fourier transform of (a windowed excerpt of) a signal (cleaned of
silence frames and disturbances using filters).
2. Map the powers of the spectrum obtained above onto the mel scale.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform of the list of mel log powers, as if it were a
signal.
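The MFCC calculation above can be sketched for a single frame in plain Python. Several
details here are assumptions not stated in this synopsis: the Hamming window, the
commonly used mel formula mel = 2595 * log10(1 + f / 700), triangular filters for the
mel mapping, and the numbers of filters and coefficients. A naive DFT is used only to
keep the example self-contained; a practical implementation would use an FFT library.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-window the frame and return |DFT|^2 for bins 0..N/2."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(windowed))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    # Express the filter edges as (fractional) DFT bin positions.
    edges = [hz / nyquist * (n_bins - 1) for hz in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values):
    """Unnormalised DCT-II of a list of numbers."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n)]

def mfcc(frame, sample_rate, n_filters=12, n_coeffs=8):
    spectrum = power_spectrum(frame)                                     # Fourier transform
    bank = mel_filterbank(n_filters, len(spectrum), sample_rate)
    mel_powers = [sum(w * p for w, p in zip(row, spectrum)) for row in bank]  # mel mapping
    log_powers = [math.log(p + 1e-10) for p in mel_powers]              # logs (small floor avoids log(0))
    return dct2(log_powers)[:n_coeffs]                                   # discrete cosine transform

# Hypothetical test frame: 256 samples of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(256)]
coeffs = mfcc(frame, sample_rate)
```

The resulting list `coeffs` is the feature vector for that frame; one such vector per
frame would be fed to the input layer of the neural network.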