Study of Speaker Verification Methods
Volume: 2 Issue: 8
ISSN: 2321-8169
2363 - 2367
_______________________________________________________________________________________________
Dr. V.V.Patil
Abstract- Speaker verification is the process of accepting or rejecting the identity claim of a speaker by comparing a set of measurements of the
speaker's utterances with a reference set of measurements of the utterances of the person whose identity is claimed. In speaker verification, a
person makes an identity claim. There are two main stages in this technique: feature extraction and feature matching. Feature extraction is the
process of extracting useful data that can later be used to represent the speaker. Feature matching involves identifying the
unknown speaker by comparing the features extracted from the voice with the enrolled voices of known speakers.
Keywords- speaker verification, text dependent, text independent
__________________________________________________*****_________________________________________________
I. INTRODUCTION
air stream to the larynx. Larynx refers as an energy signal
band speech contains 16 band-pass filters spaced uniformly
500 Hz apart. The output of each filter is usually implemented
as a windowed, short-time Fourier transform (using fast
Fourier transform (FFT) techniques) at the center frequency of
the filter. LPC-based spectral analysis is widely used for
speech and speaker recognition. The LPC model of the speech
signal specifies that a speech sample at time t, s(t), can be
represented as a linear sum of the p previous samples plus an
excitation term, as follows:
$$s(t) = a_1 s(t-1) + a_2 s(t-2) + \cdots + a_p s(t-p) + G\,u(t)$$
where u(t) is the excitation and G is its gain. The LPC
coefficients a_i are computed by solving a set of linear
equations resulting from the minimization of the mean-squared
error between the signal at time t and the linearly predicted
estimate of the signal. Two methods are generally used for solving
the equations: the autocorrelation method and the covariance
method. [4]
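As an illustration of the autocorrelation method, the following is a minimal sketch that solves the linear equations with the Levinson-Durbin recursion. The function names (`autocorrelation`, `lpc_autocorrelation`) are hypothetical and not taken from the paper, and the sketch assumes the whole input is a single, already windowed analysis frame.

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], where r[k] = sum over t of s(t) * s(t - k)."""
    n = len(signal)
    return [
        sum(signal[t] * signal[t - k] for t in range(k, n))
        for k in range(max_lag + 1)
    ]


def lpc_autocorrelation(signal, order):
    """Estimate a_1..a_p so that s(t) is approximated by sum_i a_i * s(t - i).

    Uses the Levinson-Durbin recursion on the autocorrelation sequence.
    Returns (coefficients, residual_error).
    """
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)  # a[1..order] hold the coefficients; a[0] is unused
    error = r[0]             # prediction error energy of the order-0 model
    for i in range(1, order + 1):
        if error <= 0.0:     # silent or perfectly predicted frame: stop early
            break
        # Reflection coefficient for model order i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error
```

For a decaying exponential frame such as `[0.9 ** t for t in range(200)]`, each sample is 0.9 times the previous one, so the first coefficient should come out close to 0.9 and the second close to zero.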
III. SPEAKER RECOGNITION
A. Speaker Identification
Speaker identification (SI) is the process of finding the identity
of an unknown speaker by comparing his/her voice with the
voices of registered speakers in the database. It is a
one-to-many comparison. The basic structure of an SI system (SIS) is
shown in the figure. The core components of an SIS are the
same as those of a speaker verification system (SVS). In an SIS,
M speaker models are scored in parallel and the most likely one is
reported. Consequently, the decision is one of the speaker IDs in
the database or, in the case of an open-set SIS, "none of the
above" if the matching score is below some threshold.
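The open-set decision rule described above can be sketched as follows. This is only an illustration: each speaker model is assumed to be a single mean feature vector, the matching score is assumed to be the negative Euclidean distance, and the names `match_score` and `identify` are hypothetical.

```python
import math


def match_score(features, model_mean):
    """Assumed score: negative Euclidean distance (higher means a better match)."""
    return -math.sqrt(sum((x - m) ** 2 for x, m in zip(features, model_mean)))


def identify(features, speaker_models, threshold):
    """Score all M models and return (speaker_id, score).

    speaker_id is None ("none of the above") when even the best score
    falls below the threshold, which is the open-set case.
    """
    best_id, best_score = None, float("-inf")
    for speaker_id, model_mean in speaker_models.items():
        score = match_score(features, model_mean)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score
```

A closed-set system corresponds to dropping the threshold test, so that the best-scoring registered speaker is always reported.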
speaker, the difference can also be used as additional
individual information. This helps to increase performance.
The most common approach to automatic speaker recognition
in the text-dependent mode uses representations that preserve
temporal characteristics. Each speaker is represented by a
sequence of feature vectors (generally, short-term spectral
feature vectors), analyzed for each test word or phrase. This
approach is usually based on template matching techniques in
which the time axes of an input speech sample and each
reference template of registered speakers are aligned, and the
similarity between them accumulated from the beginning to
the end of the utterance is calculated. Trial-to-trial timing
variations of utterances of the same talker, both local and
overall, can be normalized by aligning the analyzed feature
vector sequence of a test utterance to the template feature
vector sequence using a dynamic programming (DP) time
warping algorithm, known as dynamic time warping (DTW).
Since the sequence of phonetic events is the same for training
and testing, there is an overall similarity among these
sequences of feature vectors. Ideally, the intra-speaker
differences are significantly smaller than the
inter-speaker differences.
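The time alignment described above can be sketched with a basic DTW recursion. This is a minimal illustration, assuming a Euclidean local distance and the simplest step pattern (match, insertion, deletion) with no slope constraints; the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Local distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(test, template):
    """Accumulated distance along the best time alignment of two sequences."""
    n, m = len(test), len(template)
    inf = float("inf")
    # cost[i][j]: best accumulated distance aligning test[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(test[i - 1], template[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # test frame repeated against the template
                cost[i][j - 1],      # template frame repeated against the test
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

With this recursion, an utterance that is merely spoken more slowly than its template (each frame repeated) aligns at zero accumulated distance, while a different frame sequence accumulates a positive distance.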
B. Text Independent (No Specified Passwords)
There are several applications in which predetermined
passwords cannot be used. In addition, human beings can
recognize speakers irrespective of the content of the utterance.
Therefore, text-independent methods have recently been
actively investigated. Another advantage of text-independent
recognition is that it can be done sequentially, until a desired
significance level is reached, without the annoyance of having
to repeat passwords again and again. In a text-independent
system, the words or phrases used in recognition trials
generally cannot be predicted. Therefore, it is impossible to
model or match speech events at the level of words or phrases.
Classical text-independent speaker recognition techniques are
based on measurements for which the time dimension is
collapsed. Recently, text-independent speaker verification
techniques based on short-duration speech events have been
studied. The new approaches extract and measure salient
acoustic and phonetic events. The bases for these approaches
lie in statistical techniques for extracting and modeling
reduced sets of optimally representative feature vectors,
feature vector sequences, or segments. These techniques fall
under the related categories of vector quantization (VQ),
matrix and segment quantization, probabilistic mixture
models, and hidden Markov models (HMMs).
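To make the vector quantization (VQ) category concrete, the following is a minimal sketch in which a speaker's training vectors are reduced to a small codebook and a test utterance is scored by its average quantization distortion. It assumes plain k-means training with squared Euclidean distance and a deterministic initialization; these choices and all function names are illustrative assumptions, not details given in the paper.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(vector, codebook):
    """Index of the codeword closest to the given vector."""
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def train_codebook(vectors, size, iterations=20):
    """Build a codebook of `size` codewords with k-means (size <= len(vectors))."""
    # Deterministic initialization: evenly spaced training vectors
    step = len(vectors) / size
    codebook = [list(vectors[int(i * step)]) for i in range(size)]
    for _ in range(iterations):
        cells = [[] for _ in range(size)]
        for v in vectors:
            cells[nearest(v, codebook)].append(v)
        for k, cell in enumerate(cells):
            if cell:  # keep the old codeword if no vector was assigned to it
                dim = len(cell[0])
                codebook[k] = [sum(v[d] for v in cell) / len(cell) for d in range(dim)]
    return codebook


def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    return sum(sq_dist(v, codebook[nearest(v, codebook)]) for v in vectors) / len(vectors)
```

Under these assumptions, a lower average distortion against a speaker's codebook indicates a better match, and the time order of the test vectors plays no role, which is what makes the approach text-independent.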
A set of short-term training feature vectors of a speaker can be
used directly to represent the essential characteristics of that
speaker. However, such a direct representation is impractical
when the number of training vectors is large, since the
memory and amount of computation required become
prohibitively large. Therefore, efficient ways of compressing
APPLICATION
transactions including a second and third party (i.e., the
so-called high-risk bank transactions). Security measures
should always be proportional to the value that
could be obtained through the service.
C. Home shopping
Home shopping (see, e.g., http://www.hsn.com) is the
service that is least attractive to an impostor. SV is
employed here, though backed up by a human operator. In
this service, people call to order products that are later
shipped to their home addresses. When all lines are
busy, a customer can choose to use the automatic
service. Customers just have to speak their telephone number, and if
their identity is successfully verified they can start ordering
products. If they are rejected, they are redirected to a human
operator. Even if a customer is mistaken for someone
else and some products are sent to another customer, there
is no harm, because these products cannot go to an
unauthorized party (i.e., a criminal).
DISCUSSION
REFERENCES
Matsui, T. and Furui, S., Concatenated phoneme
models for text-variable speaker recognition, Proc.
IEEE Intl. Conf. Acoust., Speech, Signal Processing,
II, 391-394, 1992.
Matsui, T. and Furui, S., Speaker adaptation of
tied-mixture-based phoneme models for text-prompted
speaker recognition, Proc. IEEE Intl. Conf. Acoust.,
Speech, Signal Processing, I, 125-128, 1994.