
HMMs for Acoustic Modeling

(Part I)
Lecture 2

CS 753
Instructor: Preethi Jyothi
Recall: Statistical ASR

Let O be a sequence of acoustic features corresponding to a speech signal.


That is, O = {O1, …, OT}, where Oi ∈ ℝ^d refers to a d-dimensional
acoustic feature vector and T is the length of the sequence.

Let W denote a word sequence. An ASR decoder solves the following problem:

W* = arg max_W Pr(W | O)
   = arg max_W Pr(O | W) Pr(W)

where Pr(O | W) is given by the acoustic model and Pr(W) by the language model.
Isolated word recognition

For a small vocabulary (e.g., "up", "down", "left", "right"), build one HMM per word w. Given the acoustic features O, compute Pr(O | "up"), Pr(O | "down"), Pr(O | "left"), Pr(O | "right"), and output the word maximizing the likelihood:

w* = arg max_w Pr(O | w)

[Figure 2.1: Standard topology used to represent a phone HMM — a left-to-right HMM with states 0, 1, 2, 3, 4, self-loop transitions a11, a22, a33, forward transitions a01, a12, a23, a34, and emission distributions b1(·), b2(·), b3(·) over the observation sequence O1, O2, …, OT.]

The acoustic model Pr(O | W) scores sub-word units Q corresponding to the word sequence W, and the language model P(W) provides a prior probability for W.

Acoustic model: The most commonly used acoustic models in ASR systems today are Hidden Markov Models (HMMs). Please refer to Rabiner (1989) for a comprehensive tutorial on HMMs and their applicability to ASR in the 1980s (with ideas that are largely applicable to systems today). HMMs are used to build probabilistic models for linear sequence labeling problems. Since speech is represented in the form of a sequence of acoustic vectors O, it lends itself to being naturally modeled using HMMs.

The HMM is defined by specifying transition probabilities (aij) and observation (or emission) probability distributions (bj(Oi)), along with the number of hidden states in the HMM. An HMM makes a transition from state i to state j with probability aij. On reaching a state j, the observation vector at that state is generated according to the emission distribution bj.
What are Hidden Markov Models (HMMs)?

Following slides contain figures/material from “Hidden Markov Models”,


“Speech and Language Processing”, D. Jurafsky and J. H. Martin, 2019.
(https://web.stanford.edu/~jurafsky/slp3/A.pdf)
Markov Chains

[Figure A.1: A Markov chain for weather (a) and one for words (b), showing states and transitions. A start distribution π is required; setting π = [0.1, 0.7, 0.2] for (a) would mean a probability 0.7 of starting in state 2 (cold) and a probability 0.1 of starting in state 1 (hot).]

A Markov chain embodies the assumption that we cannot predict the future except via the current state. It's as if, to predict tomorrow's weather, you could examine today's weather but you weren't allowed to look at yesterday's weather. The outgoing transition probabilities of each state must sum to 1. Figure A.1b shows a Markov chain for assigning a probability to a sequence of words w1...wn. This Markov chain should be familiar; in fact, it represents a bigram language model, with each edge expressing the probability p(wi | wj)! Given the two models in Fig. A.1, we can assign a probability to any sequence from our vocabulary.

More formally, consider a sequence of state variables q1, q2, ..., qi. A Markov model embodies the Markov assumption on the probabilities of this sequence: when predicting the future, the past doesn't matter, only the present.

Markov Assumption: P(qi = a | q1...qi−1) = P(qi = a | qi−1)

Formally, a Markov chain is specified by the following components:

Q = q1 q2 ... qN       a set of N states
A = a11 a12 ... an1 ... ann   a transition probability matrix A, each aij representing the probability of moving from state i to state j, s.t. Σ_{j=1}^{N} aij = 1 ∀i
π = π1, π2, ..., πN    an initial probability distribution over states. πi is the probability that the Markov chain will start in state i. Some states j may have πj = 0, meaning that they cannot be initial states. Also, Σ_{i=1}^{N} πi = 1

Figure A.1a shows a Markov chain for assigning a probability to a sequence of weather events, for which the vocabulary consists of HOT, COLD, and WARM. The states are represented as nodes in the graph, and the transitions, with their probabilities, as edges.

An alternative representation that is sometimes used for HMMs doesn't rely on a start or end state, instead representing the distribution over initial and accepting states explicitly. We don't use this notation here, but you may see it in the literature:

π = π1, π2, ..., πN    an initial probability distribution over states, with Σ_{i=1}^{N} πi = 1
QA = {qx, qy, ...}     a set QA ⊂ Q of legal accepting states

Hidden Markov Model

A hidden Markov model (HMM) allows us to talk about both observed events (like words that we see in the input) and hidden events (like part-of-speech tags) that we think of as causal factors in our probabilistic model. An HMM is specified by the following components:

Q = q1 q2 ... qN       a set of N states
A = a11 ... aij ... aNN   a transition probability matrix A, each aij representing the probability of moving from state i to state j, s.t. Σ_{j=1}^{N} aij = 1 ∀i
O = o1 o2 ... oT       a sequence of T observations, each one drawn from a vocabulary V = v1, v2, ..., vV
B = bi(ot)             a sequence of observation likelihoods, also called emission probabilities, each expressing the probability of an observation ot being generated from a state i
π = π1, π2, ..., πN    an initial probability distribution over states. πi is the probability that the Markov chain will start in state i. Some states j may have πj = 0, meaning that they cannot be initial states. Also, Σ_{i=1}^{N} πi = 1

[Figure A.2: A hidden Markov model for relating numbers of ice creams eaten by Jason (the observations) to the weather (H or C, the hidden variables). The hidden states (H and C) correspond to hot and cold weather, and the observations (drawn from the alphabet O = {1, 2, 3}) correspond to the number of ice creams eaten by Jason on a given day. Emission probabilities: P(1 | HOT) = .2, P(2 | HOT) = .4, P(3 | HOT) = .4; P(1 | COLD) = .5, P(2 | COLD) = .4, P(3 | COLD) = .1; start distribution π = [.8, .2].]

HMM Assumptions

A first-order hidden Markov model instantiates two simplifying assumptions. First, as with a first-order Markov chain, the probability of a particular state depends only on the previous state:

Markov Assumption: P(qi | q1...qi−1) = P(qi | qi−1)    (9.6)

Second, the probability of an output observation oi depends only on the state that produced the observation qi and not on any other states or any other observations:

Output Independence: P(oi | q1...qi, ..., qT, o1, ..., oi, ..., oT) = P(oi | qi)    (9.7)
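Under these two assumptions, the joint probability of a state sequence and an observation sequence factorizes into a product of start, transition, and emission terms. A minimal sketch using the ice-cream HMM of Fig. A.2 (π = [.8, .2] plus the transition and emission tables above):

```python
import numpy as np

# Ice-cream HMM (Fig. A.2): hidden states [H, C], observation symbols 1, 2, 3.
pi = np.array([0.8, 0.2])          # start distribution over [H, C]
A  = np.array([[0.6, 0.4],         # P(H|H), P(C|H)
               [0.5, 0.5]])        # P(H|C), P(C|C)
B  = np.array([[0.2, 0.4, 0.4],    # P(1|H), P(2|H), P(3|H)
               [0.5, 0.4, 0.1]])   # P(1|C), P(2|C), P(3|C)

def joint_prob(q, o):
    """P(O, Q) = pi[q1] * b_{q1}(o1) * prod_t a_{q_{t-1} q_t} * b_{q_t}(o_t),
    which follows from the Markov and output-independence assumptions."""
    p = pi[q[0]] * B[q[0], o[0]]
    for t in range(1, len(q)):
        p *= A[q[t - 1], q[t]] * B[q[t], o[t]]
    return p

# P(3 1 3, H H H): state indices [0, 0, 0], observation indices [2, 0, 2]
p_hhh = joint_prob([0, 0, 0], [2, 0, 2])   # .8*.4 * .6*.2 * .6*.4 = 0.009216
```

Summing `joint_prob` over all 2³ state sequences gives P(O); the forward algorithm below computes the same quantity without enumerating sequences.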
Three problems for HMMs

An influential tutorial by Rabiner (1989), based on tutorials by Jack Ferguson in the 1960s, introduced the idea that hidden Markov models should be characterized by three fundamental problems:

Problem 1 (Likelihood): Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).
Problem 2 (Decoding): Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.
Problem 3 (Learning): Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B.

We already saw an example of Problem 2 in Chapter 10. In the next three sections we introduce all three problems more formally.

["A tutorial on hidden Markov models and selected applications in speech recognition", Rabiner, 1989]

Likelihood Computation: The Forward Algorithm

Our first problem is to compute the likelihood of a particular observation sequence. For example, given the HMM in Fig. 9.3, what is the probability of the sequence 3 1 3? More formally:

Computing Likelihood: Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).

For a Markov chain, where the surface observations are the same as the hidden events, we could compute the probability of 3 1 3 just by following the states labeled 3 1 3 and multiplying the probabilities along the arcs. For a hidden Markov model, the states are hidden, so we instead have to sum over all possible hidden state sequences; the forward algorithm does this efficiently.
Forward Probability

The forward probability αt(j) is the probability of being in state j after seeing the first t observations, given the automaton λ. Formally, each cell of the trellis expresses the following probability:

αt(j) = P(o1, o2 ... ot, qt = j | λ)    (9.13)

Here, qt = j means "the t-th state in the sequence of states is state j". We compute this probability αt(j) by summing over the probabilities of every path that could lead us to this cell. For a given state qj at time t, the value αt(j) is computed as

αt(j) = Σ_{i=1}^{N} αt−1(i) aij bj(ot)    (9.14)

The three factors that are multiplied in Eq. 9.14 in extending the previous paths to compute the forward probability at time t are:

αt−1(i)   the previous forward path probability from the previous time step
aij       the transition probability from previous state qi to current state qj
bj(ot)    the state observation likelihood of the observation symbol ot given the current state j

[Forward trellis for the ice-cream observations 3 1 3:
α1(H) = .8 × .4 = .32, α1(C) = .2 × .1 = .02;
α2(H) = .32 × .12 + .02 × .10 = .0404, α2(C) = .32 × .20 + .02 × .25 = .069.]
Visualizing the forward recursion

[Figure: the forward trellis. Each cell αt(j) at time t is computed from the full previous column of cells αt−1(1), ..., αt−1(N):

αt(j) = Σi αt−1(i) aij bj(ot)]

Forward Algorithm

1. Initialization:

α1(j) = πj bj(o1),    1 ≤ j ≤ N

2. Recursion:

αt(j) = Σ_{i=1}^{N} αt−1(i) aij bj(ot);    1 ≤ j ≤ N, 1 < t ≤ T

3. Termination:

P(O | λ) = Σ_{i=1}^{N} αT(i)

[Figure A.7: The forward algorithm in pseudocode, where forward[s,t] represents αt(s).]
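The three steps can be sketched directly in NumPy; a minimal implementation using the ice-cream HMM numbers from the trellis figure:

```python
import numpy as np

# Ice-cream HMM: states [H, C]; observation symbols 1, 2, 3 -> indices 0, 1, 2.
pi = np.array([0.8, 0.2])
A  = np.array([[0.6, 0.4],
               [0.5, 0.5]])
B  = np.array([[0.2, 0.4, 0.4],
               [0.5, 0.4, 0.1]])

def forward(pi, A, B, obs):
    """Forward algorithm: returns P(O | lambda) and the full alpha trellis."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                    # 1. initialization
    for t in range(1, T):                           # 2. recursion
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum(), alpha                   # 3. termination

likelihood, alpha = forward(pi, A, B, [2, 0, 2])    # observation sequence 3 1 3
# alpha[0] = [.32, .02] and alpha[1] = [.0404, .069], matching the trellis
```

Each recursion step is a single matrix-vector product followed by an elementwise emission scaling, so the whole computation is O(N²T) instead of the O(Nᵀ) cost of enumerating state sequences.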
Decoding: The Viterbi Algorithm

For any model, such as an HMM, that contains hidden variables, the task of determining which sequence of variables is the underlying source of some sequence of observations is called the decoding task. In the ice-cream domain, given a sequence of ice-cream observations 3 1 3 and an HMM, the task of the decoder is to find the best hidden weather sequence (H H H). More formally:

Decoding: Given as input an HMM λ = (A, B) and a sequence of observations O = o1, o2, ..., oT, find the most probable sequence of states Q = q1 q2 q3 ... qT.

We cannot simply enumerate all possible state sequences, because there are an exponentially large number of them. Instead, the most common decoding algorithm for HMMs is the Viterbi algorithm. Like the forward algorithm, Viterbi is a kind of dynamic programming algorithm that makes use of a dynamic programming trellis. Viterbi also strongly resembles another dynamic programming variant, the minimum edit distance algorithm of Chapter 2. Each trellis cell holds the probability of the most probable path ending in state j:

vt(j) = max_{q1,...,qt−1} P(q1 ... qt−1, o1, o2 ... ot, qt = j | λ)    (A.13)

vt(j) = max_{i=1}^{N} vt−1(i) aij bj(ot)    (A.14)

Viterbi Trellis

Note that we represent the most probable path by taking the maximum over all possible previous state sequences. Like other dynamic programming algorithms, Viterbi fills each cell recursively. Given that we had already computed the probability of being in every state at time t−1, we compute the Viterbi probability by taking the most probable of the extensions of the paths that lead to the current cell; for a given state qj at time t, the value vt(j) is computed as in Eq. A.14. The three factors that are multiplied in Eq. A.14 for extending the previous paths to compute the Viterbi probability at time t are:

vt−1(i)   the previous Viterbi path probability from the previous time step
aij       the transition probability from previous state qi to current state qj
bj(ot)    the state observation likelihood of the observation symbol ot given the current state j

[Figure A.8: The Viterbi trellis for computing the best path through the hidden state space for the ice-cream observations 3 1 3:
v1(H) = .32, v1(C) = .02;
v2(H) = max(.32 × .12, .02 × .10) = .038, v2(C) = max(.32 × .20, .02 × .25) = .064.
At each cell we keep a backpointer (shown with broken lines) to the best path that led us to this state.]

function VITERBI(observations of len T, state-graph of len N) returns best-path
  create a path probability matrix viterbi[N+2, T]
  for each state s from 1 to N do        ; initialization step
    viterbi[s, 1] ← a0,s ∗ bs(o1)
  ...

Viterbi recursion

Finally, we can give a formal definition of the Viterbi recursion as follows:

1. Initialization:

v1(j) = πj bj(o1),    1 ≤ j ≤ N
bt1(j) = 0,           1 ≤ j ≤ N

2. Recursion:

vt(j) = max_{i=1}^{N} vt−1(i) aij bj(ot);       1 ≤ j ≤ N, 1 < t ≤ T
btt(j) = argmax_{i=1}^{N} vt−1(i) aij bj(ot);   1 ≤ j ≤ N, 1 < t ≤ T

3. Termination:

The best score:          P* = max_{i=1}^{N} vT(i)
The start of backtrace:  qT* = argmax_{i=1}^{N} vT(i)
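The recursion and backtrace above can be sketched as follows, again with the ice-cream HMM numbers (states indexed 0 = H, 1 = C):

```python
import numpy as np

# Ice-cream HMM: states [H, C]; observation symbols 1, 2, 3 -> indices 0, 1, 2.
pi = np.array([0.8, 0.2])
A  = np.array([[0.6, 0.4],
               [0.5, 0.5]])
B  = np.array([[0.2, 0.4, 0.4],
               [0.5, 0.4, 0.1]])

def viterbi(pi, A, B, obs):
    """Viterbi decoding: best state sequence and its path probability."""
    T, N = len(obs), len(pi)
    v  = np.zeros((T, N))            # Viterbi path probabilities
    bt = np.zeros((T, N), dtype=int) # backpointers
    v[0] = pi * B[:, obs[0]]                              # initialization
    for t in range(1, T):                                 # recursion
        # scores[i, j] = v_{t-1}(i) * a_ij * b_j(o_t)
        scores = v[t - 1][:, None] * A * B[:, obs[t]]
        bt[t] = scores.argmax(axis=0)
        v[t]  = scores.max(axis=0)
    path = [int(v[-1].argmax())]                          # termination
    for t in range(T - 1, 0, -1):                         # backtrace
        path.append(int(bt[t][path[-1]]))
    return v[-1].max(), path[::-1]

best_prob, best_path = viterbi(pi, A, B, [2, 0, 2])   # sequence 3 1 3
```

The only changes relative to the forward sketch are the max/argmax in place of the sum, plus the backpointer matrix needed to recover the state sequence.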
Viterbi backtrace

The Viterbi algorithm also gives the most likely state sequence. We compute this best state sequence by keeping track of the path of hidden states that led to each state, as suggested in Fig. A.10, and then at the end backtracing the best path to the beginning (the Viterbi backtrace).

[Figure A.10: The Viterbi backtrace for the ice-cream observations 3 1 3. As we extend each path to a new state to account for the next observation, we keep a backpointer to the best path that led us to this state.]
Gaussian Observation Model
• So far, we considered HMMs with discrete outputs
• In acoustic models, HMMs output real-valued vectors
• Hence, observation probabilities are defined using probability density functions
• A widely used model: the Gaussian distribution

N(x | μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))

• HMM emission/observation probabilities bj(x) = 𝒩(x | μj, σj²), where μj is the mean associated with state j and σj² is its variance
• For multivariate Gaussians, bj(x) = 𝒩(x | μj, Σj), where Σj is the covariance matrix associated with state j
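A minimal sketch of the univariate Gaussian emission density; the per-state means and variances below are hypothetical toy values, not from the lecture:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x | mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Hypothetical per-state parameters (toy numbers)
mu  = {1: 0.0, 2: 1.5, 3: -0.7}   # mean for each HMM state j
var = {1: 1.0, 2: 0.5, 3: 2.0}    # variance for each HMM state j

def b(j, x):
    """Emission density b_j(x) for state j."""
    return gaussian_pdf(x, mu[j], var[j])
```

Note that b(j, x) is a density, not a probability, so it can exceed 1 for small variances; the forward and Viterbi recursions carry over unchanged with densities in place of discrete emission probabilities.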
Gaussian Mixture Model
• A single Gaussian observation model assumes that the observed acoustic feature vectors are unimodal

[Figure: univariate Gaussian densities φμ,σ²(x) for (μ = 0, σ² = 0.2), (μ = 0, σ² = 1.0), (μ = 0, σ² = 5.0), and (μ = −2, σ² = 0.5) — all unimodal.]
• More generally, we use a "mixture of Gaussians" to model multiple modes in the data
• Instead of bj(x) = 𝒩(x | μj, Σj) in the single Gaussian case, bj(x) now becomes:

bj(x) = Σ_{m=1}^{M} cjm 𝒩(x | μjm, Σjm)

where cjm is the mixing probability for Gaussian component m of state j, with

Σ_{m=1}^{M} cjm = 1,    cjm ≥ 0
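A sketch of the mixture emission density for one state; the weights and component parameters below are hypothetical toy values, not from the lecture:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x | mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def gmm_emission(x, c, mus, vars_):
    """b_j(x) = sum_m c_jm * N(x | mu_jm, var_jm) for a single state j."""
    return sum(cm * gaussian_pdf(x, m, v) for cm, m, v in zip(c, mus, vars_))

# Hypothetical 2-component mixture for one state (toy numbers)
c     = [0.6, 0.4]        # mixing weights: nonnegative, sum to 1
mus   = [-1.0, 2.0]       # component means
vars_ = [0.5, 1.5]        # component variances

density = gmm_emission(0.0, c, mus, vars_)
```

Because the weights form a convex combination, bj(x) remains a valid density, and with enough components a mixture can approximate multimodal feature distributions that a single Gaussian cannot.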
