HMMs
Basic Problems for HMMs
Use the compact notation M = (A, B, π).

scoring
  scoring O, one path q: compute P(q, O | M), the probability of a path and the observations (single path)
  scoring O, all paths (evaluation): compute P(O | M) = Σq P(q, O | M), the probability of the observations summed over all paths [forward-backward algorithm]

decoding
  most likely path: q* = argmaxq P(q, O | M) [Viterbi decoding]
  path containing the most likely state at any time point: q̂ = {q̂t | q̂t = argmaxSi P(qt = Si | O, M)} [posterior decoding]

learning
  supervised learning of M*: M* = argmaxM P(q, O | M)
  unsupervised learning of M*: M* = argmaxM Σq P(q, O | M) [Baum-Welch training]
  unsupervised learning of M*: M* = argmaxM maxq P(q, O | M) [Viterbi training]
Based on slides by Manolis Kellis
HMM Elements
  states S = {S1, …, SN}; state at time t: qt ∈ S
Markov Models and Markov Chains

Learning Goals
  Describe the properties of a Markov chain
  Describe the relationship between a Markov chain and a Markov model
  Describe the elements of a Markov chain

Markov Chains and Markov Models
A Markov chain is a stochastic process with the Markov property.

Stochastic process
  the probabilistic counterpart to a deterministic process
  a collection of r.v.'s that evolve over time

Markov property
  memoryless: the conditional probability distribution of future states depends only on the present state

                        system state is fully observable   system state is partially observable
  system is autonomous  Markov chain                        hidden Markov model
  system is controlled  Markov decision process             partially observable Markov decision process
Markov Chains
We can model a Markov chain as a triplet (S, π, A), where
  S: finite set of N = |S| states
  π: initial state probabilities {πi}
  A: state transition probabilities {aij}
(What properties must π and A satisfy?)

A Markov chain outputs an (observable) state at each (discrete) time step, t = 1, …, T:
  q1 → q2 → … → qT

The probability of observation sequence O = {O1, …, OT}, where Ot ∈ S, is
  P(O | Model) = P(q1, …, qT)
    = P(q1) P(q2 | q1) P(q3 | q1, q2) … P(qt | q1, …, qt-1) … P(qT | q1, …, qT-1)
    = P(q1) P(q2 | q1) … P(qt | qt-1) … P(qT | qT-1)    (by the Markov property)
Markov Model of Weather
Once a day (e.g. at noon), the weather is observed as one of
  state 1: rainy    state 2: cloudy    state 3: sunny

The state transition probabilities are (from row state to column state):

           rainy  cloudy  sunny
  rainy     0.4    0.3     0.3
  cloudy    0.2    0.6     0.2
  sunny     0.1    0.1     0.8

(Notice that each row sums to 1.)

Questions:
1. Given that the weather on day 1 (t = 1) is sunny (state 3), what is the probability that the weather for the next 7 days will be "sun-sun-rain-rain-sun-cloudy-sun"?
2. Given that the model is in state i, what is the probability that it stays in state i for exactly d days? What is the expected duration in state i (also conditioned on starting in state i)?
Solution to Q1
O = {S3, S3, S3, S1, S1, S3, S2, S3}
P(O | Model)
  = P(S3, S3, S3, S1, S1, S3, S2, S3 | Model)
  = P(S3) P(S3 | S3) P(S3 | S3) P(S1 | S3) P(S1 | S1) P(S3 | S1) P(S2 | S3) P(S3 | S2)
  = π3 · a33 · a33 · a31 · a11 · a13 · a32 · a23
  = (1)(0.8)(0.8)(0.1)(0.4)(0.3)(0.1)(0.2)
  = 1.536 × 10^-4
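The same calculation is easy to check numerically. Below is a minimal sketch (not from the slides) that scores a state sequence under the weather Markov chain; the function name and the 0-based indexing are my own choices, and the matrix is the one given above.

```python
# Minimal sketch: score a state sequence under the 3-state weather Markov chain.
# States are indexed 0 = rainy, 1 = cloudy, 2 = sunny (the slides use 1-based indices).

A = [
    [0.4, 0.3, 0.3],  # rainy  -> rainy, cloudy, sunny
    [0.2, 0.6, 0.2],  # cloudy -> rainy, cloudy, sunny
    [0.1, 0.1, 0.8],  # sunny  -> rainy, cloudy, sunny
]
pi = [0.0, 0.0, 1.0]  # day 1 is known to be sunny, so all mass on state 3

def markov_chain_probability(states, pi, A):
    """P(q1, ..., qT) = pi[q1] * prod_t A[q_{t-1}][q_t]."""
    p = pi[states[0]]
    for prev, curr in zip(states, states[1:]):
        p *= A[prev][curr]
    return p

# sun, sun, sun, rain, rain, sun, cloudy, sun (0-based state indices)
O = [2, 2, 2, 0, 0, 2, 1, 2]
print(markov_chain_probability(O, pi, A))  # ~1.536e-04
```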
Solution to Q2
O = {Si, Si, Si, …, Si, Sj ≠ Si}
    days: 1, 2, 3, …, d, d+1

Intuition: Consider a fair die. If the probability of success (a "1") is p = 1/6, it will take 1/p = 6 rolls on average until a success. Here "success" means leaving state i, which happens with probability 1 − aii on each day.

For example, the expected number of consecutive days of rainy weather is 1/(1 − a11) = 1/(1 − 0.4) = 1/0.6 ≈ 1.67; for cloudy, 2.5; for sunny, 5.
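For completeness, the geometric-duration derivation the question is asking for, written out in the slide's notation (this is the standard result; it is not spelled out in the extracted text):

```latex
% Probability of staying in state i for exactly d days, then leaving:
\[
  p_i(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad d = 1, 2, 3, \dots
\]
% This is a geometric distribution, so the expected duration in state i is
\[
  \bar{d}_i = \sum_{d=1}^{\infty} d\, p_i(d) = \frac{1}{1 - a_{ii}}.
\]
```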
Hidden Markov Models

Learning Goals
  Describe the difference between Markov chains and hidden Markov models
  Describe applications of HMMs
  Describe the elements of an HMM
  Describe the basic problems for HMMs
Hidden Markov Models
Now we would like to model pairs of sequences.
There exists an underlying stochastic process that is hidden (not directly observable).
But it affects the observations (which we can collect directly).

  states        q1  q2  …  qT    (some books use yi's: labels)
  observations  O1  O2  …  OT    (some books use xi's: features)
HMMs are Everywhere

  application                  states                           observations
  weather inference            seasons
  dishonest casino             die used (the casino has a fair die and a loaded die, and switches between them on average once every 20 turns)
  missile tracking             position
  speech recognition           phoneme
  NLP part-of-speech tagging   part of speech
  computational biology        protein structure
  medicine                     disease (state of progression)
Elements of an HMM
A 5-tuple (S, V, π, A, B), where
  S: finite set of states {S1, …, SN}
  V: finite set of observation symbols per state {v1, …, vM}
  π: initial state distribution {πi}
  A: state transition probability distribution {aij}
  B: observation symbol probability distribution {bj(k)}
(Note that A governs the transitions and B the emissions.)
An HMM outputs only the emitted symbols O = {O1, …, OT}, where Ot ∈ V.
Both the underlying states and the random walk between states are hidden.
HMMs as a Generative Model
Given S, V, π, A, B, the HMM can be used as a generator to produce an observation sequence
  O = O1 O2 … OT.
1) Choose the initial state q1 = Si according to the initial state distribution π.
2) Set t = 1.
3) Choose Ot = vk according to the symbol probability distribution in state Si, i.e., bi(k).
4) Transit to a new state qt+1 = Sj according to the state transition probability distribution for state Si, i.e., aij.
5) Set t = t + 1. Return to step 3 if t ≤ T; otherwise stop.
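A minimal sketch of this generator (not from the slides); the function name sample_hmm, the NumPy array representation of π, A, B, and the integer indexing of states and symbols are my own choices.

```python
import numpy as np

def sample_hmm(pi, A, B, T, rng=None):
    """Generate (states, observations) of length T from an HMM.

    pi: (N,) initial state distribution
    A:  (N, N) transition matrix, A[i, j] = P(S_j at t+1 | S_i at t)
    B:  (N, M) emission matrix, B[i, k] = P(v_k | S_i)
    """
    rng = rng or np.random.default_rng()
    states, obs = [], []
    state = rng.choice(len(pi), p=pi)           # step 1: draw q1 ~ pi
    for _ in range(T):                           # steps 2-5: emit, then transition
        obs.append(rng.choice(B.shape[1], p=B[state]))
        states.append(state)
        state = rng.choice(len(pi), p=A[state])
    return states, obs
```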
Scoring HMMs

Learning Goals
  Describe how to score an observation sequence over a single path and over multiple paths
  Describe the forward algorithm
Scoring a Sequence over a Single Path
  states        q1  q2  …  qT
  observations  O1  O2  …  OT
Calculate P(q, O | M).
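Written out, the single-path score factorizes as below (the standard expansion implied by the model definition; it is not shown explicitly in the extracted slide):

```latex
\[
P(q, O \mid M) = P(q \mid M)\, P(O \mid q, M)
  = \pi_{q_1}\, b_{q_1}(O_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(O_t)
\]
```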
Scoring a Sequence over All Paths
Calculate P(O | M).
Naïve (brute-force) approach:
  P(O | M) = Σq P(q, O | M)
How many calculations are required (big-O)? _____
The Forward Algorithm
Define the forward variable as
  αt(i) = P(O1 O2 … Ot, qt = Si | M)
i.e. the probability of the partial observation sequence O1 O2 … Ot (until time t) and state Si at time t, given the model M.

Use induction! Assume we know αt(i) for 1 ≤ i ≤ N. Summing over all possible previous states Si (each sum ending in state Si at time t), multiplying by the transition from Si to Sj, and then by the emission of observation Ot+1 from state Sj, gives the updated sum

  αt+1(j) = [ Σi=1..N αt(i) aij ] bj(Ot+1)

[Figure: trellis from time t to t+1; states S1, S2, …, SN at time t feed state Sj at time t+1 through edges a1j, a2j, …, aNj, combining the αt(i) into αt+1(j).]
The Forward Algorithm
1) Initialization:   α1(i) = πi bi(O1),  1 ≤ i ≤ N
2) Induction:        αt+1(j) = [ Σi=1..N αt(i) aij ] bj(Ot+1)
                     (Perform for all states for a given t, then advance t.)
3) Termination:      P(O | M) = Σi=1..N αT(i)

[Proofs for Initialization and Termination Steps]
Dynamic Programming Table
[Figure: N × T table; rows are states 1 … N, columns are time steps (observations) 1, 2, 3, …, t, …, T; cell (i, t) holds αt(i). Each column is filled from the previous one.]
The Forward Variable
We showed the induction step for αt+1(j) through intuition. Can we prove it?
The Forward Algorithm
What is the complexity of the forward algorithm?
  time complexity: _____
    compare to brute force, O(N^T · T)
    e.g. for N = 5, T = 100, we need ~3,000 computations vs ~10^72
  space complexity: _____
Practical Issues
  underflow → use log probabilities for the model
  for sums of probabilities, use the log-sum-exp trick
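A minimal sketch of the forward algorithm in log space (not from the slides), using scipy.special.logsumexp for the log-sum-exp step; the array shapes match the sampler sketch earlier, and the function name is my own.

```python
import numpy as np
from scipy.special import logsumexp

def forward_log(pi, A, B, obs):
    """Log-space forward algorithm.

    pi: (N,) initial distribution; A: (N, N) transitions; B: (N, M) emissions;
    obs: length-T sequence of observation symbol indices.
    Returns (log_alpha, log P(O | M)) where log_alpha has shape (T, N).
    """
    # np.log turns zero probabilities into -inf, which is the desired behavior here.
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    T, N = len(obs), len(pi)
    log_alpha = np.full((T, N), -np.inf)
    log_alpha[0] = log_pi + log_B[:, obs[0]]                  # initialization
    for t in range(1, T):                                      # induction
        # log alpha_t(j) = logsumexp_i( log alpha_{t-1}(i) + log a_ij ) + log b_j(O_t)
        log_alpha[t] = logsumexp(log_alpha[t - 1][:, None] + log_A, axis=0) + log_B[:, obs[t]]
    return log_alpha, logsumexp(log_alpha[-1])                 # termination: log P(O | M)
```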
Decoding HMMs

Learning Goals
  Describe how to decode the state sequence
  Describe the Viterbi algorithm
Posterior Decoding
We want to compute
  q̂t = argmaxSi P(qt = Si | O, M)
Define γt(i) = P(qt = Si | O, M), i.e. the probability of being in state Si at time t, given observation sequence O and model M.
Then
  γt(i) = P(O, qt = Si | M) / P(O | M)
We still need to determine P(O, qt = Si | M). We just determined P(O | M) using the forward algorithm.
Probabilities for Posterior Decoding
  P(O, qt = Si | M) = P(O1 … Ot, qt = Si | M) · P(Ot+1 … OT | qt = Si, M)
                    =        αt(i)            ·        βt(i)

[Figure: state sequence q1 … qt = Si … qT over observations O1 … Ot, Ot+1 … OT, split at time t into the forward and backward pieces.]
The Backward Algorithm
(an alternative approach, useful later too)
Define the backward variable as
  βt(i) = P(Ot+1 Ot+2 … OT | qt = Si, M)
i.e. the probability of the partial observation sequence Ot+1 … OT, given state Si at time t and the model M.
[Note that the state at time t is now given and on the RHS of the conditional.]

1) Initialization:  βT(i) = 1 for all i (defined arbitrarily)
2) Induction:       βt(i) = Σj=1..N aij bj(Ot+1) βt+1(j),   t = T−1, …, 1

[Figure: trellis from time t to t+1; edges ai1, ai2, …, aiN lead from state Si at time t to states S1, S2, …, SN at time t+1, combining the βt+1(j) into βt(i).]
Dynamic Programming Table
[Figure: N × T table; rows are states 1 … N, columns are time steps (observations) 1, 2, 3, …, t, …, T; cell (i, t) holds βt(i). The table is filled from right (t = T) to left.]
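A minimal log-space backward sketch (not from the slides), companion to the forward sketch above; pi is unused here but kept so the signature matches forward_log.

```python
import numpy as np
from scipy.special import logsumexp

def backward_log(pi, A, B, obs):
    """Log-space backward algorithm; returns log_beta of shape (T, N)."""
    log_A, log_B = np.log(A), np.log(B)
    T, N = len(obs), len(pi)
    log_beta = np.zeros((T, N))                    # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                  # induction, t = T-1, ..., 1
        # log beta_t(i) = logsumexp_j( log a_ij + log b_j(O_{t+1}) + log beta_{t+1}(j) )
        log_beta[t] = logsumexp(log_A + log_B[:, obs[t + 1]] + log_beta[t + 1], axis=1)
    return log_beta
```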
Posterior Decoding
Then
  γt(i) = αt(i) βt(i) / P(O | M) = αt(i) βt(i) / Σj αt(j) βt(j)

[Figure: N × T table; rows are states 1 … N, columns are time steps 1 … T; cell (i, t) holds γt(i).]

Now solve
  q̂t = argmaxSi γt(i)   for each t = 1, …, T.
Posterior Decoding
We found the individually most likely state q̂t at each time t.
The Good
  maximizes the expected number of correct states
The Bad
  may result in an invalid path (not all Si → Sj transitions may be possible)
  the most probable state is the most likely to be correct at any instant, but the sequence of individually most probable states is not necessarily the most probable sequence
Viterbi Decoding
Goal: Find the single best state sequence.
  q* = argmaxq P(q | O, M) = argmaxq P(q, O | M)
Define
  δt(i) = max over q1, …, qt−1 of P(q1 … qt−1, qt = Si, O1 … Ot | M)
i.e. the best score (highest probability) along a single path at time t, which accounts for the first t observations and ends in state Si.
The Viterbi Algorithm
1) Initialization:   δ1(i) = πi bi(O1),   ψ1(i) = 0
2) Induction:        δt(j) = maxi [ δt−1(i) aij ] bj(Ot),   ψt(j) = argmaxi [ δt−1(i) aij ]
                     (Perform for all states for a given t, then advance t.)
3) Termination:      P* = maxi δT(i),   qT* = argmaxi δT(i)
4) Path (state sequence) backtracking:   qt* = ψt+1(qt+1*),  t = T−1, …, 1
The Viterbi Algorithm
  similar to the forward algorithm (use max instead of sum)
  use a DP table to compute δt(i) (plus the backpointers ψt(j) for the path)
  same complexity as the forward algorithm
Practical Issues
  underflow issues → use log probabilities for the model
  for logs of products of probabilities, use sums of logs
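A minimal log-space Viterbi sketch (not from the slides), mirroring the forward-algorithm sketch above but with max/argmax in place of the sum; the function name is my own.

```python
import numpy as np

def viterbi_log(pi, A, B, obs):
    """Log-space Viterbi decoding; returns (best_path, log P(q*, O | M))."""
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    T, N = len(obs), len(pi)
    log_delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)                      # backpointers
    log_delta[0] = log_pi + log_B[:, obs[0]]               # initialization
    for t in range(1, T):                                   # induction
        scores = log_delta[t - 1][:, None] + log_A          # scores[i, j] = log delta_{t-1}(i) + log a_ij
        psi[t] = scores.argmax(axis=0)
        log_delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(log_delta[-1].argmax())]                    # termination
    for t in range(T - 1, 0, -1):                           # backtracking
        path.append(int(psi[t][path[-1]]))
    return path[::-1], float(log_delta[-1].max())
```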
Learning HMMs

Learning Goals
  Describe how to learn HMM parameters
  Describe the Baum-Welch algorithm

Learning
Goal
  Adjust the model parameters M = (A, B, π) to maximize P(O | M), i.e. the probability of the observation sequence(s) given the model.
Supervised Approach
  Assume we have complete data (we know the underlying states). Use MLE.
Supervised Learning Example
  state space        S = {1, 2}
  observation space  V = {e, f, g, h}
  training set (each state sequence shown above its observation sequence):
    1 2    1 2    1 2    1 2
    e g    e h    f h    f g
What are the optimal model parameters?
Pseudocounts
For a small training set, the parameters may overfit:
  P(O | M) is maximized, but M is unreasonable
  probabilities of 0 are problematic
Add pseudocounts to represent our prior belief:
  large pseudocounts → strong regularization
  small pseudocounts → weak regularization (just enough to avoid P = 0)
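A minimal counting sketch (not from the slides) of supervised MLE for the two-state example two slides back, with an optional pseudocount added to every count; the function name and the choice pseudocount=1.0 are purely illustrative.

```python
from collections import Counter

# Labeled training data from the example slide: state sequence "1 2" paired with
# observation sequences (e, g), (e, h), (f, h), (f, g).
training = [([1, 2], ["e", "g"]), ([1, 2], ["e", "h"]),
            ([1, 2], ["f", "h"]), ([1, 2], ["f", "g"])]
states, symbols = [1, 2], ["e", "f", "g", "h"]

def supervised_mle(training, states, symbols, pseudocount=0.0):
    """Count-and-normalize MLE for pi, A, B; pseudocount > 0 avoids zero probabilities."""
    start, trans, emit = Counter(), Counter(), Counter()
    for q, O in training:
        start[q[0]] += 1
        for a, b in zip(q, q[1:]):
            trans[(a, b)] += 1
        for s, o in zip(q, O):
            emit[(s, o)] += 1
    def normalize(counts, rows, cols):
        return {r: {c: (counts[(r, c)] + pseudocount) /
                       (sum(counts[(r, c2)] for c2 in cols) + pseudocount * len(cols))
                    for c in cols} for r in rows}
    pi = {s: (start[s] + pseudocount) / (len(training) + pseudocount * len(states))
          for s in states}
    return pi, normalize(trans, states, states), normalize(emit, states, symbols)

pi, A, B = supervised_mle(training, states, symbols, pseudocount=1.0)
```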
Learning
Unsupervised Approach
  we do not know the underlying states
  there is no known way to analytically solve for the optimal model
Ideas
  use an iterative algorithm to locally maximize P(O | M)
  either gradient descent or EM works
  the Baum-Welch algorithm, based on EM, is the most popular
Unsupervised Learning
Goal
[Figure: trellis over observations O1 … Ot, Ot+1 … OT; a transition from Si at time t to Sj at time t+1 contributes αt(Si) · aij · bj(Ot+1) · βt+1(Sj).]
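The quantities this figure alludes to are the standard Baum-Welch posteriors (written here in LaTeX, consistent with the earlier definitions of α and β; they are not spelled out in the extracted text):

```latex
\[
\xi_t(i, j) = P(q_t = S_i,\, q_{t+1} = S_j \mid O, M)
            = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid M)},
\qquad
\gamma_t(i) = P(q_t = S_i \mid O, M) = \sum_{j=1}^{N} \xi_t(i, j) \quad (t < T).
\]
```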
Baum-Welch Algorithm
Initialization
  Set M = (A, B, π) to random initial conditions (or using prior information).
Iteration (repeat until convergence)
  Compute αt(i) and βt(i) using the forward-backward algorithm
  Compute P(O | M) [E-step]
  Compute γt(i) and ξt(i, j)
  Update the model parameters [M-step]
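The M-step updates are not written out in the extracted slide; the standard re-estimation formulas, supplied here for reference and consistent with the quantities above, are:

```latex
\[
\hat{\pi}_i = \gamma_1(i), \qquad
\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\hat{b}_j(k) = \frac{\sum_{t\,:\,O_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}.
\]
```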
Baum-Welch Algorithm
  Time complexity: O(N²T) per iteration × (# iterations)
  Guaranteed to increase the likelihood P(O | M) via EM,
  but not guaranteed to find the globally optimal M*.
Practical Issues
  Use multiple training sequences (sum over them).
  Apply smoothing to avoid zero counts and improve generalization (add pseudocounts).
HMMs and Protein Structure
One biological application of HMMs is to determine the secondary structure (i.e. the general three-dimensional shape) of a protein. This general shape is made up of alpha helices, beta sheets, and other structures. In this problem, we will assume that the amino acid composition of these regions is governed by an HMM.
To keep this problem relatively simple, we do not use actual transition values or emission probabilities. The start state is always "other". We will use the state transition probabilities and emission probabilities below.
State transition probabilities (from row state to column state):

           alpha  beta  other
  alpha     0.7   0.1    0.2
  beta      0.2   0.6    0.2
  other     0.3   0.3    0.4

e.g. P(Alpha Helix → Beta Sheet) = 0.1

Emission probabilities (amino acid given state):

  amino acid  alpha  beta  other
  M           0.35   0.10  0.05
  L           0.30   0.05  0.15
  N           0.15   0.30  0.20
  E           0.10   0.40  0.15
  A           0.05   0.00  0.20
  G           0.05   0.15  0.25
Based on exercise by Manolis Kellis
Protein Structure Questions
1) What is the probability P(q, O = ML) for the state path q = (other, alpha)?
2) How many paths could give rise to the sequence O = MLN? What is the total probability P(O)?
3) Give the most likely state transition path q* for the amino acid sequence MLN using the Viterbi algorithm. What is P(q*, O)?
   Compare this to P(O) above. What does this say about the reliability of the Viterbi path?
Based on exercise by Manolis Kellis