
Hidden Markov Models

Instructor: Jessica Wu -- Harvey Mudd College

Robot Image Credit: Viktoriya Sukhanova © 123RF.com. Based on tutorial by Lawrence Rabiner.

Basic Problems for HMMs
Use the compact notation M = (A, B, π).

evaluation (scoring)
• scoring O over one path q: P(q, O | M)
  (probability of a path & observations)
• scoring O over all paths: P(O) = Σ_q P(q, O | M)
  (probability of observations over all paths) [forward-backward algorithm]

decoding
• most likely path q*: q* = argmax_q P(q, O | M) [Viterbi decoding]
• path containing the most likely state at any time point:
  q^ = {q_t | q_t = argmax_{S_i} P(q_t = S_i | O, M)} [posterior decoding]

learning
• supervised learning of M*: M* = argmax_M P(q, O | M)
• unsupervised learning of M*: M* = argmax_M Σ_q P(q, O | M) [Baum-Welch training]
• unsupervised learning of M*: M* = argmax_M max_q P(q, O | M) [Viterbi training]

Based on slides by Manolis Kellis
HMM Elements
• states S = {S_1, …, S_N}; state at time t: q_t ∈ S
• observations V = {v_1, …, v_M}; observation at time t: O_t ∈ V
• initial state distribution π = {π_i}, 1 ≤ i ≤ N, where π_i = P(q_1 = S_i)
• state transition probability distribution A = {a_ij}, 1 ≤ i, j ≤ N, where a_ij = P(q_t = S_j | q_t-1 = S_i)
• observation symbol probability distribution B = {b_j(k)}, 1 ≤ j ≤ N, 1 ≤ k ≤ M, where b_j(k) = P(v_k at t | q_t = S_j)
Forward-Backward Algorithm (Scoring)
forward variable: α_t(i) = P(O_1 O_2 … O_t, q_t = S_i | M)
  probability of the partial observations O_1 O_2 … O_t (until time t) and state S_i at time t, given model M
backward variable: β_t(i) = P(O_t+1 O_t+2 … O_T | q_t = S_i, M)
  probability of the partial observations O_t+1 … O_T given state S_i at time t and model M

Posterior Decoding Algorithm
γ_t(i) = P(q_t = S_i | O, M) = α_t(i) β_t(i) / Σ_i α_t(i) β_t(i)
  probability of being in state S_i at time t, given observations O and model M

Viterbi Algorithm (Decoding)
δ_t(i) = max_{q_1 … q_t-1} P(q_1 … q_t = S_i, O_1 … O_t | M)
  best score (highest probability) along a single path, at time t, which accounts for the first t observations and ends in state S_i

Markov Models and Markov Chains
Learning Goals

• Describe the properties of a Markov chain
• Describe the relationship between a Markov chain and a Markov model
• Describe the elements of a Markov chain
Markov Chains and Markov Models
A Markov chain is a stochastic process with the Markov property.

Stochastic process
• probabilistic counterpart to a deterministic process
• a collection of r.v.'s that evolve over time

Markov property
• memoryless: the conditional probability distribution of future states depends only on the present state

                         system state is
                         fully observable          partially observable
system is autonomous     Markov chain              hidden Markov model
system is controlled     Markov decision process   partially observable Markov decision process

Markov Chains
We can model a Markov chain as a triplet (S, π, A), where
• S: finite set of N = |S| states          (What properties must π and A satisfy?)
• π: initial state probabilities {π_i}
• A: state transition probabilities {a_ij}

A Markov chain outputs an (observable) state at each (discrete) time step, t = 1, …, T:
  q_1 q_2 … q_T

The probability of observation sequence O = {O_1, …, O_T}, where O_t ∈ S, is
P(O | Model) = P(q_1, …, q_T)
  = P(q_1) P(q_2 | q_1) P(q_3 | q_1, q_2) … P(q_t | q_1, …, q_t-1) … P(q_T | q_1, …, q_T-1)
  = P(q_1) P(q_2 | q_1) … P(q_t | q_t-1) … P(q_T | q_T-1)    (by the Markov property)
Markov Model of Weather
Once a day (e.g. at noon), the weather is observed as one of
  state 1: rainy    state 2: cloudy    state 3: sunny

The state transition probabilities are

  A =          rainy  cloudy  sunny
      rainy     0.4    0.3     0.3
      cloudy    0.2    0.6     0.2
      sunny     0.1    0.1     0.8

(Notice that each row sums to 1.)
[State-transition diagram omitted: nodes R, C, S with the arc probabilities above.]

Questions:
1. Given that the weather on day 1 (t = 1) is sunny (state 3), what is the probability that the weather for the next 7 days will be "sun-sun-rain-rain-sun-cloudy-sun"?
2. Given that the model is in state i, what is the probability that it stays in state i for exactly d days? What is the expected duration in state i (also conditioned on starting in state i)?

(This slide intentionally left blank.)
Solution to Q1
O = {S_3, S_3, S_3, S_1, S_1, S_3, S_2, S_3}

P(O | Model)
  = P(S_3, S_3, S_3, S_1, S_1, S_3, S_2, S_3 | Model)
  = P(S_3) P(S_3|S_3) P(S_3|S_3) P(S_1|S_3) P(S_1|S_1) P(S_3|S_1) P(S_2|S_3) P(S_3|S_2)
  = π_3 · a_33 · a_33 · a_31 · a_11 · a_13 · a_32 · a_23
  = (1)(0.8)(0.8)(0.1)(0.4)(0.3)(0.1)(0.2)
  = 1.536 × 10^-4
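
As a sanity check, the same product can be computed numerically. A minimal Python/NumPy sketch (the state indices and the helper name chain_prob are my own; the matrix is the one from the weather slide):

```python
import numpy as np

# Transition matrix A[i, j] = P(next state j | current state i); states 0 = rainy, 1 = cloudy, 2 = sunny
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
pi = np.array([0.0, 0.0, 1.0])   # day 1 is sunny with certainty

def chain_prob(states, pi, A):
    """P(q_1, ..., q_T) for a fully observed Markov chain."""
    p = pi[states[0]]
    for prev, curr in zip(states, states[1:]):
        p *= A[prev, curr]
    return p

O = [2, 2, 2, 0, 0, 2, 1, 2]     # sun, sun, sun, rain, rain, sun, cloudy, sun
print(chain_prob(O, pi, A))      # ~1.536e-04
```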

Solution to Q2
O = {S_i, S_i, S_i, …, S_i, S_j ≠ S_i}
     (days 1, 2, 3, …, d, then day d + 1 in a different state)

P(O | Model, q_1 = S_i) = (a_ii)^(d-1) (1 - a_ii) = p_i(d)

where p_i(d) is the (discrete) PDF of duration d in state i.
Notice that D_i ~ geometric(p), where p = 1 - a_ii is the probability of success (exiting state i) and there are d - 1 failures before the first success.

the "math" way: for X ~ geom(p), E[X] = Σ_d d (1 - p)^(d-1) p = 1/p,
using the series Σ_d d x^(d-1) = 1/(1 - x)^2 for x ∈ ℝ, |x| < 1.
So the expected duration in state i is E[D_i] = 1/(1 - a_ii).

Intuition: Consider a fair die. If the probability of success (a "1") is p = 1/6, it will take 1/p = 6 rolls on average until a success.

For example, the expected number of consecutive days of rainy weather is 1/(1 - a_11) = 1/0.6 ≈ 1.67; for cloudy, 2.5; for sunny, 5.
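
The closed form for the expected duration can also be checked against a truncated version of the geometric series (a small sketch reusing the weather example's self-transition probabilities):

```python
# E[D_i] = sum_d d * (a_ii)^(d-1) * (1 - a_ii) should match the closed form 1/(1 - a_ii)
for a_ii in (0.4, 0.6, 0.8):                 # rainy, cloudy, sunny
    series = sum(d * a_ii**(d - 1) * (1 - a_ii) for d in range(1, 2000))
    print(round(series, 3), 1 / (1 - a_ii))  # the two numbers agree
```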
Hidden Markov Models
Learning Goals

• Describe the difference between Markov chains and hidden Markov models
• Describe applications of HMMs
• Describe the elements of an HMM
• Describe the basic problems for HMMs

Hidden Markov Models
Now we would like to model pairs of sequences.
• There exists an underlying stochastic process that is hidden (not observable directly).
• But it affects observations (that we can collect directly).

  states        q_1  q_2  …  q_T     (some books use y_i's: labels)
  observations  O_1  O_2  …  O_T     (some books use x_i's: features)
HMMs are Everywhere
application                   states                            observations
weather inference             seasons
dishonest casino              die used (casino has a fair die
                              and a loaded die, and switches
                              between dice on average once
                              every 20 turns)
missile tracking              position
speech recognition            phoneme
NLP part-of-speech tagging    part of speech
computational biology         protein structure
medicine                      disease (state of progression)

Elements of an HMM
A 5-tuple (S, V, π, A, B), where
• S: finite set of states {S_1, …, S_N}
• V: finite set of observation symbols per state {v_1, …, v_M}
• π: initial state distribution {π_i}
• A: state transition probability distribution {a_ij}
• B: observation symbol probability distribution {b_j(k)}
    b_j(k) = P(v_k at t | q_t = S_j), 1 ≤ j ≤ N, 1 ≤ k ≤ M

Note that transitions and emissions depend only on the current state.

An HMM outputs only the emitted symbols O = {O_1, …, O_T}, where O_t ∈ V.
Both the underlying states and the random walk between states are hidden.
HMMs as a Generative Model
Given S, V, π, A, B, the HMM can be used as a generator to give an observation sequence
  O = O_1 O_2 … O_T.

1) Choose initial state q_1 = S_i according to the initial state distribution π.
2) Set t = 1.
3) Choose O_t = v_k according to the symbol probability distribution in state S_i, i.e., b_i(k).
4) Transit to new state q_t+1 = S_j according to the state transition probability distribution for state S_i, i.e., a_ij.
5) Set t = t + 1. Return to step 3 if t < T. Otherwise stop.
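
These five steps map directly onto code. A minimal sampler sketch (the array shapes, the rng argument, and the name sample_hmm are assumptions of this sketch, not part of the slides):

```python
import numpy as np

def sample_hmm(pi, A, B, T, rng=None):
    """Generate (states, observations) of length T from an HMM.

    pi: (N,) initial state distribution
    A:  (N, N) transitions, A[i, j] = P(S_j | S_i)
    B:  (N, M) emissions,   B[j, k] = P(v_k | S_j)
    """
    rng = np.random.default_rng() if rng is None else rng
    states, obs = [], []
    q = rng.choice(len(pi), p=pi)                     # step 1: q_1 ~ pi
    for _ in range(T):                                # steps 2 and 5: loop over t = 1..T
        states.append(q)
        obs.append(rng.choice(B.shape[1], p=B[q]))    # step 3: O_t ~ b_q(.)
        q = rng.choice(A.shape[1], p=A[q])            # step 4: q_{t+1} ~ a_q.
    return states, obs
```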

Scoring HMMs
Learning Goals

• Describe how to score an observation over a single path and over multiple paths
• Describe the forward algorithm
Scoring a Sequence over a Single Path
  states        q_1  q_2  …  q_T
  observations  O_1  O_2  …  O_T

Calculate P(q, O | M).
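
For a fully specified path this is a product of one initial term and alternating transition and emission terms: P(q, O | M) = π_q1 b_q1(O_1) ∏_{t=2..T} a_{q_t-1 q_t} b_qt(O_t). A small sketch (the integer-index convention is my own):

```python
import numpy as np

def joint_prob(q, O, pi, A, B):
    """P(q, O | M) for one known state path q and observation sequence O (integer indices)."""
    p = pi[q[0]] * B[q[0], O[0]]
    for t in range(1, len(O)):
        p *= A[q[t - 1], q[t]] * B[q[t], O[t]]
    return p
```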

Scoring a Sequence over All Paths
Calculate P(O | M).

Naïve (brute-force) approach
  P(O | M) = Σ_q P(q, O | M)

How many calculations are required (big-O)? _____
The Forward Algorithm
Define the forward variable as
  α_t(i) = P(O_1 O_2 … O_t, q_t = S_i | M)
i.e. the probability of the partial observation sequence O_1 O_2 … O_t (until time t) and state S_i at time t, given the model M.

Use induction! Assume we know α_t(i) for 1 ≤ i ≤ N.
[Trellis diagram: every state S_i at time t feeds state S_j at time t+1 with weight a_ij.]
Summing over all possible previous states S_i:
  α_t+1(j) = [ Σ_i α_t(i) a_ij ] b_j(O_t+1)
  (sum ending in state S_i at time t) × (transition from S_i to S_j) × (emission of O_t+1 from S_j at time t+1) = updated sum at time t+1

The Forward Algorithm
1) Initialization:   α_1(i) = π_i b_i(O_1),  1 ≤ i ≤ N

2) Induction:        α_t+1(j) = [ Σ_i α_t(i) a_ij ] b_j(O_t+1),  1 ≤ t ≤ T-1,  1 ≤ j ≤ N
                     (Perform for all states for a given t, then advance t.)

3) Termination:      P(O | M) = Σ_i α_T(i)

[Proofs for Initialization and Termination Steps]
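
A vectorized sketch of the three steps (0-indexed time, so alpha[t] holds α_t+1 in the slide's notation; the helper names are my own):

```python
import numpy as np

def forward(O, pi, A, B):
    """Forward algorithm: alpha[t, i] is the forward variable for state i after t+1 observations."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                         # 1) initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]     # 2) induction: sum over previous states
    return alpha

def score(O, pi, A, B):
    """3) termination: P(O | M) = sum_i alpha_T(i)."""
    return forward(O, pi, A, B)[-1].sum()
```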
Dynamic Programming Table
[DP table: rows are states i = 1, …, N; columns are observation times t = 1, 2, 3, …, T; cell (i, t) holds α_t(i).]
The Forward Variable
We showed the induction step for α_t+1(j) through intuition. Can we prove it?
The Forward Algorithm
What is the complexity of the forward algorithm?
• time complexity: _____
  - compare to brute force, O(N^T · T)
  - e.g. N = 5, T = 100: need ~3k computations vs ~10^72
• space complexity: _____

Practical Issues
• underflow → use log probabilities for the model
• for sums of probabilities, use the log-sum-exp trick
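
One hedged sketch of the log-sum-exp trick, together with a log-space version of the forward recursion (an alternative standard fix is to rescale α at each step; neither variant is spelled out on the slide):

```python
import numpy as np

def logsumexp(log_p):
    """log(sum(exp(log_p))), computed stably by factoring out the maximum."""
    m = np.max(log_p)
    if np.isneginf(m):                                  # all probabilities are zero
        return m
    return m + np.log(np.sum(np.exp(log_p - m)))

def forward_log(O, log_pi, log_A, log_B):
    """Forward algorithm entirely in log space to avoid underflow."""
    T, N = len(O), len(log_pi)
    la = np.full((T, N), -np.inf)
    la[0] = log_pi + log_B[:, O[0]]
    for t in range(1, T):
        for j in range(N):
            la[t, j] = logsumexp(la[t - 1] + log_A[:, j]) + log_B[j, O[t]]
    return la                                           # log P(O | M) = logsumexp(la[-1])
```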

Decoding HMMs
Learning Goals

• Describe how to decode the state sequence
• Describe the Viterbi algorithm
Posterior Decoding
We want to compute
  q_t = argmax_{S_i} P(q_t = S_i | O, M)
Define γ_t(i) = P(q_t = S_i | O, M), i.e. the probability of being in state S_i at time t, given observation sequence O and model M.

Then
  γ_t(i) = P(O, q_t = S_i | M) / P(O | M)
We still need to determine P(O, q_t = S_i | M).
We just determined P(O | M) using the forward algorithm.

Probabilities for Posterior Decoding
P(O, q_t = S_i | M) = P(O_1 … O_t, q_t = S_i | M) · P(O_t+1 … O_T | q_t = S_i, M)
                    =            α_t(i)           ·          β_t(i)

[Trellis diagram: path q_1 … q_t = S_i … q_T over observations O_1 … O_t, O_t+1 … O_T; the first factor covers O_1 … O_t, the second covers O_t+1 … O_T.]
The Backward Algorithm
(an alternative approach, useful later too)
Define the backward variable as
  β_t(i) = P(O_t+1 O_t+2 … O_T | q_t = S_i, M)
i.e. the probability of the partial observation sequence O_t+1 … O_T given state S_i at time t and the model M.
[Note that the state at time t is now given and on the RHS of the conditional.]

1) Initialization:  β_T(i) = 1 for all i  (arbitrarily defined)

2) Induction:       β_t(i) = Σ_j a_ij b_j(O_t+1) β_t+1(j),  t = T-1, T-2, …, 1,  1 ≤ i ≤ N

[Trellis diagram: state S_i at time t connects to states S_1, …, S_N at time t+1 via a_i1, …, a_iN, combining β_t+1(j) with the transition and the emission of O_t+1.]
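
A sketch of the backward pass, mirroring the forward sketch above (again 0-indexed time, so beta[t] is β_t+1 in the slide's 1-indexed notation):

```python
import numpy as np

def backward(O, A, B):
    """Backward algorithm: beta[t, i] = probability of the observations after position t, given state i there."""
    T, N = len(O), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                        # 1) initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])      # 2) induction: sum over next states
    return beta
```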

Dynamic Programming Table
[DP table: rows are states i = 1, …, N; columns are observation times t = 1, 2, 3, …, T; cell (i, t) holds β_t(i), filled right to left.]
Posterior Decoding
Then
  γ_t(i) = α_t(i) β_t(i) / P(O | M) = α_t(i) β_t(i) / Σ_i α_t(i) β_t(i)

[DP table: rows are states i = 1, …, N; columns are observation times t = 1, 2, 3, …, T; cell (i, t) holds γ_t(i).]

Now solve
  q_t = argmax_{S_i} γ_t(i)  for each t.
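
Given the forward and backward tables from the sketches above, posterior decoding is a row-wise normalization followed by a per-time-step argmax (a sketch; alpha and beta are T × N arrays):

```python
import numpy as np

def posterior_decode(alpha, beta):
    """Return the individually most likely state at each time step and the full gamma table."""
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)   # normalize: each row of gamma sums to 1
    return gamma.argmax(axis=1), gamma
```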

Posterior Decoding
We found the individually most likely state q_t at each time t.

The Good
• maximizes the expected number of correct states

The Bad
• may result in an invalid path
  (not all S_i → S_j transitions may be possible)

⇒ the most probable state is most likely to be correct at any instant, but the sequence of individually probable states is not likely to be the most probable sequence
Viterbi Decoding
Goal: Find the single best state sequence.
  q* = argmax_q P(q | O, M) = argmax_q P(q, O | M)

Define
  δ_t(i) = max_{q_1 … q_t-1} P(q_1 … q_t-1, q_t = S_i, O_1 … O_t | M)
i.e. the best score (highest probability) along a single path, at time t, which accounts for the first t observations and ends in state S_i.

Compare to the algorithm for α_t(i) = P(O_1 … O_t, q_t = S_i | M).

To determine the best path to q_t+1 = S_j, compute δ_t+1(j).
• best path to q_t = S_i ⇒ δ_t(i)
• transition from q_t = S_i to q_t+1 = S_j ⇒ a_ij
• emission of O_t+1 from q_t+1 = S_j ⇒ b_j(O_t+1)
• to retrieve the state sequence, also need a traceback pointer
  ψ_t+1(j) = the state S_i that maximizes δ_t(i) a_ij

The Viterbi Algorithm
1) Initialization:   δ_1(i) = π_i b_i(O_1);  ψ_1(i) = 0

2) Induction:        δ_t(j) = max_i [ δ_t-1(i) a_ij ] b_j(O_t);  ψ_t(j) = argmax_i [ δ_t-1(i) a_ij ]
                     (Perform for all states for a given t, then advance t.)

3) Termination:      P* = max_i δ_T(i);  q*_T = argmax_i δ_T(i)

4) Path (state sequence) backtracking:   q*_t = ψ_t+1(q*_t+1),  t = T-1, T-2, …, 1

The Viterbi Algorithm
• similar to the forward algorithm (use max instead of sum)
• use a DP table to compute δ_t(i) and ψ_t(i)
• same complexity as the forward algorithm

Practical Issues
• underflow issues → use log probabilities for the model
• for logs of products of probabilities, use sums of logs
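
A log-space sketch of the full Viterbi recursion with traceback (helper name and 0-indexed conventions are my own):

```python
import numpy as np

def viterbi(O, pi, A, B):
    """Viterbi decoding in log space; returns the best path q* and log P(q*, O | M)."""
    T, N = len(O), len(pi)
    with np.errstate(divide="ignore"):            # log(0) -> -inf is fine here
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta = np.full((T, N), -np.inf)
    psi = np.zeros((T, N), dtype=int)             # traceback pointers
    delta[0] = log_pi + log_B[:, O[0]]            # 1) initialization
    for t in range(1, T):
        cand = delta[t - 1][:, None] + log_A      # cand[i, j] = delta_{t-1}(i) + log a_ij
        psi[t] = cand.argmax(axis=0)              # best predecessor for each state j
        delta[t] = cand.max(axis=0) + log_B[:, O[t]]   # 2) induction
    path = [int(delta[-1].argmax())]              # 3) termination
    for t in range(T - 1, 0, -1):                 # 4) backtracking
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()
```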
Learning HMMs
Learning Goals

• Describe how to learn HMM parameters
• Describe the Baum-Welch algorithm

Learning
Goal
Adjust the model parameters M = (A, B, π) to maximize P(O | M), i.e. the probability of the observation sequence(s) given the model.

Supervised Approach
Assume we have complete data (we know the underlying states). Use MLE.
Supervised Learning Example
state space        S = {1, 2}
observation space  V = {e, f, g, h}
training set (state sequence over observation sequence):
  1 2    1 2    1 2    1 2
  e g    e h    f h    f g

What are the optimal model parameters?
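
A counting sketch of the MLE for this training set (the data layout and dictionary representation are my own; the slide leaves the arithmetic as an exercise):

```python
from collections import Counter

# Four fully labeled training pairs: (state sequence, observation sequence)
data = [("12", "eg"), ("12", "eh"), ("12", "fh"), ("12", "fg")]

init, trans, emit, state_count = Counter(), Counter(), Counter(), Counter()
for states, obs in data:
    init[states[0]] += 1
    for s, o in zip(states, obs):
        state_count[s] += 1
        emit[(s, o)] += 1
    for s1, s2 in zip(states, states[1:]):
        trans[(s1, s2)] += 1

# MLE estimates: normalize the counts
# (for transitions the denominator should in general count only positions with an
#  outgoing transition; here state '1' always transitions, so state_count works)
pi_hat = {s: c / len(data) for s, c in init.items()}
a_hat = {k: c / state_count[k[0]] for k, c in trans.items()}
b_hat = {k: c / state_count[k[0]] for k, c in emit.items()}
print(pi_hat, a_hat, b_hat, sep="\n")
```

Note the zero probabilities this leaves implicit (e.g. state 1 never emits g or h); that is exactly the problem the pseudocounts on the next slide address.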

Pseudocounts
For a small training set, the parameters may overfit.
• P(O | M) is maximized, but M is unreasonable
• probabilities of 0 are problematic

Add pseudocounts to represent our prior belief.
• large pseudocounts → strong regularization
• small pseudocounts → light regularization (just enough to avoid P = 0)
Learning
Unsupervised Approach
• we do not know the underlying states
• no known way to analytically solve for the optimal model

Ideas
• use an iterative algorithm to locally maximize P(O | M)
• either gradient descent or EM works
• the Baum-Welch algorithm, based on EM, is the most popular

Unsupervised Learning
Goal
Re-estimate the model parameters from expected counts:
  π_i    = expected frequency in state S_i at time t = 1
  a_ij   = (expected # of transitions from S_i to S_j) / (expected # of transitions from S_i)
  b_j(k) = (expected # of times in S_j observing v_k) / (expected # of times in S_j)

Recall γ_t(i) = P(q_t = S_i | O, M), i.e. the probability of being in state S_i at time t, given observation sequence O and model M.
Can we use this to solve for any of the above terms?
Expected Number of Transitions
Define
  ξ_t(i, j) = P(q_t = S_i, q_t+1 = S_j | O, M)
i.e. the probability of being in state S_i at time t, and state S_j at time t + 1, given the model and the observation sequence.

To calculate the numerator,
  P(q_t = S_i, q_t+1 = S_j, O | M) = α_t(i) a_ij b_j(O_t+1) β_t+1(j)
so ξ_t(i, j) = α_t(i) a_ij b_j(O_t+1) β_t+1(j) / P(O | M).
We already know P(O | M) from the forward algorithm.

[Trellis diagram: α_t(i) covers O_1 … O_t ending in S_i; the arc a_ij and emission b_j(O_t+1) link S_i at time t to S_j at time t + 1; β_t+1(j) covers O_t+2 … O_T.]

Unsupervised Learning
Goal
Re-estimate the parameters using γ and ξ:
  π_i    = γ_1(i)
  a_ij   = Σ_{t=1..T-1} ξ_t(i, j) / Σ_{t=1..T-1} γ_t(i)
  b_j(k) = Σ_{t : O_t = v_k} γ_t(j) / Σ_{t=1..T} γ_t(j)
Baum-Welch Algorithm
Initialization
• Set M = (A, B, π) to random initial conditions (or using prior information)

Iteration (repeat until convergence)
• Compute α_t(i) and β_t(i) using the forward-backward algorithm
  ⇒ compute P(O | M)  [E-step]
• Compute γ_t(i) and ξ_t(i, j)
  ⇒ update the model parameters  [M-step]

Baum-Welch Algorithm
• Time complexity: O(N^2 T) · (# iterations)
• Guaranteed to increase the likelihood P(O | M) via EM, but not guaranteed to find the globally optimal M*

Practical Issues
• Use multiple training sequences (sum over them)
• Apply smoothing to avoid zero counts and improve generalization (add pseudocounts)
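
A rough single-sequence sketch of one EM iteration, with no scaling, log-space, or pseudocounts (all of which a practical implementation needs, as this slide notes); the updates follow the expected-count form above:

```python
import numpy as np

def baum_welch_step(O, pi, A, B):
    """One Baum-Welch (EM) iteration for a single observation sequence O (integer indices)."""
    T, N = len(O), len(pi)
    # E-step: forward and backward tables
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    P_O = alpha[-1].sum()
    gamma = alpha * beta / P_O                              # gamma[t, i] = P(q_t = S_i | O, M)
    xi = (alpha[:-1, :, None] * A[None, :, :]               # xi[t, i, j] = P(q_t = S_i, q_t+1 = S_j | O, M)
          * B[:, O[1:]].T[:, None, :] * beta[1:, None, :]) / P_O
    # M-step: re-estimate parameters from expected counts
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[np.array(O) == k].sum(axis=0) / gamma.sum(axis=0)
    return pi_new, A_new, B_new, P_O
```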
HMMs and Protein Structure
One biological application of HMMs is to determine the secondary structure (i.e. the general three-dimensional shape) of a protein. This general shape is made up of alpha helices, beta sheets, and other structures. In this problem, we will assume that the amino acid composition of these regions is governed by an HMM.

To keep this problem relatively simple, we do not use actual transition values or emission probabilities. The start state is always "other". We will use the state transition probabilities and emission probabilities below.

State transition probabilities:
         alpha  beta  other
alpha     0.7    0.1   0.2
beta      0.2    0.6   0.2
other     0.3    0.3   0.4
e.g. P(Alpha Helix → Beta Sheet) = 0.1

Emission probabilities:
amino acid  alpha  beta  other
M           0.35   0.10  0.05
L           0.30   0.05  0.15
N           0.15   0.30  0.20
E           0.10   0.40  0.15
A           0.05   0.00  0.20
G           0.05   0.15  0.25

Based on exercise by Manolis Kellis

Protein Structure Questions
1) What is the probability P(q = Oα, O = ML), i.e. the path "other, alpha" emitting M then L?

2) How many paths could give rise to the sequence O = MLN? What is the total probability P(O)?

3) Give the most likely state transition path q* for the amino acid sequence MLN using the Viterbi algorithm. What is P(q*, O)?

Compare this to P(O) above. What does this say about the reliability of the Viterbi path?

Based on exercise by Manolis Kellis
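
For a three-residue sequence, these questions can be checked by brute-force enumeration, which is exactly the naïve scoring approach from earlier. A sketch (the state ordering, the variable names, and the choice to force the first state to "other" per the problem statement are my own reading of the exercise):

```python
import itertools
import numpy as np

states = ["alpha", "beta", "other"]
aminos = ["M", "L", "N", "E", "A", "G"]
A = np.array([[0.7, 0.1, 0.2],     # transitions: rows/cols in the order alpha, beta, other
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
B = np.array([[0.35, 0.30, 0.15, 0.10, 0.05, 0.05],   # emissions from alpha
              [0.10, 0.05, 0.30, 0.40, 0.00, 0.15],   # emissions from beta
              [0.05, 0.15, 0.20, 0.15, 0.20, 0.25]])  # emissions from other
pi = np.array([0.0, 0.0, 1.0])                        # the start state is always "other"

O = [aminos.index(c) for c in "MLN"]

def joint(q, O):
    """P(q, O) for one state path q (tuple of state indices)."""
    p = pi[q[0]] * B[q[0], O[0]]
    for t in range(1, len(O)):
        p *= A[q[t - 1], q[t]] * B[q[t], O[t]]
    return p

all_paths = itertools.product(range(len(states)), repeat=len(O))
nonzero = [q for q in all_paths if joint(q, O) > 0]   # paths that could give rise to MLN
p_O = sum(joint(q, O) for q in nonzero)               # P(O), summed over all paths
q_star = max(nonzero, key=lambda q: joint(q, O))      # best path (brute-force Viterbi)
print(len(nonzero), p_O, [states[i] for i in q_star], joint(q_star, O))
```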
