Bayesian Belief Network

bayesian network

Uploaded by

chi
Bayesian Belief Network

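$P(a \wedge b) = P(a)\,P(b)$

$P(\mathit{toothache}, \mathit{catch}, \mathit{cavity}, \mathit{Weather} = \mathit{cloudy}) = P(\mathit{Weather} = \mathit{cloudy})\,P(\mathit{toothache}, \mathit{catch}, \mathit{cavity})$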

The decomposition of large probabilistic domains into


weakly connected subsets via conditional
independence is one of the most important
developments in the recent history of AI

This can work well even when the assumption is not true!


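Naive Bayes assumption:

$P(a_1, \dots, a_n \mid v_j) = \prod_i P(a_i \mid v_j)$

which gives

$v_{NB} = \arg\max_{v_j} P(v_j) \prod_i P(a_i \mid v_j)$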
Bayesian networks
Conditional Independence
Inference in Bayesian Networks
Irrelevant variables
Constructing Bayesian Networks
Learning Bayesian Networks

Examples - Exercises
Naive Bayes assumption of conditional
independence too restrictive
But it's intractable without some such
assumptions...

Bayesian belief networks describe conditional independence among subsets of variables
This allows combining prior knowledge about (in)dependencies among variables with observed training data
Bayesian networks
A simple, graphical notation for conditional independence
assertions and hence for compact specification of full joint
distributions

Syntax:
a set of nodes, one per variable

a directed, acyclic graph (link "directly influences")
a conditional distribution for each node given its parents:
P (Xi | Parents (Xi))

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values
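The syntax above maps onto a small data structure. The following is a minimal sketch, assuming Boolean variables and using the burglary network that appears later in these slides. The dictionary layout and the helper name `cpt_lookup` are illustrative choices, not part of the original slides.

```python
# Each node maps to (parents, CPT). A CPT row is keyed by the tuple of parent
# values and stores P(node = True | parents); P(node = False | ...) is 1 - p.
network = {
    "Burglary":   ((), {(): 0.001}),
    "Earthquake": ((), {(): 0.002}),
    "Alarm":      (("Burglary", "Earthquake"),
                   {(True, True): 0.95, (True, False): 0.94,
                    (False, True): 0.29, (False, False): 0.001}),
    "JohnCalls":  (("Alarm",), {(True,): 0.90, (False,): 0.05}),
    "MaryCalls":  (("Alarm",), {(True,): 0.70, (False,): 0.01}),
}

def cpt_lookup(node, value, assignment):
    """Return P(node = value | parent values taken from assignment)."""
    parents, cpt = network[node]
    p_true = cpt[tuple(assignment[p] for p in parents)]
    return p_true if value else 1.0 - p_true

# P(Alarm = true | Burglary = true, Earthquake = false)
print(cpt_lookup("Alarm", True, {"Burglary": True, "Earthquake": False}))
```

Storing only the probability of `True` in each row reflects the fact that the probability of `False` is determined by it.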
Bayesian Networks
A Bayesian belief network allows conditional independence to be defined among subsets of the variables
A graphical model of causal relationships
Represents dependency among the variables
Gives a specification of the joint probability distribution
Nodes: random variables
Links: dependency
[Figure: example graph with nodes X, Y, Z, P and links X -> Z, Y -> Z, Y -> P]
X and Y are the parents of Z, and Y is the parent of P
No dependency between Z and P
Has no loops or cycles
Conditional Independence
Once we know that the patient has cavity we do
not expect the probability of the probe catching to
depend on the presence of toothache
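$P(\mathit{catch} \mid \mathit{cavity} \wedge \mathit{toothache}) = P(\mathit{catch} \mid \mathit{cavity})$
$P(\mathit{toothache} \mid \mathit{cavity} \wedge \mathit{catch}) = P(\mathit{toothache} \mid \mathit{cavity})$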

Independence between a and b

$P(a \mid b) = P(a)$
$P(b \mid a) = P(b)$
Example
Topology of network encodes conditional independence assertions:

Weather is independent of the other variables


Toothache and Catch are conditionally independent given Cavity
Bayesian Belief Network: An Example
[Figure: network with nodes FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea]

The conditional probability table for the variable LungCancer:

      (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC    0.8       0.5        0.7        0.1
~LC   0.2       0.5        0.3        0.9

Shows the conditional probability for each possible combination of its parents
Bayesian Belief Networks
$P(z_1, \dots, z_n) = \prod_{i=1}^{n} P(z_i \mid \mathit{Parents}(Z_i))$
Example
I'm at work, neighbor John calls to say my alarm is ringing, but neighbor
Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a
burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects "causal" knowledge:

A burglar can set the alarm off


An earthquake can set the alarm off
The alarm can cause Mary to call
The alarm can cause John to call
Belief Networks

Burglary: P(B) = 0.001
Earthquake: P(E) = 0.002

Alarm:
Burglary   Earthquake   P(A)
t          t            .95
t          f            .94
f          t            .29
f          f            .001

JohnCalls:
A   P(J)
t   .90
f   .05

MaryCalls:
A   P(M)
t   .70
f   .01
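Full Joint Distribution
$P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathit{parents}(X_i))$

$P(j \wedge m \wedge a \wedge \neg b \wedge \neg e)$
$= P(j \mid a)\,P(m \mid a)\,P(a \mid \neg b \wedge \neg e)\,P(\neg b)\,P(\neg e)$
$= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.000628$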
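The product above can be checked mechanically. The following is a minimal sketch that multiplies the five CPT entries for one complete assignment, using the numbers from the burglary network slide. The names `pr` and `joint` are illustrative.

```python
# CPT values copied from the burglary network slide.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def pr(p_true, value):
    """Probability of a Boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """One entry of the full joint: the product of one row from each CPT."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

# John calls, Mary calls, alarm rings, no burglary, no earthquake.
p = joint(b=False, e=False, a=True, j=True, m=True)
print(round(p, 6))
```

Every other entry of the joint distribution is obtained the same way, so the table never has to be stored explicitly.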
Compactness
A CPT for Boolean Xi with k Boolean parents has $2^k$ rows for the combinations of parent values

Each row requires one number p for Xi = true (the number for Xi = false is just 1 - p)

If each variable has no more than k parents, the complete network requires $O(n \cdot 2^k)$ numbers

I.e., grows linearly with n, vs. $O(2^n)$ for the full joint distribution

For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. $2^5 - 1 = 31$)
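The count for the burglary net can be reproduced from the parent lists alone. This is a small sketch assuming all variables are Boolean; the function name `cpt_numbers` is illustrative.

```python
def cpt_numbers(parents):
    """Independent numbers needed: one per CPT row, 2**k rows for k Boolean parents."""
    return sum(2 ** len(ps) for ps in parents.values())

burglary_parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

network_numbers = cpt_numbers(burglary_parents)       # 1 + 1 + 4 + 2 + 2
full_joint_numbers = 2 ** len(burglary_parents) - 1   # 2^5 - 1
print(network_numbers, full_joint_numbers)
```

This prints `10 31`, the two figures quoted on the slide.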


Inference in Bayesian Networks
How can one infer the (probabilities of) values of one or more network variables, given observed values of others?
The Bayes net contains all the information needed for this inference
If only one variable has an unknown value, it is easy to infer
In the general case, the problem is NP-hard
Example
In the burglary network, we might observe the event in which JohnCalls = true and MaryCalls = true
We could ask for the probability that a burglary has occurred

P(Burglary | JohnCalls = true, MaryCalls = true)
Remember - Joint distribution
[Figure: joint distribution table for Toothache, Catch and Cavity]

$P(\mathit{cavity} \mid \mathit{toothache}) = \frac{P(\mathit{cavity} \wedge \mathit{toothache})}{P(\mathit{toothache})} = \frac{0.108 + 0.012}{0.108 + 0.012 + 0.016 + 0.064} = 0.6$

$P(\neg \mathit{cavity} \mid \mathit{toothache}) = \frac{P(\neg \mathit{cavity} \wedge \mathit{toothache})}{P(\mathit{toothache})} = \frac{0.016 + 0.064}{0.108 + 0.012 + 0.016 + 0.064} = 0.4$
Normalization
$1 = P(y \mid x) + P(\neg y \mid x)$

$P(Y \mid X) = \alpha\,P(X \mid Y)\,P(Y)$

$\langle P(y \mid x), P(\neg y \mid x) \rangle$

$\alpha \langle 0.12, 0.08 \rangle = \langle 0.6, 0.4 \rangle$
Normalization
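$P(\mathit{Cavity} \mid \mathit{toothache}) = \alpha\,P(\mathit{Cavity}, \mathit{toothache})$
$= \alpha\,[P(\mathit{Cavity}, \mathit{toothache}, \mathit{catch}) + P(\mathit{Cavity}, \mathit{toothache}, \neg \mathit{catch})]$
$= \alpha\,[\langle 0.108, 0.016 \rangle + \langle 0.012, 0.064 \rangle] = \alpha \langle 0.12, 0.08 \rangle = \langle 0.6, 0.4 \rangle$

X: the query variable
E: the evidence variables, with observed values e
Y: the remaining unobserved variables

$P(X \mid e) = \alpha\,P(X, e) = \alpha \sum_{y} P(X, e, y)$

Summation over all possible y (all possible combinations of values of the unobserved variables Y)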
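The normalization step can be written out in a few lines. This is a minimal sketch using the four joint entries for toothache = true quoted on the slides; the helper name `normalize` is illustrative.

```python
def normalize(values):
    """Scale non-negative numbers so that they sum to 1 (the role of alpha)."""
    total = sum(values)
    return [v / total for v in values]

# Joint entries with toothache = true, summed over the unobserved variable Catch.
with_cavity = 0.108 + 0.012       # P(cavity, toothache)
without_cavity = 0.016 + 0.064    # P(not cavity, toothache)

posterior = normalize([with_cavity, without_cavity])
print([round(p, 3) for p in posterior])
```

The constant alpha never has to be computed separately: it is simply one over the sum of the unnormalized entries, here 1 / 0.2.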
P(Burglary | JohnCalls = true, MaryCalls = true)
The hidden variables of the query are Earthquake and Alarm

$P(B \mid j, m) = \alpha\,P(B, j, m) = \alpha \sum_{e} \sum_{a} P(B, e, a, j, m)$

For Burglary = true in the Bayesian network

$P(b \mid j, m) = \alpha \sum_{e} \sum_{a} P(b)\,P(e)\,P(a \mid b, e)\,P(j \mid a)\,P(m \mid a)$
To compute this we had to add four terms, each computed by multiplying five numbers
In the worst case, where we have to sum out almost all variables, the complexity for a network with n Boolean variables is $O(n 2^n)$
P(b) is a constant and can be moved outside the summations; the P(e) term can be moved outside the summation over a
$P(b \mid j, m) = \alpha\,P(b) \sum_{e} P(e) \sum_{a} P(a \mid b, e)\,P(j \mid a)\,P(m \mid a)$

Given JohnCalls = true and MaryCalls = true, the probability that a burglary has occurred is about 28%

$P(B \mid j, m) = \alpha \langle 0.00059224, 0.0014919 \rangle \approx \langle 0.284, 0.716 \rangle$

[Figure: computation for Burglary = true]
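The numbers above can be reproduced by inference by enumeration: sum the joint over the hidden variables Earthquake and Alarm for each value of Burglary, then normalize. This is a minimal sketch using the CPT values from the burglary network slide; the function names are illustrative.

```python
from itertools import product

# CPT values copied from the burglary network slide.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def pr(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

def posterior_burglary(j, m):
    """Return (unnormalized, normalized) values for Burglary = true, false."""
    unnormalized = [
        sum(joint(b, e, a, j, m) for e, a in product((True, False), repeat=2))
        for b in (True, False)
    ]
    alpha = 1.0 / sum(unnormalized)
    return unnormalized, [alpha * u for u in unnormalized]

unnorm, post = posterior_burglary(j=True, m=True)
print(unnorm)
print([round(p, 3) for p in post])
```

Each of the two unnormalized entries is a sum of four products of five numbers, as counted on the slide. Moving P(b) and P(e) outside the inner sums changes only the number of multiplications, not the result.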
Variable elimination algorithm
Eliminate repeated calculation
Dynamic programming

Irrelevant variables
(X query variable, E evidence variables)

Complexity of exact inference
The burglary network belongs to a family of networks in which there is at most one undirected path between any two nodes in the network
These are called singly connected networks or polytrees
The time and space complexity of exact inference in polytrees is linear in the size of the network
Size is defined by the number of CPT entries
If the number of parents of each node is bounded by a constant, then the complexity will also be linear in the number of nodes
For multiply connected networks, variable elimination can have exponential time and space complexity
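Whether a network is singly connected can be tested on its undirected skeleton: there is at most one undirected path between any two nodes exactly when no edge closes a loop. The following is a sketch using a union-find structure and assuming each link is listed once. The function name `is_polytree` and the four-node diamond network are hypothetical illustrations, not taken from the slides.

```python
def is_polytree(nodes, edges):
    """True if ignoring edge directions leaves at most one path between any two nodes."""
    root = {n: n for n in nodes}

    def find(n):
        # Follow links to the representative of n's component (with path halving).
        while root[n] != n:
            root[n] = root[root[n]]
            n = root[n]
        return n

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False   # u and v were already connected: a second path exists
        root[ru] = rv      # merge the two components
    return True

burglary_edges = [("Burglary", "Alarm"), ("Earthquake", "Alarm"),
                  ("Alarm", "JohnCalls"), ("Alarm", "MaryCalls")]
burglary_nodes = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]
print(is_polytree(burglary_nodes, burglary_edges))

# Hypothetical multiply connected network: two distinct paths from A to D.
diamond_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(is_polytree(["A", "B", "C", "D"], diamond_edges))
```

The first call prints `True` and the second prints `False`.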

Constructing Bayesian Networks
A Bayesian network is a correct
representation of the domain only if each node
is conditionally independent of its
predecessors in the ordering, given its parents

P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm)
Conditional Independence
relations in Bayesian networks

The topological semantics is given by either of two specifications: DESCENDANTS or MARKOV BLANKET
Local semantics

Example

JohnCalls is independent of Burglary and Earthquake given the value of Alarm
Example

Burglary is independent of JohnCalls and MaryCalls given Alarm and Earthquake
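The statement about Burglary is an instance of the Markov blanket specification: a node is independent of all other nodes given its parents, its children, and its children's other parents. This sketch computes that set from the parent lists of the burglary network; the function name `markov_blanket` is illustrative.

```python
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

def markov_blanket(node, parents):
    """Parents, children, and the children's other parents of `node`."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)   # a node is not part of its own blanket
    return blanket

print(sorted(markov_blanket("Burglary", parents)))
print(sorted(markov_blanket("Alarm", parents)))
```

For Burglary the result is Alarm and Earthquake, which is exactly the conditioning set in the example. For Alarm it is every other node in the network.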
Constructing Bayesian
networks
1. Choose an ordering of variables X1, ..., Xn

2. For i = 1 to n
add Xi to the network

select parents from X1, ..., Xi-1 such that
P (Xi | Parents(Xi)) = P (Xi | X1, ..., Xi-1)

This choice of parents guarantees:

$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})$ (chain rule)
$= \prod_{i=1}^{n} P(X_i \mid \mathit{Parents}(X_i))$ (by construction)
The compactness of Bayesian networks is an example of locally structured systems
Each subcomponent interacts directly with only a bounded number of other components
Constructing Bayesian networks is difficult
Each variable should be directly influenced by only a few others
The network topology reflects these direct influences
Example
Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?
No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A ,J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes
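The Yes/No answers above can be checked numerically against the original burglary network. This is a sketch that builds the full joint from the CPT values on the earlier slide and compares conditional probabilities for every assignment of the conditioning variables. The names `prob` and `same_distribution` are illustrative.

```python
from itertools import product

# CPT values copied from the burglary network slide.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}
VARS = ("B", "E", "A", "J", "M")

def pr(p_true, value):
    return p_true if value else 1.0 - p_true

# Full joint table, one entry per assignment of (B, E, A, J, M).
JOINT = {
    (b, e, a, j, m): pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
                     * pr(P_J[a], j) * pr(P_M[a], m)
    for b, e, a, j, m in product((True, False), repeat=5)
}

def prob(fixed):
    """Probability of a partial assignment such as {"A": True, "J": False}."""
    return sum(p for values, p in JOINT.items()
               if all(values[VARS.index(v)] == x for v, x in fixed.items()))

def same_distribution(x, full, reduced):
    """Check P(x | full) == P(x | reduced) for every assignment of `full`."""
    for values in product((True, False), repeat=len(full)):
        given = dict(zip(full, values))
        small = {v: given[v] for v in reduced}
        lhs = prob({**given, x: True}) / prob(given)
        rhs = prob({**small, x: True}) / prob(small)
        if abs(lhs - rhs) > 1e-9:
            return False
    return True

print(same_distribution("J", ["M"], []))                         # P(J | M) = P(J)?
print(same_distribution("A", ["J", "M"], ["J"]))                 # P(A | J, M) = P(A | J)?
print(same_distribution("B", ["A", "J", "M"], ["A"]))            # P(B | A, J, M) = P(B | A)?
print(same_distribution("E", ["B", "A", "J", "M"], ["A"]))       # P(E | B, A, J, M) = P(E | A)?
print(same_distribution("E", ["B", "A", "J", "M"], ["A", "B"]))  # P(E | B, A, J, M) = P(E | A, B)?
```

The five lines print False, False, True, False, True, matching the No/Yes answers on the slide. Since the variables are Boolean, comparing the probability of the value true is enough.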
Example contd.

Deciding conditional independence is hard in noncausal directions

(Causal models and conditional independence seem hardwired for humans!)

Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed


Some links represent tenuous relationships that require difficult and unnatural probability judgments, such as the probability of Earthquake given Burglary and Alarm

Learning Bayesian Networks

How to fill in the entries of a Conditional Probability Table

Case 1: the structure of the Bayesian network is known, and all variables can be observed in the training set.
Then:
Entry (i, j) = P(yi | Predecessors(Yi)), estimated from the values observed in the training set

Case 2: the structure of the Bayesian network is known, and some of the variables cannot be observed in the training set.
Then the gradient ascent method is used
Example, case 1
[Figure: network with nodes FamilyHistory, Smoker, LungCancer, Emphysema]

Person   FH    S     E     LC    PXRay   D
P1       Yes   Yes   No    Yes   +       Yes
P2       Yes   No    No    Yes   -       Yes
P3       Yes   No    Yes   No    +       No
P4       No    Yes   Yes   Yes   -       Yes
P5       No    Yes   No    No    +       No
P6       Yes   Yes   ?     ?     ?       ?

Entry: P(yi | Predecessors(Yi))
P(LC = Yes | FH = Yes, S = Yes) = 0.5

      (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC    0.5
~LC
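When every variable is observed, a CPT entry can be estimated as a relative frequency: count the rows that match a parent combination and take the fraction in which the child is true. This is a minimal sketch assuming plain frequency counting with no smoothing, and using only the five complete rows P1 to P5 of the table (P6 is left out because its LC value is unknown). The function name `estimate_cpt` is illustrative.

```python
from collections import Counter

# (FH, S, LC) for the complete rows P1..P5 of the training table.
rows = [
    (True, True, True),     # P1
    (True, False, True),    # P2
    (True, False, False),   # P3
    (False, True, True),    # P4
    (False, True, False),   # P5
]

def estimate_cpt(rows):
    """Relative-frequency estimate of P(LC = yes | FH, S) from complete rows."""
    seen, positive = Counter(), Counter()
    for fh, s, lc in rows:
        seen[(fh, s)] += 1
        positive[(fh, s)] += 1 if lc else 0
    return {key: positive[key] / seen[key] for key in seen}

cpt = estimate_cpt(rows)
print(cpt[(True, False)])   # column (FH, ~S)
print(cpt[(False, True)])   # column (~FH, S)
```

Parent combinations that never occur in the data, such as (~FH, ~S) here, get no entry at all, which is one reason practical estimators usually add some form of smoothing.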
Example, case 2
Person   FH    S     E     LC    PXRay   D
P1       ---   Yes   ---   Yes   +       Yes
P2       ---   No    ---   Yes   -       Yes
P3       ---   No    ---   No    +       No
P4       ---   Yes   ---   Yes   -       Yes
P5       ---   Yes   ---   No    +       No
P6       Yes   Yes   ?     ?     ?       ?

Suppose the structure is known and the variables are only partially observable
Similar to training neural network with hidden units
In fact, can learn network conditional probability
tables using gradient ascent
Summary
Bayesian networks provide a natural
representation for (causally induced)
conditional independence
Topology + CPTs = compact
representation of joint distribution
Generally easy for domain experts to
construct
-> P(d | a,b,c) = P(d | a,c) = 0.66
$P(d \mid a, b, c) = \alpha\,P(a)\,P(b)\,P(c \mid a, b)\,P(d \mid a, c)$
$P(D \mid a, b, c) = \alpha \langle 0.0825, 0.0425 \rangle = \langle 0.66, 0.34 \rangle$

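->

$P(b \mid a, c, d) = \alpha \sum_{c} P(a)\,P(b)\,P(c \mid a, b)\,P(d \mid a, c)$
$= \alpha\,P(a)\,P(b) \sum_{c} P(c \mid a, b)\,P(d \mid a, c)$

$P(B \mid a, c, d) = \alpha \langle 0.05, 0.075 \rangle = \langle 0.4, 0.6 \rangle$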


P(b | a,c,d) 0.6
Bayesian networks
Conditional Independence
Inference in Bayesian Networks
Irrelevant variables
Constructing Bayesian Networks
Learning Bayesian Networks

Examples - Exercises
rv dec ID3
