
Probabilistic Reasoning -

Bayesian Networks
Prof. Dr. Paulo André L. de Castro
[email protected]
www.comp.ita.br/~pauloac - Room 110, IEC-ITA
Summary
• Introduction and Review of Probability
• Interpretations of Probabilities
• Bayesian Networks or Belief Networks
• Probabilistic Inference
• Learning in Probabilistic Models
• Simplified Models: Naïve Bayes and Noisy-OR


What makes the world uncertain
(partially observed or stochastic)?
1. Ignorance. The limits of our knowledge lead us to be uncertain
about many things. Does our poker opponent have a flush or
is she bluffing?

2. Physical randomness or indeterminism. Even if we know
everything that we might care about a coin and how we impart
spin to it when we toss it, there will remain an inescapable
degree of uncertainty about whether it will land heads or tails.
• A strict determinist might claim otherwise, that it would be possible to
calculate... but such a view is, for the foreseeable future, a mere act of
scientistic faith. We are all practical indeterminists.

3. Vagueness. Many of the predicates we employ appear to be vague.
It is often unclear whether to classify a bird as big or small, a
human as brave or not, a thought as knowledge or opinion.
Example 1: Breast Cancer
• Suppose a woman has a 1% chance of having breast cancer.
At a clinic, there is a cancer test with a 20% false-positive rate and
a 10% false-negative rate, i.e. 10% of the women with cancer will get
a negative result. Hence, 90% (of the women with cancer) will get a
positive result. A patient at the clinic got a positive test result.
What is the probability that she really has cancer?
• Since there is only a 20% chance of a false positive, it would be
80%, right?
No! P(Cancer | Pos) ≠ 1 − P(Pos | ¬Cancer)

P(Cancer | Pos) = P(Pos | Cancer) P(Cancer) / P(Pos)
               = P(Pos | Cancer) P(Cancer) / [P(Pos | Cancer) P(Cancer) + P(Pos | ¬Cancer) P(¬Cancer)]
               = 0.9 × 0.01 / (0.9 × 0.01 + 0.2 × 0.99) ≈ 0.043
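A minimal Python sketch of this calculation (the numbers are the ones stated above; the function name is only illustrative):

# Bayes' rule for the breast-cancer test, with the numbers from the example.
def posterior(prior, sensitivity, false_positive_rate):
    """P(Cancer | Pos) = P(Pos|Cancer) P(Cancer) / P(Pos)."""
    p_pos = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_pos

print(posterior(prior=0.01, sensitivity=0.9, false_positive_rate=0.2))  # ~0.043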
Example 2: People vs Collins

• In 1964 (Los Angeles), an interracial couple was convicted of
robbery, largely on the grounds that they matched a highly
improbable profile, a profile which a witness reported. The two
robbers were reported to be:
• A man with a moustache
• Who was black and had a beard
• And a woman with a ponytail
• Who was blonde
• The couple was interracial
• And they were driving a yellow car

• The prosecution suggested that these features had the
following probabilities of being observed in LA at the time:
1. A man with moustache - 1/4
2. Who was black and had a beard - 1/10
3. And a woman with a ponytail - 1/10
4. Who was blonde - 1/3
5. The couple was interracial - 1/1000
6. And were driving a yellow car - 1/10
Example 2: People vs Collins – cont.
• The prosecution called an instructor of math from a State
university who apparently testified that the "product rule"
could be applied. So, the probability of the evidence (e) being
observed for an innocent couple (¬h) would be:

P(e | ¬h) = ∏_i P(e_i | ¬h) = 1/12,000,000

• The prosecution stated that, given the evidence, the
probability of the couple being innocent was no more than
1/12,000,000.
• The jury convicted them

• Is the probability estimate correct?


Example 2: People vs Collins – cont.
• Is the probability estimate (1/12,000,000) correct?
• No!!!

• The product rule does not apply in this case!!

• The observations are not independent!!!

• P(h | e) is not equal to 1 − P(e | ¬h)!

• Alright, what is the probability of the couple being guilty, then,
given all this data?
Example 2: People vs Collins – cont.
• The pieces of evidence are NOT independent!!!
1. A man with moustache - 1/4
2. Who was black and had a beard - 1/10
3. And a woman with a ponytail - 1/10
4. Who was blonde - 1/3
5. The couple was interracial - 1/1000
6. And were driving a yellow car - 1/10
• Given that 2 implies 1, and that together 2, 3 and 4 imply 5 (to a fair approximation), a
better estimate is

P(e | ¬h) ≈ P(e_2 | ¬h) P(e_3 | ¬h) P(e_4 | ¬h) P(e_6 | ¬h) = 1/3000

• Furthermore, P(h | e) is not equal to 1 − P(e | ¬h), but to (using Bayes inversion and sum-out):

P(h | e) = P(e | h) P(h) / P(e) = P(e | h) P(h) / [P(e | h) P(h) + P(e | ¬h) P(¬h)]
Example 2: People vs Collins – cont.

• We do not have P(e | h) and P(h)...

• If the couple is guilty, what are the chances the evidence
would be observed, i.e., how do we estimate P(e | h)?
• That is a hard question, but feeling generous toward the prosecution,
let's say it is 100%.
• Now, we are missing the prior probability of a random couple
being guilty of the robbery, or P(h). The most plausible
approach to estimate it is to count the number of couples in LA
and give them an equal prior probability.
• Let's say there are 1,625,000 eligible males and as many
females in the Los Angeles area... so:

P(h | e) = P(e | h) P(h) / [P(e | h) P(h) + P(e | ¬h) P(¬h)]
         = P(h) / [P(h) + P(e | ¬h) P(¬h)]               (since P(e | h) = 1)
         = (1/1,625,000) / [1/1,625,000 + (1 − 1/1,625,000)/3000] ≈ 0.002
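A short sketch reproducing this estimate under the stated assumptions (P(e | h) = 1, a prior of one guilty couple among 1,625,000, and P(e | ¬h) = 1/3000):

# Posterior that the Collins couple is guilty, under the assumptions above.
p_h = 1 / 1_625_000          # prior: one guilty couple among all eligible couples
p_e_given_h = 1.0            # generous assumption for the prosecution
p_e_given_not_h = 1 / 3000   # corrected evidence probability for an innocent couple

posterior = (p_e_given_h * p_h) / (p_e_given_h * p_h + p_e_given_not_h * (1 - p_h))
print(round(posterior, 4))   # ~0.0018, i.e. about 0.002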
Brief Review about
Statistics and
Probabilities
For more references about Prob. & Stat. see:
Devore, J. L. Probability and Statistics for Engineering and the Sciences. 6th ed. Southbank: Thomson, 2004.
Ross, S. M. Introduction to Probability and Statistics for Engineers and Scientists. 2nd ed. Harcourt: Academic Press, 1999.
McClave, Benson, Sincich. Statistics for Business and Economics. 1998.
Statistics and Probability [Short] Review

• A random variable has a domain (set of values) and associates
each value with a probability. This function is called a
probability distribution.
• In the continuous case, the term probability density function is used.
• There are many classical distributions: Normal (Gaussian), Uniform,
Binomial, Poisson, Exponential, etc.

• P(A) – prior probability

• Example:
• Variable Weather = {sunny, cloudy, rainy}
• P(Weather) – is the probability distribution
• P(Weather) = <0.7, 0.2, 0.1>
• P(Weather=sunny) = 0.7
• P(Weather=rainy) = 0.1
• or
• P(sunny) = 0.7, P(rainy) = 0.1
Two examples of Continuous
distributions
• Normal Distribution

f(x) = (1 / (σ √(2π))) · exp(−½ ((x − μ) / σ)²)

• μ is the mean
• σ is the standard deviation

• Uniform Distribution on [a, b]

f(x) = 1 / (b − a) for a ≤ x ≤ b

• (a + b)/2 is the mean
• (b − a)/√12 is the standard deviation
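A small sketch evaluating both densities with only the standard library; the particular values of μ, σ, a and b are chosen purely for illustration:

import math

def normal_pdf(x, mu, sigma):
    """f(x) = 1/(sigma*sqrt(2*pi)) * exp(-0.5*((x - mu)/sigma)**2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def uniform_pdf(x, a, b):
    """f(x) = 1/(b-a) on [a, b], 0 outside; mean (a+b)/2, std (b-a)/sqrt(12)."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

print(normal_pdf(0.0, mu=0.0, sigma=1.0))                      # ~0.3989
print(uniform_pdf(2.0, a=0.0, b=4.0), (0 + 4) / 2, (4 - 0) / math.sqrt(12))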
Example of a Discrete Distribution

• Dirichlet Discrete Distribution: for a categorical distribution
(with K finite possible states), it can be written as:

D[α_1, α_2, ..., α_i, ..., α_K]

• The probability of observing state i is:

P(X = i) = α_i / Σ_{j=1}^{K} α_j
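A minimal sketch of this rule; the parameter values α = (2, 5, 3) are an assumption made only for illustration:

# P(X = i) = alpha_i / sum_j alpha_j for a Dirichlet-parameterized categorical.
alphas = [2.0, 5.0, 3.0]            # assumed parameters for K = 3 states
total = sum(alphas)
probs = [a / total for a in alphas]
print(probs)                        # [0.2, 0.5, 0.3]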
Probability Axioms
• For any propositions A and B:
• 0 ≤ P(A) ≤ 1
• P(true) = 1 and P(false) = 0
• P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
Conditional Probability

• P(A | K) – conditional probability or posterior probability
• For example, P(A=cavity | K=toothache) = 0.8 means that:
• given that a toothache is all I know, the chance of a cavity (as seen by me) is 80%.
• P(A | K) is a table with one distribution over A for each value of K. Given
A = <cavity, ¬cavity>, K = <toothache, ¬toothache>,
• for instance, P(A | K) = <<0.8, 0.2>, <0.01, 0.99>>
• If we know more, e.g., that I actually have a cavity, then
• P(cavity | toothache, cavity) = 1
• Obs:
1) A belief may stay valid, but it may become useless
2) The new evidence may be useless:
P(cavity | toothache, "Corinthians lost the game") = P(cavity | toothache) = 0.8
Note the relevance of knowledge about the domain to any inference process!!
Conditional Probability: Basic Rules

P(A | B) = P(A, B) / P(B)

• Or we can write it as:

P(A, B) = P(A | B) P(B)

• And we know that (sum-out):

P(A) = Σ_i P(A, B_i)

• Then:

P(A) = Σ_i P(A | B_i) P(B_i)
Chain Rule

P(X_1, X_2, X_3, ..., X_n) = ∏_{i=1}^{n} P(X_i | X_1, X_2, ..., X_{i−1})

• Demonstration:

P(X_1, X_2, ..., X_n) = P(X_n | X_1, ..., X_{n−1}) P(X_1, ..., X_{n−1})
                      = P(X_n | X_1, ..., X_{n−1}) P(X_{n−1} | X_1, ..., X_{n−2}) P(X_1, ..., X_{n−2})
                      = ......
                      = P(X_n | X_1, ..., X_{n−1}) P(X_{n−1} | X_1, ..., X_{n−2}) ... P(X_1)
                      = ∏_{i=1}^{n} P(X_i | X_1, X_2, ..., X_{i−1})
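A quick numerical check of the chain rule (and of sum-out) on three binary variables, using an arbitrary made-up joint distribution:

from itertools import product

# Assumed (arbitrary) joint distribution over three binary variables X1, X2, X3.
joint = {key: v for key, v in zip(sorted(product((0, 1), repeat=3)),
                                  [0.08, 0.12, 0.10, 0.20, 0.05, 0.15, 0.10, 0.20])}

def marginal(fixed):
    """Sum the joint over every assignment consistent with 'fixed' (sum-out)."""
    return sum(p for assignment, p in joint.items()
               if all(assignment[i] == v for i, v in fixed.items()))

# Chain rule: P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x1, x2)
x1, x2, x3 = 1, 0, 1
lhs = joint[(x1, x2, x3)]
rhs = (marginal({0: x1})
       * marginal({0: x1, 1: x2}) / marginal({0: x1})
       * marginal({0: x1, 1: x2, 2: x3}) / marginal({0: x1, 1: x2}))
print(lhs, round(rhs, 10))   # both 0.15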
Bayes Rule

P(H | e) = P(e | H) P(H) / P(e)

P(H): prior probability of the hypothesis H

P(e): prior probability of the evidence e

P(H | e): posterior probability of the hypothesis given the evidence

P(e | H): probability of observing evidence e given H
Why is it relevant?
Cause and Effect
• We usually observe an effect and try to identify its cause

• So, we want to know P(Cause | Effect) (i.e. the probability of each
possible cause)

• However, it is usually easier to determine P(Effect | Cause)
than P(Cause | Effect), and:

P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
Casino Example

• In a casino, the croupier calls out 12!

• Was it a roll of the dice or a spin of the roulette wheel?

• The questions are: P(roulette | 12) = ? and P(dice | 12) = ?

• We know that:

P(dice | 12) = P(12 | dice) P(dice) / P(12)

• P(12 | dice), P(12 | roulette): easier to model... how?

• P(dice), P(roulette): how to estimate?
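A sketch of one way to model this; the likelihoods (two fair dice sum to 12 with probability 1/36, a single-zero roulette wheel calls 12 with probability 1/37) and the equal priors are illustrative assumptions, not values given in the slide:

# Assumed likelihoods and priors (illustrative).
p_12_given_dice = 1 / 36        # two fair dice sum to 12 only as (6, 6)
p_12_given_roulette = 1 / 37    # single-zero wheel: 37 equally likely numbers
p_dice = p_roulette = 0.5       # assumed prior: either game is equally likely

p_12 = p_12_given_dice * p_dice + p_12_given_roulette * p_roulette
print("P(dice | 12)     =", p_12_given_dice * p_dice / p_12)          # ~0.507
print("P(roulette | 12) =", p_12_given_roulette * p_roulette / p_12)  # ~0.493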


Another example: Meningitis
• Let's assume that 80% of people with meningitis (M) present a stiff neck (S),
that the prior probability of meningitis is 1 in 10,000, and that the prior
probability of a stiff neck is 0.1
• Then P(M | S) = P(S | M) P(M) / P(S) = 0.8 × 0.0001 / 0.1 = 0.0008
Calculating the probability of the
evidence
• Suppose we wish to compute the probability of the observed
evidence, say P(E=e), where A has possible values a_1, ..., a_m. We
can apply Bayes' rule for each value of A:

P(A=a_1 | E=e) = P(E=e | A=a_1) P(A=a_1) / P(E=e)
....
P(A=a_m | E=e) = P(E=e | A=a_m) P(A=a_m) / P(E=e)

• Adding these up, and noting that Σ_i P(A=a_i | E=e) = 1:

Σ_i P(A=a_i | E=e) = 1 = Σ_i P(E=e | A=a_i) P(A=a_i) / P(E=e)

• Then:

P(E=e) = Σ_i P(E=e | A=a_i) P(A=a_i)

Calculating the probability of the
evidence - 2
• Since P(E=e) = Σ_i P(E=e | A=a_i) P(A=a_i),

• the division by P(E=e) can be seen as a normalization factor
α in the equation below, for any a_k:

P(A=a_k | E=e) = P(E=e | A=a_k) P(A=a_k) / Σ_i P(E=e | A=a_i) P(A=a_i)
               = α P(E=e | A=a_k) P(A=a_k)

• In vector notation, we can write:

P(A | E=e) = α P(E=e | A) P(A)
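A minimal sketch of this normalization step; the prior and likelihood values are made up for illustration:

# Illustrative prior over A and likelihoods P(E=e | A=a_k).
prior      = {"a1": 0.5, "a2": 0.3, "a3": 0.2}
likelihood = {"a1": 0.10, "a2": 0.40, "a3": 0.70}

unnormalized = {a: likelihood[a] * prior[a] for a in prior}
p_e = sum(unnormalized.values())                  # P(E=e) = sum_k P(E=e|a_k) P(a_k)
posterior = {a: p / p_e for a, p in unnormalized.items()}   # alpha = 1 / P(E=e)
print(p_e, posterior)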
Inference from Full joint distributions

• Typically, we are interested in the posterior joint distribution of the query variables Y
• given specific values e for the evidence variables E

• Let the hidden variables be H = X − Y − E

• Then the required summation of joint entries is done by summing out the hidden variables:

P(Y | E=e) = α P(Y, E=e) = α Σ_h P(Y, E=e, H=h)

• The terms in the summation are joint entries because Y, E and H together exhaust the set of random
variables
• Obvious problems:
1. Worst-case time complexity O(d^n), where d is the number of possible values per variable
2. Space complexity O(d^n) to store the joint distribution
3. How to find the numbers (probabilities) for the O(d^n) entries?
• n – number of variables
Inference from Full joint distributions - 2

• Inference from the full joint distribution can estimate any
conditional probability, even when hidden variables are involved
• But it would require a large amount of space to store the distribution and
even more data to build such a full joint distribution

• Bayesian networks make it easier to build and store
distributions
Summary
• Introduction and Review of Probability
• Interpretations of Probabilities
• Bayesian Networks or Belief Networks
• Probabilistic Inference
• Learning in Probabilistic Models
• Simplified Models: Naïve Bayes and Noisy-OR


Interpretations of Probabilities
• There are two main views about how to understand probabilities: one
asserts that probabilities are fundamentally properties of non-
deterministic physical systems. This view is particularly associated with
frequentism.

• Popper's observation (195): the frequency interpretation, precise
though it was, fails to accommodate our intuition that probabilities of
singular events exist and are meaningful
• Do we need to toss a coin infinitely (or very many) times to make statements about the
probability of it landing heads in one specific toss?

• The alternative view is to think of probabilities as
reporting our subjective degrees of belief. This view was expressed by
Thomas Bayes (1958) and Pierre Simon de Laplace (1951)
Principal Principle and
Conditionalization
• Principal Principle: whenever you learn that the physical
probability of an outcome is r, set your subjective probability
for that outcome to r
• This is really just common sense: you may think that the probability of a
friend shaving his head is 0.01, but if you learn that he will do so if
and only if a fair coin yet to be flipped lands heads, you will revise
your subjective probability to 0.5

• Definition Conditionalization: After applying Bayes' theorem


to obtain P(h|e) adopt that as your degree of belief in h or
Bel(h) = P(h|e)
Belief Network (Bayesian Network)
• A simple, graphical notation for conditional independence
assertions and hence for compact specification of full joint
distributions

• Syntax:
• a set of nodes, one node per variable
• a directed, acyclic graph (a link means "directly influences")
• a conditional probability distribution (CPD) for each node given its
parents:
• P(Xi | Parents(Xi))
• In the simplest case, conditional distributions are represented as a
conditional probability table (CPT) giving the distribution over Xi for
each combination of parent values
Example: Is it an Earthquake or a Burglar?
Example - 2
Markov Blanket
A very simple Method to build Bayes
Networks
Example
Another Example: Car Diagnosis
Another Example: Car Insurance
• Problem: Estimate expected costs (Medical, Liability,
Property) given some information (gray nodes)
I-map and D-map and Perfect Map
• I-map: All direct dependencies in the system being modeled
are explicitly shown via arcs. (Independence Map or I-map for
short).

• D-map: If every arc in a BN happens to correspond to a direct


dependence in the system, then the BN is said to be a
Dependence-map (or, D-map for short).

• A BN which is both an I-map and a D-map is said to be a


perfect map.
Summary
• Bayesian Networks or Belief Networks
• Probabilistic Inference
• Learning in Probabilistic Models
• Simplified Models: Naïve Bayes and Noisy-OR


Inference in Bayesian Networks
• Given a network, we must be able to draw inferences from it,
that is:

• Answer simple queries, P(X | E=e)

• E.g.:
• Or conjunctive queries: P(Xi, Xj | E=e)
• Using the fact that P(Xi, Xj | E=e) = P(Xi | Xj, E=e) P(Xj | E=e)

• Inference can be done from the full joint distribution or by
enumeration
Inference with the Full Joint
Distribution: Example
For example, to obtain
P(A | b) we have
P(A | b) = P(A, b) / P(b) =

<P(a, b)/P(b), P(¬a, b)/P(b)> =

α <P(a, b), P(¬a, b)> =

α <P(a, b, c) + P(a, b, ¬c), P(¬a, b, c) + P(¬a, b, ¬c)>

Note that α can be seen as a normalization factor for the resulting probability
distribution P(A | b). Its explicit computation can thus be avoided by simply
normalizing <P(a, b), P(¬a, b)>.
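A sketch of the same computation on an assumed full joint distribution over three boolean variables A, B, C (the joint probabilities are illustrative):

from itertools import product

# Assumed full joint distribution P(A, B, C) over booleans (illustrative numbers).
joint = {(a, b, c): p for (a, b, c), p in zip(
    product((True, False), repeat=3),
    [0.108, 0.012, 0.072, 0.008, 0.016, 0.064, 0.144, 0.576])}

# P(A | b) = alpha * <P(a, b, c) + P(a, b, ~c), P(~a, b, c) + P(~a, b, ~c)>
unnorm = {a: sum(joint[(a, True, c)] for c in (True, False)) for a in (True, False)}
alpha = 1.0 / sum(unnorm.values())
print({a: alpha * p for a, p in unnorm.items()})   # {True: 0.6, False: 0.4}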
Inference in Bayesian Networks:
Inference by Enumeration
• Enumeration is inefficient (e.g., it computes P(j|a)P(m|a) repeatedly), but it can be improved
by storing the values already computed (dynamic programming)
Computing unnormalized P(b | j, m)

[Enumeration tree: P(b) × Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a)
 = 0.001 × (0.002 × 0.598525 + 0.998 × 0.59223) ≈ 0.0005922]

Computing unnormalized P(¬b | j, m)

[Same enumeration with ¬b:
 = 0.999 × (0.002 × 0.183055 + 0.998 × 0.00113) ≈ 0.001492]


Normalized values P(b | j, m) and P(¬b | j, m)

P(b | j, m) = 0.0005922 / (0.0005922 + 0.001492) ≈ 0.2841

P(¬b | j, m) = 0.001492 / (0.0005922 + 0.001492) ≈ 0.7159
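These numbers can be reproduced with a short enumeration sketch, assuming the standard burglary-alarm CPTs from Russell & Norvig, which are consistent with the factors shown above:

# Enumeration sketch for the burglary-alarm network (standard CPTs assumed).
from itertools import product

P_B = {True: 0.001, False: 0.999}                     # P(Burglary)
P_E = {True: 0.002, False: 0.998}                     # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,       # P(Alarm=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                       # P(JohnCalls=true | A)
P_M = {True: 0.70, False: 0.01}                       # P(MaryCalls=true | A)

def joint(b, e, a, j, m):
    """Full joint entry written as a product of the network's CPT entries."""
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(B | j, m): sum out E and A, then normalize.
unnorm = {b: sum(joint(b, e, a, True, True)
                 for e, a in product((True, False), repeat=2))
          for b in (True, False)}
alpha = 1.0 / sum(unnorm.values())
print({b: round(alpha * p, 4) for b, p in unnorm.items()})  # {True: 0.2842, False: 0.7158}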
Enumeration Algorithm
Inference by Enumeration
• The enumeration algorithm determines a conditional
probability distribution
• P(output variable | known evidence)

• It is also possible to answer conjunctive queries using
the fact that P(Xi, Xj | E=e) = P(Xi | Xj, E=e) P(Xj | E=e)

• Demonstration?...
Demonstration

as:
Inference by Enumeration
• As noted, enumeration tends to recompute some values
several times

• Part of this rework can be eliminated with the dynamic
programming technique. Several algorithms apply; one of the
most widely used is the variable elimination algorithm
• Basically, values already computed are stored in a table and
retrieved when they are needed again... These techniques are
called Exact Inference and can be computationally expensive
for complex networks
• An alternative is Approximate Inference algorithms, which
rely on sampling the network to perform inference
• randomized sampling algorithms, also called Monte Carlo algorithms

• For more information see Russell, chap. 14


Summary
• Bayesian Networks or Belief Networks
• Probabilistic Inference
• Learning in Probabilistic Models
• Simplified Models: Naïve Bayes and Noisy-OR


Learning in Probabilistic
Models
• Learning in Bayesian networks is the process of determining the
network topology (that is, its directed graph) and the conditional
probability tables

• Problems?
• How do we determine the topology?
• How do we estimate the probabilities?
• How complex are these tasks?
• That is, how many topologies and how many probabilities would need to be
determined...
Size of the Conditional Probability Tables vs. the
Full Joint Distribution
• Let us suppose that each variable is influenced by at most k other variables
(naturally, k < n = total number of variables).

• Assuming boolean variables, each conditional probability table (CPT) will have at
most 2^k entries (probabilities). Hence the whole network has at most n × 2^k
entries.

• The full joint distribution, in contrast, has 2^n entries. For example, for n = 30
with at most five parents (k = 5) this means 960 entries instead of more than a billion (2^30).
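A two-line check of this arithmetic:

# Number of probabilities to specify: Bayesian network vs. full joint distribution
# for n boolean variables with at most k parents each.
n, k = 30, 5
print("Bayesian network:        ", n * 2 ** k)   # 960
print("Full joint distribution: ", 2 ** n)       # 1,073,741,824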
Number of "entries" in the Full Joint
Distribution and in the Bayesian Network - 2
• In domains where each variable can be directly influenced by all
the others, the network is fully connected and thus requires a
number of entries of the same order as the full joint distribution

• However, if those dependencies are weak, the additional
complexity in the network may not be worth the small gain in
accuracy

• As a rule of thumb, if we stick to a causal model we end up having
to specify fewer numbers, and those numbers will often be easier
to obtain. (Russell & Norvig, 2013, p. 453)

• Causal models are those specified in the cause-to-effect direction,
that is, P(effect | cause) instead of P(cause | effect), which is usually
what is needed for diagnosis
Simplifying the representation of conditional
probability tables (CPTs)
• We have seen that the number of entries in a CPT grows
exponentially
• For the binary case with k parents, a node's CPT has 2^k probabilities to
be estimated

• Let us look at two approaches that simplify the network by
adopting simplifying assumptions:
• Naïve Bayes and
• Noisy-OR
Naïve Bayes
• A particular, simple class of Bayesian networks is
called Naïve Bayes
• It is simple because it assumes conditional independence
among all variables X given the variable Class
• It is sometimes also called the Bayes classifier,
because it is frequently used as a first approach
to classification
Naïve Bayes - 2
• The simple topology has the advantage of a concise representation
of the full joint distribution.
• Since every node has at most one parent, each CPT of a node X
has only two entries, plus one entry for the class node. Hence
(2n − 1) entries for the whole network. Naïve Bayes is linear in
the number of nodes (n)!!!
• "In practice, naïve Bayes systems can work
surprisingly well...". p. 438
Example: Should I play tennis?
Ex  Outlook    Temperature  Humidity  Wind    PlayTennis
X1  Sunny      Hot          High      Weak    No
X2  Sunny      Hot          High      Strong  No
X3  Overcast   Hot          High      Weak    Yes
X4  Rain       Mild         High      Weak    Yes
X5  Rain       Cool         Normal    Weak    Yes
X6  Rain       Cool         Normal    Strong  No
X7  Overcast   Cool         Normal    Strong  Yes
X8  Sunny      Mild         High      Weak    No
X9  Sunny      Cool         Normal    Weak    Yes
X10 Rain       Mild         Normal    Weak    Yes
X11 Sunny      Mild         Normal    Strong  Yes
X12 Overcast   Mild         High      Strong  Yes
X13 Overcast   Hot          Normal    Weak    Yes
X14 Rain       Mild         High      Strong  No
Using the Naïve Bayes approach

• Problem to solve: P(Play | Outlook, Temp, Hum, Wind)
Solution:
• P(Play | Outlook, Temp, Hum, Wind) =
• P(Outlook, Temp, Hum, Wind | Play) P(Play) / P(Outlook, Temp, Hum, Wind) =
• Chain rule and independence:
• P(Outlook | Play) P(Temp | Play) P(Hum | Play) P(Wind | Play) P(Play) /
P(Outlook, Temp, Hum, Wind)

• The inference-by-enumeration method already seen applies!!!

• The probabilities are estimated from the training set
Counts and probabilities estimated
from the training set

• P(Play=yes | Outlook=sunny, Temp=cool, Hum=high, Wind=strong) =

• P(sunny | yes) P(cool | yes) P(high | yes) P(strong | yes) P(Play=yes) / P(evidence)
= 2/9 × 3/9 × 3/9 × 3/9 × 9/14 / P(e) ≈ 0.0053 / P(e)
Solution - continued
• Likewise,
• P(Play=no | ...) = P(sunny | no) P(cool | no) P(high | no) P(strong | no) P(Play=no) / P(e)
= 3/5 × 1/5 × 4/5 × 3/5 × 5/14 / P(e) ≈ 0.0206 / P(e)
• But P(yes | e) and P(no | e) must sum to 1, so:
P(yes | e) ≈ 0.0053 / (0.0053 + 0.0206) ≈ 0.20 and P(no | e) ≈ 0.80
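A sketch reproducing these numbers; the counts are hard-coded from the PlayTennis table above:

# Counts from the PlayTennis table: 9 "yes" and 5 "no" examples.
p_yes, p_no = 9 / 14, 5 / 14

# Conditional probabilities estimated by counting, for the query
# Outlook=sunny, Temp=cool, Humidity=high, Wind=strong.
cond_yes = (2 / 9) * (3 / 9) * (3 / 9) * (3 / 9)   # P(sunny|yes) P(cool|yes) P(high|yes) P(strong|yes)
cond_no  = (3 / 5) * (1 / 5) * (4 / 5) * (3 / 5)   # the same factors given "no"

score_yes = cond_yes * p_yes    # ~0.0053 (unnormalized)
score_no  = cond_no * p_no      # ~0.0206 (unnormalized)

total = score_yes + score_no
print("P(yes | evidence) =", score_yes / total)    # ~0.205
print("P(no  | evidence) =", score_no / total)     # ~0.795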
Probability Estimates
• What is the estimated probability
P(Outlook=overcast | Play=no)?

• Zero! Is that reasonable? How can we fix it?

• One solution: the Laplace estimator (Laplace smoothing). Let V
be the number of possible values of A; estimate P(A|B) as:
• P(A=a | B=b) = [N(A=a, B=b) + 1] / [N(B=b) + V]
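A minimal sketch of the Laplace estimator applied to the zero count above (Outlook has V = 3 possible values and there are 5 "no" examples):

def laplace_estimate(count_ab, count_b, num_values):
    """P(A=a | B=b) ~ (N(A=a, B=b) + 1) / (N(B=b) + V)."""
    return (count_ab + 1) / (count_b + num_values)

# P(Outlook=overcast | Play=no): the raw estimate would be 0/5 = 0.
print(laplace_estimate(count_ab=0, count_b=5, num_values=3))   # 0.125 instead of 0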
Creating Compact Conditional
Distributions...
• Some problems can be modeled with a Noisy-OR approach.
The technique rests on two assumptions:
• All causes that can trigger the variable are listed (a generic
"other causes" node can be added)
• That is, P(Fever | ¬Cold, ¬Flu, ¬Malaria) = 0
• Whatever inhibits each parent (cause) from triggering the child
variable (effect) is conditionally independent of the others. Example: whatever
prevents flu from causing a fever is independent of whatever prevents a cold
from causing a fever.
• That is, P(¬Fever | Cold, Flu, Malaria) = P(¬Fever | Cold) P(¬Fever | Flu) P(¬Fever | Malaria)
• Example:
• P(¬Fever | Malaria) = 0.1
• P(¬Fever | Flu) = 0.2
• P(¬Fever | Cold) = 0.6
Noisy-OR

• P(X | u_1, ..., u_j, ¬u_{j+1}, ..., ¬u_k) = <1 − ∏_{i=1}^{j} q_i, ∏_{i=1}^{j} q_i>

• q_i is the probability that cause i fails to produce the effect
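A sketch that builds the whole Fever CPT from the three inhibition probabilities in the example, using the noisy-OR rule:

from itertools import product

# Inhibition probabilities q_i = P(not Fever | only cause i present), from the example.
q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}
causes = list(q)

# Noisy-OR: P(not Fever | active causes) = product of q_i over the active causes.
for active in product((False, True), repeat=len(causes)):
    p_not_fever = 1.0
    for cause, on in zip(causes, active):
        if on:
            p_not_fever *= q[cause]
    row = ", ".join(f"{c}={'T' if on else 'F'}" for c, on in zip(causes, active))
    print(f"P(Fever | {row}) = {1 - p_not_fever:.4f}")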
