Chapter 1 Uncertainty
UNCERTAINTY
DR. NILESH M. PATIL,
ASSOCIATE PROFESSOR, COMPUTER ENGINEERING DEPT.,
2 INTRODUCTION
• Uncertainty arises when we are not 100 percent sure about the outcome of a decision.
• This mostly happens when the conditions involved are neither completely true nor
completely false.
STTP on Artificial Intelligence Towards Data Science Applications
4 TAXONOMY OF UNCERTAINTY
• Fuzzy Logic
• Probabilistic Reasoning
• Hidden Markov Models
• Neural Networks
6 PROBABILISTIC REASONING
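• Probability summarizes the uncertainty that results from our laziness (it is too much work to list every condition and exception) and our ignorance (we lack complete knowledge of the domain).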
7 A CLASSIC EXAMPLE
9 PROBABILITY
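• Each possible world ω is associated with a numerical probability P(ω) such that 0 ≤ P(ω) ≤ 1 for every ω, and the probabilities of all possible worlds sum to 1.
• Example: If we are about to roll two (distinguishable) dice, there are 36 possible worlds to
consider: (1,1), (1,2), …, (6,6).
• If both dice are fair, each world has probability P(ω) = 1/36.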
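The two-dice example can be checked with a short Python sketch. It assumes both dice are fair; the helper name `prob` and the "sum is 7" event are illustrative choices, not part of the original slides.

```python
from fractions import Fraction
from itertools import product

# Every ordered pair (die1, die2) is one possible world.
worlds = list(product(range(1, 7), repeat=2))

# Assumption: both dice are fair, so all worlds are equally likely.
P = {w: Fraction(1, len(worlds)) for w in worlds}

def prob(event):
    """Probability of an event: sum of P(w) over the worlds where it holds."""
    return sum(P[w] for w in worlds if event(w))

total = sum(P.values())                        # must equal 1
p_sum_is_7 = prob(lambda w: w[0] + w[1] == 7)  # illustrative event

print(len(worlds), total, p_sum_is_7)
```

Using `Fraction` keeps the arithmetic exact, so the check that the probabilities sum to 1 does not depend on floating-point rounding.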
10 AXIOMS IN PROBABILITY
11 TERMINOLOGIES IN PROBABILITY
• Event
• Sample space
• Random variable
• Prior probability
• Posterior probability
• Conditional probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
where P(A ∩ B) = Joint Probability of A and B, P(B) = Marginal Probability of B and P(B) > 0
12 INFERENCE USING FULL JOINT DISTRIBUTIONS
• Probabilistic inference: The computation of posterior probabilities for query propositions given observed evidence.
• The full joint probability distribution specifies the probability of each complete assignment of values to random variables.
• Marginalization: the marginal probability of a variable is obtained by summing the entries of the joint distribution in the corresponding rows or columns.
• For example, P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
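• Six atomic events make (cavity ∨ toothache) true, so P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
• A variant of marginalization that uses conditional probabilities instead of joint probabilities is called conditioning: $P(Y) = \sum_{z} P(Y \mid z)\,P(z)$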
• Computing a conditional probability: $P(cavity \mid toothache) = \frac{P(cavity \land toothache)}{P(toothache)} = \frac{0.108 + 0.012}{0.108 + 0.012 + 0.016 + 0.064} = \frac{0.12}{0.2} = 0.6$
• Similarly, $P(\lnot cavity \mid toothache) = \frac{0.016 + 0.064}{0.2} = 0.4$
• In both cases, $\frac{1}{P(toothache)} = \frac{1}{0.2} = 5$ remains constant, no matter which value of cavity we calculate.
• It is a normalization constant (α) ensuring that the distribution P(cavity | toothache) adds up to 1.
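The calculations above can be reproduced from a table of atomic events. The sketch below is an illustration under stated assumptions: the slide's table is not in the text, so the third variable name (`catch`) and the two entries marked "assumed" are taken from the standard textbook version of this dentist example and may differ from the original slide.

```python
# Full joint distribution, keyed by (cavity, toothache, catch).
# Six entries appear in the text; the two marked "assumed" do not and are
# filled in from the standard textbook table so the distribution sums to 1.
joint = {
    (True,  True,  True):  0.108,
    (True,  True,  False): 0.012,
    (True,  False, True):  0.072,
    (True,  False, False): 0.008,
    (False, True,  True):  0.016,
    (False, True,  False): 0.064,
    (False, False, True):  0.144,  # assumed
    (False, False, False): 0.576,  # assumed
}

def prob(event):
    """Sum the atomic events (table entries) in which the event holds."""
    return sum(p for world, p in joint.items() if event(*world))

# Marginalization: add up every entry where cavity is true.
p_cavity = prob(lambda cavity, toothache, catch: cavity)

# Disjunction: six atomic events satisfy (cavity or toothache).
p_cavity_or_toothache = prob(lambda cavity, toothache, catch: cavity or toothache)

# Normalization: alpha = 1 / P(toothache) rescales the unnormalized values.
alpha = 1 / prob(lambda cavity, toothache, catch: toothache)
unnormalized = [
    prob(lambda cavity, toothache, catch: cavity and toothache),
    prob(lambda cavity, toothache, catch: (not cavity) and toothache),
]
posterior = [alpha * u for u in unnormalized]  # P(cavity | toothache), P(not cavity | toothache)

print(round(p_cavity, 3), round(p_cavity_or_toothache, 3), [round(p, 3) for p in posterior])
```

Note that the queries used here only touch the six entries given in the text plus the totals, so the two assumed entries affect nothing except the check that the whole table sums to 1.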
13 EXAMPLE 1
In a class, 80% of the students like English and 30% of the students like both English
and Mathematics. What percentage of the students who like English also like
Mathematics?
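One way to work this out is to apply the definition of conditional probability directly. The labels E ("likes English") and M ("likes Mathematics") are introduced here only for convenience:

```latex
P(M \mid E) = \frac{P(E \cap M)}{P(E)} = \frac{0.30}{0.80} = 0.375
```

Under this reading, 37.5% of the students who like English also like Mathematics.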
14 EXAMPLE 2
The table below shows the occurrence of diabetes in 100 people. Let D be the event that a randomly
selected person has diabetes and N the event that the person is not overweight. Find P(D | N).
                    Diabetes (D)    No Diabetes (D̄)
Overweight (N̄)          17               33
15 BAYES THEOREM
• Product rule: $P(A \land B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
• Sum rule: $P(A \lor B) = P(A) + P(B) - P(A \land B)$
• Bayes theorem: $P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$
• Theorem of total probability: if the events $A_1, \dots, A_n$ are mutually exclusive and their probabilities sum to one, then $P(B) = \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)$
18 EXAMPLE 3
In Orange County, 51% of the adults are males. One adult is randomly selected for a survey involving credit card usage.
a. Find the prior probability that the selected person is a male.
b. It is later learned that the selected survey subject was smoking a cigar. Also, 9.5% of males smoke cigars, whereas 1.7% of females smoke
cigars. Use this additional information to find the probability that the selected subject is a male.
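A possible solution sketch in Python combines Bayes theorem with the theorem of total probability. It assumes that every adult is either male or female, so P(female) = 1 − 0.51; the function name `posterior` and the dictionary keys are illustrative.

```python
def posterior(prior, likelihood):
    """Bayes theorem over mutually exclusive, exhaustive hypotheses.

    prior[h]      = P(h)
    likelihood[h] = P(evidence | h)
    Returns P(h | evidence) for every hypothesis h.
    """
    # Theorem of total probability gives P(evidence).
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    return {h: prior[h] * likelihood[h] / evidence for h in prior}

# Part (a): the prior is given directly in the problem statement.
prior = {"male": 0.51, "female": 1 - 0.51}  # assumption: only two categories

# Part (b): probability of smoking a cigar given each hypothesis.
likelihood = {"male": 0.095, "female": 0.017}

result = posterior(prior, likelihood)
print(round(result["male"], 3))
```

Under these assumptions the probability that the subject is male rises from the prior 0.51 to roughly 0.853 once the cigar is observed.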
19 EXAMPLE 4
A doctor is called to see a sick child. The doctor has prior information that 90% of sick children in that neighborhood have the flu,
while the other 10% are sick with measles. A well-known symptom of measles is a rash (the event of having which we denote R).
Assume that the probability of having a rash if one has measles is P(R | M) = 0.95. However, occasionally children with flu also
develop rash, and the probability of having a rash if one has flu is P(R | F) = 0.08. Upon examining the child, the doctor finds a rash.
What is the probability that the child has measles?
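A worked solution, assuming flu (F) and measles (M) are the only two possibilities as the problem states, applies Bayes theorem with P(R) expanded by the theorem of total probability:

```latex
P(M \mid R)
  = \frac{P(R \mid M)\,P(M)}{P(R \mid M)\,P(M) + P(R \mid F)\,P(F)}
  = \frac{0.95 \times 0.10}{0.95 \times 0.10 + 0.08 \times 0.90}
  = \frac{0.095}{0.167}
  \approx 0.57
```

So even though measles has a prior of only 10%, observing the rash raises its probability to about 57%.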
20 INDEPENDENCE
23 BAYESIAN NETWORKS
• General form: $P(X_1, X_2, \dots, X_N) = \prod_{i} P(X_i \mid parents(X_i))$
• Example (A and B are the parents of C): $P(A, B, C) = P(C \mid A, B)\,P(A)\,P(B)$
• Probability model has simple factored form
• Directed edges => direct dependence
• Absence of an edge => conditional independence
• Also known as belief networks, graphical models, causal networks
• B and C are conditionally independent given A (graph: A → B, A → C):
p(A, B, C) = p(B | A) p(C | A) p(A)
• Independent causes (graph: A → C ← B):
p(A, B, C) = p(C | A, B) p(A) p(B)
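The independent-causes factorization can be illustrated with a small sketch. All numbers below are hypothetical, chosen only to show the mechanics: the joint is built from the three factors, it sums to 1, and summing out C leaves P(A, B) = P(A) P(B).

```python
from itertools import product

# Hypothetical probabilities, for illustration only.
p_a = {True: 0.3, False: 0.7}   # P(A)
p_b = {True: 0.6, False: 0.4}   # P(B)
p_c_true = {                    # P(C = true | A, B)
    (True, True): 0.9,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.1,
}

def joint(a, b, c):
    """p(A, B, C) = p(C | A, B) p(A) p(B) for the graph A -> C <- B."""
    p_c = p_c_true[(a, b)] if c else 1 - p_c_true[(a, b)]
    return p_c * p_a[a] * p_b[b]

# The factored form defines a proper distribution over all 8 assignments.
total = sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3))

# Summing out C recovers P(A = true, B = true) = P(A = true) * P(B = true).
p_ab = sum(joint(True, True, c) for c in [True, False])

print(round(total, 6), round(p_ab, 6), round(p_a[True] * p_b[True], 6))
```

Only 1 + 1 + 4 = 6 numbers are needed to specify this network, compared with the 7 independent entries of an unrestricted joint table over three Boolean variables.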
27 ALARM EXAMPLE
• Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably to
a burglary but also responds to minor earthquakes. Harry has two neighbors, John and
Mary, who have taken responsibility for informing Harry at work when they hear the alarm. John
always calls Harry when he hears the alarm, but he sometimes confuses the phone ringing with the alarm
and calls then too. Mary, on the other hand, likes to listen to loud music, so she sometimes
fails to hear the alarm. Here we would like to compute the probability of the burglar alarm sounding.
• Problem: Calculate the probability that the alarm has sounded, but neither a burglary nor an
earthquake has occurred, and both John and Mary have called Harry.
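Using the factored form of a Bayesian network, the requested probability is P(J, M, A, ¬B, ¬E) = P(J | A) P(M | A) P(A | ¬B, ¬E) P(¬B) P(¬E). The conditional probability tables for this network are not reproduced in the text, so the sketch below uses ASSUMED values (those commonly quoted for the textbook burglary network); the original slide may use different numbers, in which case only the constants change.

```python
# ASSUMED values: the slide's own probability tables are not in the text.
p_burglary = 0.001               # P(B)
p_earthquake = 0.002             # P(E)
p_alarm_given_no_b_no_e = 0.001  # P(A | not B, not E)
p_john_calls_given_alarm = 0.90  # P(J | A)
p_mary_calls_given_alarm = 0.70  # P(M | A)

# Each variable contributes one factor, conditioned only on its parents.
p = (
    p_john_calls_given_alarm
    * p_mary_calls_given_alarm
    * p_alarm_given_no_b_no_e
    * (1 - p_burglary)
    * (1 - p_earthquake)
)

print(f"{p:.6f}")
```

With these assumed tables the result is about 0.000628: the event is rare mainly because a false alarm with no burglary and no earthquake is itself very unlikely.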
28 SOLUTION
• A Bayesian Network can be used to compute the probability distribution for any subset
of network variables given the values or distributions for any subset of the remaining
variables.
• Unfortunately, exact inference of probabilities in general for an arbitrary Bayesian
Network is known to be NP-hard.