
STTP on Artificial Intelligence Towards Data Science Applications

UNCERTAINTY
DR. NILESH M. PATIL,
ASSOCIATE PROFESSOR, COMPUTER ENGINEERING DEPT.,

SVKM’S D J SANGHVI COLLEGE OF ENGINEERING



2 INTRODUCTION

• Uncertainty arises when we are not 100 percent sure about the outcome of a decision.
• This mostly happens in those cases where the conditions are neither completely true nor
completely false.

3 REASONS FOR UNCERTAINTY

• Partially observable environment


• Dynamic environment
• Incomplete knowledge of the agent
• Inaccessible areas in the environment

4 TAXONOMY OF UNCERTAINTY

5 METHODS TO HANDLE UNCERTAINTY

• Fuzzy Logic
• Probabilistic Reasoning
• Hidden Markov Models
• Neural Networks

6 PROBABILISTIC REASONING

• Probability is the calculus of gambling.


• Probabilistic reasoning is a way of knowledge representation where we apply the concept of
probability to indicate the uncertainty in knowledge.

• Probability handles uncertainty that is the result of someone's laziness and ignorance.

7 A CLASSIC EXAMPLE

Can Tweety fly???

• Default reasoning: Birds typically fly. Tweety is a bird. Therefore, Tweety flies.
• But with more knowledge: Birds typically fly. Penguins are birds. Penguins typically do not fly. Tweety is a penguin. Therefore, Tweety does not fly.

8 NEED OF PROBABILISTIC REASONING IN AI

1. When there are unpredictable outcomes.


2. When the specifications or possibilities of predicates become too large to handle.
3. When an unknown error occurs during an experiment.

9 PROBABILITY

• Probability can be defined as a chance that an uncertain event will occur.


• The value of probability always lies between 0 and 1, where 0 and 1 represent the two ideal certainties (an event that cannot occur and an event that is certain to occur).

• Each possible world ω is associated with a numerical probability P(ω) such that 0 ≤ P(ω) ≤ 1 and Σω P(ω) = 1.

• Example: If we are about to roll two (distinguishable) dice, there are 36 possible worlds to
consider: (1,1), (1,2), …, (6,6)
• P(ω) = 1/36 for each world
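The two-dice example can be checked by enumerating the possible worlds; a minimal sketch in Python (the uniform 1/36 assignment is the slide's):

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 possible worlds for two distinguishable dice.
worlds = list(product(range(1, 7), repeat=2))
p = {w: Fraction(1, 36) for w in worlds}  # uniform: P(w) = 1/36

print(len(worlds))      # 36
print(sum(p.values()))  # 1 -- the probabilities of all worlds sum to 1
```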

10 AXIOMS IN PROBABILITY

➢ 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.


➢ P(A) = 0 indicates that event A will certainly not occur.
➢ P(A) = 1 indicates that event A is certain to occur.
➢ We can find the probability of an uncertain event by using the formula:
   Probability of Occurrence = Number of desired outcomes / Total number of outcomes
➢ P(¬A) = probability of event A not happening.
➢ P(¬A) + P(A) = 1.

11 TERMINOLOGIES IN PROBABILITY

• Event
• Sample space
• Random variable
• Prior probability
• Posterior probability
• Conditional probability: P(A | B) = P(A ∩ B) / P(B),
  where P(A ∩ B) = joint probability of A and B, P(B) = marginal probability of B, and P(B) > 0
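A quick sketch of the conditional-probability formula on the two-dice sample space; the events A = "the dice sum to 8" and B = "the first die shows 3" are chosen here purely for illustration:

```python
from fractions import Fraction
from itertools import product

worlds = list(product(range(1, 7), repeat=2))
P = Fraction(1, 36)  # each world is equally likely

def prob(event):
    """P(event) = sum of the probabilities of the worlds where the event holds."""
    return sum((P for w in worlds if event(w)), Fraction(0))

def cond(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

# A = "the two dice sum to 8", B = "the first die shows 3"
p = cond(lambda w: w[0] + w[1] == 8, lambda w: w[0] == 3)
print(p)  # 1/6 -- only (3,5) satisfies both, out of six worlds with first die 3
```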
12 INFERENCE USING FULL JOINT DISTRIBUTIONS

• Probabilistic inference: The computation of posterior probabilities for query propositions given observed evidence.

• The full joint probability distribution specifies the probability of each complete assignment of values to random variables.
• Marginalization: the marginal probability of a variable is obtained by summing out the other variables, i.e., adding the entries in the corresponding rows or columns.
• For example, P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2.
• There are six atomic events for (cavity ∨ toothache): 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28.
• A variant of marginalization is called conditioning.
• Computing a conditional probability:
  P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
                        = (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.12 / 0.2 = 0.6
• Similarly, P(¬cavity | toothache) = (0.016 + 0.064) / 0.2 = 0.4.
• In both cases, 1 / P(toothache) = 1 / 0.2 = 5 remains constant, no matter which value of cavity we calculate.

• It is a normalization constant (α) ensuring that the distribution P(cavity | toothache) adds up to 1.
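The toothache/cavity numbers can be reproduced from the full joint table. Note that the four no-cavity entries (0.144 and 0.576 in particular) do not appear on the slide and are taken from the standard version of this example:

```python
# Full joint distribution over (Cavity, Toothache, Catch). The cavity entries
# match the sums quoted above; the no-cavity entries 0.016, 0.064, 0.144, 0.576
# are the standard values for this example (assumed here).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(event):
    """Marginalization: add up the joint entries where the event holds."""
    return sum(p for world, p in joint.items() if event(world))

p_toothache = prob(lambda w: w[1])
p_cavity_and_toothache = prob(lambda w: w[0] and w[1])
print(round(prob(lambda w: w[0]), 3))                  # P(cavity) = 0.2
print(round(p_cavity_and_toothache / p_toothache, 3))  # P(cavity | toothache) = 0.6
```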

13 EXAMPLE 1

In a class, 80% of the students like English and 30% of the students like both English and
Mathematics. What percentage of the students who like English also like Mathematics?
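A one-line check of Example 1 using the conditional-probability formula P(M | E) = P(E ∩ M) / P(E):

```python
# P(E) = 0.80 (likes English), P(E and M) = 0.30 (likes both).
p_e, p_e_and_m = 0.80, 0.30
p_m_given_e = p_e_and_m / p_e  # conditional probability P(M | E)
print(round(p_m_given_e, 4))   # 0.375, i.e. 37.5%
```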

14 EXAMPLE 2
The table below shows the occurrence of diabetes in 100 people. Let D and N be the events that a randomly
selected person "has diabetes" and "is not overweight", respectively. Find P(D | N).

                      Diabetes (D)    No Diabetes (¬D)
  Not overweight (N)       5                45
  Overweight (¬N)         17                33
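Example 2 worked from the table's counts (a sketch; P(D | N) is just the within-row proportion):

```python
# Counts from the table of 100 people, "not overweight" row.
d_and_n, not_d_and_n = 5, 45
n_total = d_and_n + not_d_and_n   # 50 people are not overweight
p_d_given_n = d_and_n / n_total   # P(D | N) = P(D and N) / P(N) = 5/50
print(p_d_given_n)                # 0.1
```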

15 BAYES THEOREM

• Bayes' Theorem is named after the 18th-century British mathematician Thomas Bayes.


• Bayes theorem plays a critical role in probabilistic learning and classification.
• Bayes' Theorem allows you to update the predicted probabilities of an event by
incorporating new information.
• Uses prior probability of each category given no information about an item.
• It is often employed in finance in calculating or updating risk evaluation.
• The theorem is also called Bayes' Rule or Bayes' Law.

16 BASIC PROBABILITY FORMULAS

• Product rule: P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A)

• Sum rule: P(A ∨ B) = P(A) + P(B) − P(A ∧ B)

• Bayes theorem: P(h | D) = P(D | h) P(h) / P(D)

• Theorem of total probability: if the events Ai are mutually exclusive and their probabilities sum to one, then
  P(B) = Σi=1..n P(B | Ai) P(Ai)
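The theorem of total probability can be sanity-checked numerically; the partition and the conditional probabilities below are made up purely for illustration:

```python
from fractions import Fraction as F

# A hypothetical partition: A1, A2, A3 are mutually exclusive and exhaustive.
p_a = {"A1": F(1, 2), "A2": F(3, 10), "A3": F(1, 5)}
p_b_given_a = {"A1": F(1, 4), "A2": F(1, 2), "A3": F(1, 10)}

assert sum(p_a.values()) == 1  # the partition's probabilities sum to one

# Theorem of total probability: P(B) = sum_i P(B | Ai) P(Ai)
p_b = sum(p_b_given_a[a] * p_a[a] for a in p_a)
print(p_b)  # 1/8 + 3/20 + 1/50 = 59/200
```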

17 BAYES THEOREM

• Given a hypothesis h and data D which bears on the hypothesis:


P(h | D) = P(D | h) P(h) / P(D)
• P(h): independent probability of h: prior probability
• P(D): independent probability of D
• P(D|h): conditional probability of D given h: likelihood
• P(h|D): conditional probability of h given D: posterior probability

18 EXAMPLE 3

In Orange County, 51% of the adults are males. One adult is randomly selected for a survey involving credit card usage.
a. Find the prior probability that the selected person is a male.
b. It is later learned that the selected survey subject was smoking a cigar. Also, 9.5% of males smoke cigars, whereas 1.7% of females smoke
cigars. Use this additional information to find the probability that the selected subject is a male.
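Example 3 worked with Bayes' theorem (the denominator P(C) comes from the theorem of total probability):

```python
# Priors: P(M) = 0.51 (male), P(F) = 0.49 (female).
# Likelihoods: P(C | M) = 0.095, P(C | F) = 0.017 (cigar smoking).
p_m, p_f = 0.51, 0.49
p_c_given_m, p_c_given_f = 0.095, 0.017

p_c = p_c_given_m * p_m + p_c_given_f * p_f  # total probability of smoking cigars
p_m_given_c = p_c_given_m * p_m / p_c        # Bayes' theorem: posterior P(M | C)
print(round(p_m_given_c, 4))                 # 0.8533 -- the cigar raises 0.51 to ~0.85
```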

19 EXAMPLE 4

A doctor is called to see a sick child. The doctor has prior information that 90% of sick children in that neighborhood have the flu,
while the other 10% are sick with measles. A well-known symptom of measles is a rash (the event of having which we denote R).
Assume that the probability of having a rash if one has measles is P(R | M) = 0.95. However, occasionally children with flu also
develop rash, and the probability of having a rash if one has flu is P(R | F) = 0.08. Upon examining the child, the doctor finds a rash.
What is the probability that the child has measles?
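Example 4 worked the same way, treating measles (M) and flu (F) as the partition:

```python
# Priors: P(M) = 0.10 (measles), P(F) = 0.90 (flu).
p_m, p_f = 0.10, 0.90
p_r_given_m, p_r_given_f = 0.95, 0.08      # likelihoods of a rash

p_r = p_r_given_m * p_m + p_r_given_f * p_f  # P(R) by total probability
p_m_given_r = p_r_given_m * p_m / p_r        # Bayes' theorem: posterior P(M | R)
print(round(p_m_given_r, 4))                 # 0.5689 -- the rash raises 0.10 to ~0.57
```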

20 INDEPENDENCE

• Also called marginal independence / absolute independence.


• Reduce the amount of information necessary to specify the full joint distribution.
• Independence between variables X and Y can be written as:
P(X|Y) = P(X) or P(Y|X) = P(Y) or P(X, Y) = P(X)P(Y)
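The definition can be verified exhaustively for two fair dice, where X (the first die) and Y (the second die) are independent by construction:

```python
from fractions import Fraction
from itertools import product

worlds = list(product(range(1, 7), repeat=2))
P = Fraction(1, 36)  # each world is equally likely

def prob(event):
    return sum((P for w in worlds if event(w)), Fraction(0))

# Check P(X, Y) = P(X) P(Y) for every pair of values.
for i, j in product(range(1, 7), repeat=2):
    p_joint = prob(lambda w: w == (i, j))
    assert p_joint == prob(lambda w: w[0] == i) * prob(lambda w: w[1] == j)
print("X and Y are independent: P(X, Y) = P(X) P(Y) holds for all 36 pairs")
```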
21 BAYESIAN NETWORK MOTIVATION

• We want a representation and reasoning system that is based on conditional independence
  • Compact yet expressive representation
  • Efficient reasoning procedures
• Bayesian Networks are such a representation
  • Named after Thomas Bayes; the term was coined in 1985 by Judea Pearl
  • Their invention changed the focus of AI from logic to probability!

22 BAYESIAN NETWORKS

• A Bayesian network specifies a joint distribution in a structured form

• Represent dependence/independence via a directed graph


• Nodes = random variables
• Edges = direct dependence

• Structure of the graph ⇒ conditional independence relations

• Requires that graph is acyclic (no directed cycles)

• Two components to a Bayesian network


• The graph structure (conditional independence assumptions)
• The numerical probabilities (for each variable given its parents)

23 BAYESIAN NETWORKS

• General form:
  P(X1, X2, …, XN) = ∏i P(Xi | parents(Xi))

  where the left-hand side is the full joint distribution and the right-hand side is the graph-structured approximation.



24 EXAMPLE OF A SIMPLE BAYESIAN NETWORK

  P(X1, X2, …, XN) = ∏i P(Xi | parents(Xi))

  Network: A → C and B → C (A and B are the parents of C)

  P(A, B, C) = P(C | A, B) P(A) P(B)
• Probability model has simple factored form
• Directed edges => direct dependence
• Absence of an edge => conditional independence
• Also known as belief networks, graphical models, causal networks

25 EXAMPLE OF 3-WAY BAYESIAN NETWORKS

• Conditionally independent effects:
  p(A, B, C) = p(B | A) p(C | A) p(A)
  Network: A → B and A → C

• B and C are conditionally independent given A

• e.g., A is a disease, and we model B and C as conditionally independent symptoms given A

26 EXAMPLE OF 3-WAY BAYESIAN NETWORKS

• Independent causes:
  p(A, B, C) = p(C | A, B) p(A) p(B)
  Network: A → C and B → C

• "Explaining away" effect:
  • A and B are independent but become dependent once C is known!
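The "explaining away" effect can be demonstrated numerically; the prior and CPT values below are hypothetical, chosen only to make the effect visible:

```python
# A and B are independent causes of a common effect C (network A -> C <- B).
# All numbers here are illustrative, not from any real model.
p_a, p_b = 0.1, 0.1
p_c = {  # P(C = True | A, B)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.9, (False, False): 0.01,
}

def joint(a, b, c):
    """p(A,B,C) = p(C|A,B) p(A) p(B) -- the factored form for this network."""
    pa = p_a if a else 1 - p_a
    pb = p_b if b else 1 - p_b
    pc = p_c[(a, b)] if c else 1 - p_c[(a, b)]
    return pc * pa * pb

# After observing C = True, learning that B is also true "explains away" A:
p_c_true = sum(joint(a, b, True) for a in (True, False) for b in (True, False))
p_a_given_c = sum(joint(True, b, True) for b in (True, False)) / p_c_true
p_a_given_c_b = joint(True, True, True) / sum(joint(a, True, True) for a in (True, False))
print(round(p_a_given_c, 3), round(p_a_given_c_b, 3))  # 0.505 0.109 -- P(A) drops once B is known
```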

27 ALARM EXAMPLE

• Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably detects a
burglary but also responds to minor earthquakes. Harry has two neighbors, John and Mary, who have
taken the responsibility to inform Harry at work when they hear the alarm. John always calls Harry
when he hears the alarm, but sometimes he confuses the phone ringing with the alarm and calls then
too. On the other hand, Mary likes to listen to loud music, so sometimes she fails to hear the alarm.
• Problem: Calculate the probability that the alarm has sounded, but neither a burglary nor an
earthquake has occurred, and both John and Mary called Harry.

28 SOLUTION

• List of all events occurring in this network:


❖ Burglary (B)
❖ Earthquake(E)
❖ Alarm(A)
❖ John Calls(J)
❖ Marry calls(M)
• We can write the event of the problem statement as a joint probability:
  P(M, J, A, ¬B, ¬E) = P(M | A) × P(J | A) × P(A | ¬B ∧ ¬E) × P(¬B) × P(¬E)
                     = 0.70 × 0.90 × 0.001 × 0.999 × 0.998
                     ≈ 0.00062811
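Multiplying the five factors directly (a sketch; only the CPT entries needed for this one world are used):

```python
# Joint probability of the single world (M, J, A, not-B, not-E), using the
# chain factorization P(M|A) P(J|A) P(A|~B,~E) P(~B) P(~E).
p_m_given_a = 0.70            # Mary calls when the alarm sounds
p_j_given_a = 0.90            # John calls when the alarm sounds
p_a_given_not_b_not_e = 0.001 # alarm sounds with no burglary and no earthquake
p_not_b, p_not_e = 0.999, 0.998

p = p_m_given_a * p_j_given_a * p_a_given_not_b_not_e * p_not_b * p_not_e
print(round(p, 8))  # 0.00062811
```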

29 INFERENCE IN BAYESIAN BELIEF NETWORKS

• A Bayesian Network can be used to compute the probability distribution for any subset
of network variables given the values or distributions for any subset of the remaining
variables.
• Unfortunately, exact inference of probabilities in general for an arbitrary Bayesian
Network is known to be NP-hard.

30 ADVANTAGES OF BAYESIAN BELIEF NETWORK

• Intuitive, graphical, and efficient


• Accounts for sources of uncertainty
• Allows for information updating
• Models multiple interdependencies
• Can be extended with utility and decision nodes (influence diagrams)

31 DISADVANTAGES OF BAYESIAN BELIEF NETWORK

• Not ideally suited for computing small probabilities


• Computationally demanding for systems with a large number of random variables
• Exponential growth of computational effort with increased number of states