AAI Module 3 Notes
Module 3
• Each node corresponds to a random variable, which can be continuous or discrete.
• Arcs (directed arrows) represent causal relationships or conditional dependencies between random
variables. These directed links connect pairs of nodes in the graph.
• A directed link indicates that one node directly influences the other; if there is no directed link
between two nodes, neither directly influences the other (they are conditionally independent given their parents).
• In the diagram, A, B, C, and D are random variables represented by the nodes of the network graph.
• If node A is connected to node B by a directed arrow (A → B), then node A is called the parent of
node B.
• Node C is independent of node A.
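The graph structure described above can be captured in code as a mapping from each node to its parents. The sketch below is only an illustration: the exact arcs of the diagram are not reproduced in these notes, so the links A → B, B → D and C → D are assumed here.

```python
# A minimal sketch of the network structure: each node maps to its parents.
# The arcs used (A -> B, B -> D, C -> D) are assumed for illustration; only
# "A is the parent of B" and "C is independent of A" come from the notes.
parents = {
    "A": [],          # root node: no incoming arcs
    "B": ["A"],       # directed link A -> B, so A is the parent of B
    "C": [],          # no link from A, so C does not directly depend on A
    "D": ["B", "C"],  # D is directly influenced by B and C
}

# Each node would also carry a CPT, P(node | parents); for a root node this is just a prior.
for node, pa in parents.items():
    print(node, "<-", pa if pa else "no parents")
```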
Example 1:
• You have installed a new burglar alarm at your home to detect burglary. The alarm reliably detects
a burglary, but it also responds to minor earthquakes. You have two neighbours, John and Mary,
who have taken on the responsibility of informing you when they hear the alarm. John always calls
when he hears the alarm, but he sometimes mistakes the phone ringing for the alarm and calls then
as well. Mary, on the other hand, likes to listen to loud music, so she sometimes fails to hear the
alarm.
• Problem: Calculate the probability that the alarm has sounded, but neither a burglary nor an
earthquake has occurred, and both John and Mary called you.
• List of all events occurring in this network:
o Burglary (B)
o Earthquake(E)
o Alarm(A)
o John Calls(J)
o Mary calls(M)
• The probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred,
and both John and Mary called you, is:
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J | A) · P(M | A) · P(A | ¬B ∧ ¬E) · P(¬B) · P(¬E)
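A quick numeric check of this expression is sketched below. The CPT values used (P(B) = 0.001, P(E) = 0.002, P(A | ¬B, ¬E) = 0.001, P(J | A) = 0.90, P(M | A) = 0.70) are the values commonly quoted for this textbook example; they are not stated in these notes and are assumed here only for illustration.

```python
# Joint probability P(J, M, A, ~B, ~E) with commonly quoted CPT values for
# this example (assumed here; the notes do not list the tables).
p_b, p_e = 0.001, 0.002                  # P(Burglary), P(Earthquake)
p_a_given_not_b_not_e = 0.001            # P(Alarm | ~B, ~E)
p_j_given_a, p_m_given_a = 0.90, 0.70    # P(JohnCalls | A), P(MaryCalls | A)

p = p_j_given_a * p_m_given_a * p_a_given_not_b_not_e * (1 - p_b) * (1 - p_e)
print(p)   # ~ 0.00063: the alarm ringing with both calls but no burglary/earthquake is rare
```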
• Probability Query (Marginal Inference): This involves calculating the marginal probability of a variable
or a set of variables given evidence. The goal is to determine the likelihood of certain events occurring
based on observed data.
• MAP Query (Maximum A Posteriori): This involves finding the most probable assignment of values to
a set of variables given evidence. It aims to identify the most likely explanation for the observed data.
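In symbols, with query variables Q, evidence E = e, and remaining hidden variables H, the two query types can be written as follows (standard definitions, added here for reference):

```latex
\text{Marginal query:}\quad P(Q \mid E = e) \;\propto\; \sum_{h} P(Q,\, H = h,\, E = e)
\qquad
\text{MAP query:}\quad \operatorname*{arg\,max}_{q} \; \sum_{h} P(Q = q,\, H = h \mid E = e)
```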
• In variable elimination, a hidden variable X is summed out: the factors that mention X are multiplied together and the product is summed over all possible values of X. This operation eliminates the variable X from the computation.
• Repeat the elimination process for each variable in the chosen order until only factors involving the
query variables remain.
• Multiply the remaining factors to get the joint probability distribution over the query variables.
• Normalize the resulting distribution to obtain the final probabilities.
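A minimal sketch of these steps on a small chain network A → B → C is given below. The network, its CPT values, and the helper names (`multiply`, `sum_out`) are assumptions made only for this illustration.

```python
# Minimal variable elimination on a chain A -> B -> C (illustrative numbers).
from itertools import product

# A factor is (variables_tuple, {assignment_tuple: value}).
def multiply(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    v1, t1 = f1
    v2, t2 = f2
    vars_ = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for assignment in product([True, False], repeat=len(vars_)):
        a = dict(zip(vars_, assignment))
        table[assignment] = t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)]
    return vars_, table

def sum_out(var, factor):
    """Eliminate var by summing the factor over all its values."""
    vars_, table = factor
    keep = tuple(v for v in vars_ if v != var)
    new_table = {}
    for assignment, value in table.items():
        key = tuple(val for v, val in zip(vars_, assignment) if v != var)
        new_table[key] = new_table.get(key, 0.0) + value
    return keep, new_table

# Factors for P(A), P(B|A), P(C|B) with made-up numbers.
fA = (("A",), {(True,): 0.3, (False,): 0.7})
fB = (("B", "A"), {(True, True): 0.8, (False, True): 0.2,
                   (True, False): 0.1, (False, False): 0.9})
fC = (("C", "B"), {(True, True): 0.9, (False, True): 0.1,
                   (True, False): 0.4, (False, False): 0.6})

# Query P(C): eliminate A, then B, then normalise the remaining factor.
f = sum_out("A", multiply(fB, fA))     # factor over B
f = sum_out("B", multiply(fC, f))      # factor over C
total = sum(f[1].values())
print({k: v / total for k, v in f[1].items()})   # {(True,): 0.555, (False,): 0.445}
```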
Enumeration algorithm for answering queries on Bayesian networks
• The Enumeration algorithm is a simple but exact method for answering probability queries in
Bayesian networks.
• It is based on the principle of systematically enumerating all possible combinations of values for the
variables in the network, considering the evidence observed.
• The algorithm calculates the probability of interest by summing up the joint probabilities of the
relevant combinations.
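The sketch below illustrates this idea for the burglary network from Example 1, summing the joint probability over every complete assignment that agrees with the evidence. The CPT values are the ones commonly quoted for this textbook example and are an assumption, since the notes do not list them.

```python
# A minimal sketch of inference by enumeration over boolean variables.
from itertools import product

# Parent lists and CPTs: each entry gives P(var = True | parent values).
parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
cpt = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}
order = ["B", "E", "A", "J", "M"]

def joint(assign):
    """Product of P(var | parents) over every variable in the network."""
    p = 1.0
    for v in order:
        pv = cpt[v][tuple(assign[u] for u in parents[v])]
        p *= pv if assign[v] else (1.0 - pv)
    return p

def enumeration_ask(query, evidence):
    """P(query = True | evidence), summing the joint over all assignments."""
    totals = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(order)):
        assign = dict(zip(order, values))
        if all(assign[k] == v for k, v in evidence.items()):
            totals[assign[query]] += joint(assign)
    return totals[True] / (totals[True] + totals[False])

# P(Burglary | JohnCalls = True, MaryCalls = True) ~ 0.284 with these numbers.
print(enumeration_ask("B", {"J": True, "M": True}))
```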
Rejection-sampling algorithm for inference in Bayesian networks
• Rejection sampling is a general method for producing samples from a hard-to-sample distribution,
given a distribution that is easy to sample from.
• First, it generates samples from the prior distribution specified by the network. Then, it rejects all
those that do not match the evidence.
• The biggest problem with rejection sampling is that it rejects so many samples. The fraction of
samples consistent with the evidence e drops exponentially as the number of evidence variables
grows, so the procedure is simply unusable for complex problems.
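A minimal sketch of rejection sampling on a tiny two-variable network (Rain → WetGrass) is shown below; the structure and probabilities are assumptions made only for the illustration.

```python
# Rejection sampling for P(Rain | WetGrass = True) on a tiny assumed network.
import random

def prior_sample():
    """Sample Rain, then WetGrass given Rain, following the network order."""
    rain = random.random() < 0.2
    wet = random.random() < (0.9 if rain else 0.1)
    return {"Rain": rain, "WetGrass": wet}

def rejection_sampling(query, evidence, n=100_000):
    counts = {True: 0, False: 0}
    for _ in range(n):
        s = prior_sample()
        # Reject any sample that does not agree with the evidence.
        if all(s[k] == v for k, v in evidence.items()):
            counts[s[query]] += 1
    accepted = counts[True] + counts[False]
    return counts[True] / accepted if accepted else float("nan")

print(rejection_sampling("Rain", {"WetGrass": True}))   # ~ 0.69 with these numbers
```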
Likelihood-weighting algorithm for inference in Bayesian networks
• Likelihood weighting is a form of importance sampling where the variables are sampled in the order
defined by a belief network, and evidence is used to update the weights.
• The weights reflect the probability that a sample would not be rejected; a short sketch follows the list below.
• MCMC methods, such as the Metropolis-Hastings algorithm and Gibbs sampling, involve iteratively
sampling values for the unobserved variables given the observed evidence.
• These methods aim to explore the space of possible configurations and converge to the true
distribution.
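The sketch below illustrates the likelihood-weighting idea described above on the same kind of tiny Rain → WetGrass network (numbers assumed for illustration): evidence variables are fixed to their observed values and the sample's weight is multiplied by the likelihood of that evidence, so no sample is ever rejected.

```python
# A minimal likelihood-weighting sketch for P(Rain | WetGrass = True).
import random

P_RAIN = 0.2
P_WET_GIVEN = {True: 0.9, False: 0.1}   # P(WetGrass = True | Rain)

def weighted_sample(evidence):
    """Sample non-evidence variables; weight by the likelihood of the evidence."""
    weight = 1.0
    rain = random.random() < P_RAIN                     # Rain is not evidence here: sample it
    p_wet = P_WET_GIVEN[rain]
    if "WetGrass" in evidence:                          # evidence: fix value, update weight
        weight *= p_wet if evidence["WetGrass"] else (1 - p_wet)
        wet = evidence["WetGrass"]
    else:
        wet = random.random() < p_wet
    return {"Rain": rain, "WetGrass": wet}, weight

def likelihood_weighting(query, evidence, n=100_000):
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        sample, w = weighted_sample(evidence)
        totals[sample[query]] += w                      # accumulate weights, not counts
    return totals[True] / (totals[True] + totals[False])

print(likelihood_weighting("Rain", {"WetGrass": True}))  # ~ 0.69, and no samples are rejected
```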
Importance Sampling:
• Importance sampling involves sampling from an easily sampled distribution (the proposal
distribution) and re-weighting the samples to approximate the target distribution.
• The challenge is to choose a proposal distribution that is easy to sample from and has significant
overlap with the target distribution.
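A generic sketch of importance sampling is given below: it estimates the mean of a target density p using samples drawn from a wide, easy-to-sample proposal q. The specific Gaussian densities are assumptions used only for the illustration.

```python
# Self-normalised importance sampling: estimate E_p[X] with samples from q.
import math, random

def pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

target = lambda x: pdf(x, mu=2.0, sigma=0.5)     # pretend this target p is hard to sample
proposal_mu, proposal_sigma = 0.0, 3.0           # wide, easy-to-sample proposal q

num, den = 0.0, 0.0
for _ in range(200_000):
    x = random.gauss(proposal_mu, proposal_sigma)        # sample from q
    w = target(x) / pdf(x, proposal_mu, proposal_sigma)  # importance weight p(x)/q(x)
    num += w * x
    den += w
print(num / den)    # self-normalised estimate of E_p[X], ~ 2.0
```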
Variational Inference:
• Variational inference formulates the inference problem as an optimization task. It introduces a family
of approximating distributions and seeks the distribution within that family that is closest to the true
posterior.
• The optimization minimizes the Kullback-Leibler divergence KL(q ‖ p) from the approximating
distribution q to the true posterior p.
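In symbols, variational inference chooses the member q of a tractable family Q that minimizes this KL divergence, which is equivalent to maximizing the evidence lower bound (ELBO); these are the standard definitions, added here for reference:

```latex
q^{*} \;=\; \operatorname*{arg\,min}_{q \in \mathcal{Q}} \; \mathrm{KL}\!\left(q(Z)\,\middle\|\,p(Z \mid X)\right)
\;=\; \operatorname*{arg\,max}_{q \in \mathcal{Q}} \;
      \mathbb{E}_{q(Z)}\!\left[\log p(X, Z) - \log q(Z)\right]
```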
Expectation-Maximization (EM):
• EM is an iterative optimization algorithm that alternates between an E-step, where the expected
value of the unobserved variables is calculated given the observed data and current parameters, and
an M-step, where the parameters are updated to maximize the expected log-likelihood.
• EM is often used for learning the parameters of the Bayesian network when some variables are
unobserved.
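In symbols, with observed data X, unobserved variables Z, and parameters θ, one EM iteration can be written as follows (standard form, added here for reference):

```latex
\text{E-step:}\quad Q\big(\theta \mid \theta^{(t)}\big)
  \;=\; \mathbb{E}_{Z \sim p(Z \mid X,\, \theta^{(t)})}\!\left[\log p(X, Z \mid \theta)\right]
\qquad
\text{M-step:}\quad \theta^{(t+1)} \;=\; \operatorname*{arg\,max}_{\theta}\; Q\big(\theta \mid \theta^{(t)}\big)
```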
Important Questions
1. Write the variable elimination algorithm and rejection-sampling algorithm for inference in Bayesian
networks.
2. Write the likelihood-weighting algorithm for inference in Bayesian networks and explain the working
of the algorithm.
3. Explain the Gibbs sampling algorithm for approximate inference in Bayesian networks.
4. Explain the Enumeration algorithm for answering queries on Bayesian networks.
5. What is Exact Inference in Bayesian Networks?
6. Explain the use of Approximate Inference in Bayesian Networks.
7. We have a bag of three biased coins a, b, and c with probabilities of coming up heads of 20%, 60%,
and 80%, respectively. One coin is drawn randomly from the bag (with equal likelihood of drawing
each of the three coins), and then the coin is flipped three times to generate the outcomes X1, X2,
and X3.
Draw the Bayesian network corresponding to this setup and define the necessary CPTs.
Calculate which coin was most likely to have been drawn from the bag if the observed flips come out
heads twice and tails once.
8. A patient has a disease N. Physicians measure the value of a parameter P to monitor the development
of the disease. The parameter can take one of the following values: {low, medium, high}. The value of
P is a result of the patient's unobservable condition/state S, where S can be {good, poor}. The state
changes between two consecutive days in one fifth of cases. If the patient is in good condition, the value of
P is rather low (out of 10 sample measurements, 5 are low, 3 medium, and 2 high), while if
the patient is in poor condition, the value is rather high (out of 10 measurements, 3 are low, 3
medium, and 4 high). On arrival at the hospital on day 0, the patient's condition was unknown, i.e.,
P(S0 = good) = 0.5.
Draw the transition and sensor model of the dynamic Bayesian network modelling the domain under
consideration.
Calculate the probability that the patient is in good condition on day 2, given low P values on days 1
and 2.
Can you determine the most likely patient state sequence for days 0, 1, and 2 without any additional
computations? Justify.