
BAYESIAN LEARNING

INTRODUCTION
• Why Bayesian? For two different reasons:
– Bayesian learning algorithms that calculate explicit
probabilities for hypotheses, such as the naive Bayes
classifier, are among the most practical approaches to certain
learning problems, competitive with decision tree and neural
network algorithms.
– They perform well on tasks such as classifying text documents,
for example electronic news articles.
Bayesian Learning
Features of Bayesian learning methods:
• Each observed training example can incrementally decrease or
increase the estimated probability that a hypothesis is correct.
– This provides a more flexible approach to learning than
algorithms that completely eliminate a hypothesis if it is
found to be inconsistent with any single example.
• Prior knowledge can be combined with observed data to
determine the final probability of a hypothesis. In Bayesian
learning, prior knowledge is provided by asserting
– a prior probability for each candidate hypothesis, and
– a probability distribution over observed data for each possible
hypothesis.
Bayesian Learning
• Bayesian methods can accommodate hypotheses
that make probabilistic predictions
• New instances can be classified by combining the
predictions of multiple hypotheses, weighted by
their probabilities.
• Even in cases where Bayesian methods prove computationally
intractable, they can provide a standard of
optimal decision making against which other
practical methods can be measured.
Difficulties with Bayesian Methods
• Require initial knowledge of many probabilities
– When these probabilities are not known in advance they are
often estimated based on background knowledge, previously
available data, and assumptions about the form of the
underlying distributions.
• Significant computational cost is required to determine the
Bayes optimal hypothesis in the general case (linear in the
number of candidate hypotheses).
– In certain specialized situations, this computational cost can
be significantly reduced.
Bayes Theorem
• In machine learning, we try to determine the best
hypothesis from some hypothesis space H, given the
observed training data D.
• In Bayesian learning, the best hypothesis means the
most probable hypothesis, given the data D plus any
initial knowledge about the prior probabilities of the
various hypotheses in H.
• Bayes theorem provides a way to calculate the
probability of a hypothesis based on its prior
probability, the probabilities of observing various data
given the hypothesis, and the observed data itself.
Bayes Theorem
P(h) is prior probability of hypothesis h
– P(h) to denote the initial probability that hypothesis h holds, before observing training data.
– P(h) may reflect any background knowledge we have about the chance that h is correct. If
we have no such prior knowledge, then each candidate hypothesis might simply get the
same prior probability.
P(D) is prior probability of training data D
– The probability of D given no knowledge about which hypothesis holds
P(h|D) is posterior probability of h given D
– P(h|D) is called the posterior probability of h, because it reflects our confidence that h
holds after we have seen the training data D.
– The posterior probability P(h|D) reflects the influence of the training data D, in contrast to
the prior probability P(h), which is independent of D.
P(D|h) is the conditional probability of D given h (the likelihood)
– The probability of observing data D given some world in which hypothesis h holds.
– Generally, we write P(x|y) to denote the probability of event x given event y.
Bayes Theorem
• In ML problems, we are interested in the probability P(h|D) that h
holds given the observed training data D.
• Bayes theorem provides a way to calculate the posterior probability
P(h|D), from the prior probability P(h), together with P(D) and P(D|h).

Bayes Theorem: P(h|D) = P(D|h) P(h) / P(D)

• P(h|D) increases with P(h) and P(D|h) according to Bayes theorem.


• P(h|D) decreases as P(D) increases, because the more probable it is
that D will be observed independent of h, the less evidence D provides
in support of h.
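To make the theorem concrete, here is a minimal Python sketch (the function and variable names are illustrative, not from the slides) that computes P(h|D) from the three quantities above:

```python
def posterior(prior_h, likelihood_d_given_h, prior_d):
    """Bayes theorem: P(h|D) = P(D|h) * P(h) / P(D)."""
    return likelihood_d_given_h * prior_h / prior_d

# Example: P(h) = 0.3, P(D|h) = 0.9, P(D) = 0.5  ->  P(h|D) = 0.54
print(posterior(0.3, 0.9, 0.5))
```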
Prior and Posterior Probability
• Prior probability: the probability of an event before new data is collected.
• Posterior probability: the probability of an event after new data is collected.
Maximum A Posteriori (MAP) Hypothesis, hMAP
➢ The learner considers some set of candidate hypotheses H and is
interested in finding the most probable hypothesis h ∈ H given the
observed data D.
➢ Such a maximally probable hypothesis is called a maximum a
posteriori (MAP) hypothesis, hMAP.
➢ We determine the MAP hypothesis by using Bayes theorem to
calculate the posterior probability of each candidate hypothesis,
as shown below.
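Writing this maximization out (since P(D) is the same constant for every candidate hypothesis, it can be dropped from the argmax):

$$h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h)$$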
Maximum Likelihood (ML) Hypothesis, hML
• If we assume that every hypothesis in H is equally probable
a priori, i.e. P(hi) = P(hj) for all hi and hj in H, then we need
only consider P(D|h) to find the most probable hypothesis.
• P(D|h) is often called the likelihood of the data D given h.
• Any hypothesis that maximizes P(D|h) is called a
maximum likelihood (ML) hypothesis, hML.
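A minimal sketch contrasting the two hypotheses (the hypothesis names and probability values below are made up for illustration): both reduce to an argmax over the candidate set, but MAP weights each likelihood by its prior.

```python
# Hypothetical priors P(h) and likelihoods P(D|h) for three candidate hypotheses
priors = {"h1": 0.5, "h2": 0.4, "h3": 0.1}
likelihoods = {"h1": 0.2, "h2": 0.5, "h3": 0.9}

# MAP: maximize P(D|h) * P(h); ML: maximize P(D|h) alone
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])
h_ml = max(priors, key=lambda h: likelihoods[h])
print(h_map, h_ml)  # -> h2 h3 (they differ when the priors are non-uniform)
```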
Example
➢ A medical diagnosis problem in which there are two
alternative hypotheses:
➢ (1) that the patient has a particular form of cancer, and
➢ (2) that the patient does not.
➢ The available data is from a particular laboratory test
with two possible outcomes: + (positive) and - (negative)
➢ We have prior knowledge that over the entire population
of people only .008 have this disease. Furthermore, the
lab test is only an imperfect indicator of the disease.
➢ The test returns a correct positive result in only 98% of
the cases in which the disease is actually present and a
correct negative result in only 97% of the cases in which
the disease is not present
➢ In other cases, the test returns the opposite result
Example - Does patient have cancer or not?

P(cancer) = .008     P(notcancer) = .992
P(+|cancer) = .98    P(-|cancer) = .02
P(+|notcancer) = .03 P(-|notcancer) = .97
• A patient takes a lab test and the result comes back positive.
P(+|cancer) P(cancer) = .98 * .008 = .0078
P(+|notcancer) P(notcancer) = .03 * .992 = .0298 ➔ hMAP is notcancer
• Since P(cancer|+) + P(notcancer|+) must be 1
P(cancer|+) = .0078 / (.0078+.0298) = .21
P(notcancer|+) = .0298 / (.0078+.0298) = .79
Solution
The situation is summarized by the probabilities listed above. Suppose we now
observe a new patient for whom the lab test returns a positive result. Should we
diagnose the patient as having cancer or not? The maximum a posteriori
hypothesis is found by comparing P(+|h)P(h) for the two hypotheses: since
.0298 > .0078, hMAP = notcancer. Note that even though the posterior
probability of cancer (.21) is much higher than its prior (.008), the most
probable hypothesis is still that the patient does not have cancer.
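A minimal Python sketch reproducing the arithmetic above (the variable names are illustrative):

```python
p_cancer, p_not = 0.008, 0.992         # priors P(cancer), P(notcancer)
p_pos_cancer, p_pos_not = 0.98, 0.03   # likelihoods P(+|cancer), P(+|notcancer)

# Unnormalized posteriors P(+|h) * P(h)
num_cancer = p_pos_cancer * p_cancer   # .0078 (rounded)
num_not = p_pos_not * p_not            # .0298 -> hMAP = notcancer

# Normalize so that P(cancer|+) + P(notcancer|+) = 1
evidence = num_cancer + num_not
print(round(num_cancer / evidence, 2), round(num_not / evidence, 2))  # 0.21 0.79
```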
MINIMUM DESCRIPTION LENGTH PRINCIPLE
• https://www.youtube.com/watch?v=tRHpFG3P2k8
• https://www.youtube.com/watch?v=0kufNLe31t0
BAYES OPTIMAL CLASSIFIER
• https://www.youtube.com/watch?v=o5x361YstFI
• https://www.youtube.com/watch?v=7R3b59ohivU
• https://www.youtube.com/watch?v=kWV_dVKnm2c
• https://www.youtube.com/watch?v=t51t8kPGvis
• https://www.youtube.com/watch?v=i4qF0-Jroq0
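The videos above walk through the Bayes optimal classifier; for reference, its standard classification rule echoes the earlier bullet about combining the predictions of multiple hypotheses weighted by their probabilities. Each new instance is assigned the value v_j from the set V of possible classifications that is most probable under the posterior-weighted vote:

$$v_{OB} = \arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D)$$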
GIBBS ALGORITHM
• https://www.youtube.com/watch?v=602Bus31zgc
• https://www.youtube.com/watch?v=o5x361YstFI
NAIVE BAYES CLASSIFIER
• https://www.youtube.com/watch?v=XzSlEA4ck2I
• https://www.youtube.com/watch?v=AUPmlIY_Rkw
• https://www.youtube.com/watch?v=caRLHyyUudg
• https://www.youtube.com/watch?v=CICk9ApEC3U
Naïve Bayes
• https://www.youtube.com/watch?v=Ab4viREnP74
• Gaussian Naive Bayes Classifier (for continuous values): https://www.youtube.com/watch?v=kufuBE6TJew
• Solved example, Naive Bayes classification (Age, Income, Student, Credit Rating, Buys Computer) by Mahesh: https://www.youtube.com/watch?v=ztYAWF8tzLI
• Classify the new example as Senior or Junior: https://www.youtube.com/watch?v=Tw4U4a8VmIs
• More: https://www.youtube.com/watch?v=QPvHY9t1Ouw
• Text classifier: https://www.youtube.com/watch?v=fgbG7fHQwJk
• Spam mail classifier: https://www.youtube.com/watch?v=YcsDbCvRBxg
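Since the deck covers naive Bayes only through the videos above, here is a minimal sketch of the rule they describe, v_NB = argmax_v P(v) Π_i P(a_i|v), which applies the conditional-independence assumption to the attributes. The dataset and all names below are made up for illustration:

```python
from collections import Counter, defaultdict

# Tiny made-up training set: (attribute tuple, class label)
data = [(("sunny", "hot"), "no"),
        (("sunny", "mild"), "no"),
        (("rain", "mild"), "yes"),
        (("rain", "hot"), "yes"),
        (("sunny", "mild"), "yes")]

class_counts = Counter(label for _, label in data)
value_counts = defaultdict(Counter)  # (attr position, label) -> counts of values
for attrs, label in data:
    for i, value in enumerate(attrs):
        value_counts[(i, label)][value] += 1

def predict(attrs):
    """v_NB = argmax_v P(v) * prod_i P(a_i | v), with frequency estimates."""
    def score(v):
        p = class_counts[v] / len(data)
        for i, value in enumerate(attrs):
            p *= value_counts[(i, v)][value] / class_counts[v]
        return p
    return max(class_counts, key=score)

print(predict(("sunny", "hot")))  # -> "no" on this toy data
```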
MAXIMUM LIKELIHOOD AND LEAST-SQUARED ERROR HYPOTHESES
• https://www.youtube.com/watch?v=Yj5jkzPtucM
• https://www.youtube.com/watch?v=lx9PkgeO5Hc
