UNIT 4 - Bayesian Learning
Unit 4
Bayesian Learning
By
Dr. G. Sunitha
Professor & BoS Chairperson
Department of CSE
1
Introduction
❖ Bayesian Learning provides a probabilistic approach to inference.
❖ It is based on the assumption that the quantities of interest are governed by probability distributions and that
optimal decisions can be made by reasoning about these probabilities together with observed data.
❖ Bayesian learning algorithms calculate explicit probabilities for hypotheses.
2
Features of Bayesian learning methods
❖ Each observed training example can incrementally decrease or increase the estimated probability that a
hypothesis is correct. This provides a more flexible approach to learning than algorithms that completely
eliminate a hypothesis if it is found to be inconsistent with any single example.
❖ Prior knowledge can be combined with observed data to determine the final probability of a hypothesis. In
Bayesian learning, prior knowledge is provided by asserting
(1) a prior probability for each candidate hypothesis, and
(2) a probability distribution over observed data for each possible hypothesis.
❖ Bayesian methods can accommodate hypotheses that make probabilistic predictions (e.g., hypotheses such as
"this pneumonia patient has a 93% chance of complete recovery").
❖ New instances can be classified by combining the predictions of multiple hypotheses, weighted by their
probabilities.
❖ They require initial knowledge of many probabilities. When these probabilities are not known in advance they
are often estimated based on background knowledge, previously available data, and assumptions about the
form of the underlying distributions.
❖ They can incur significant computational cost.
❖ They can provide a standard of optimal decision making against which other
practical methods can be measured.
3
Bayes Theorem
Terminology
❖ H is the hypothesis space; h is a hypothesis.
❖ D is the training dataset; d is a training sample.
❖ P(h) denotes the prior probability that hypothesis h holds, before training data is observed.
It reflects any background knowledge we have about the chance that h is a correct hypothesis. If we have
no such prior knowledge, then we might simply assign the same prior probability to each candidate
hypothesis.
❖ P(D) denotes the prior probability that training data D will be observed (i.e., the probability of D given no
knowledge about which hypothesis holds).
❖ P(D | h) denotes the probability of observing data D given hypothesis h holds.
❖ P(h | D) denotes the posterior probability that h holds given the observed training data D.
❖ Bayes Theorem
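In its standard form:
P(h | D) = P(D | h) P(h) / P(D)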
4
Maximum a Posteriori (MAP) Hypothesis
❖ For a given problem there can be multiple candidate hypotheses that are consistent with the data.
❖ The most (maximally) probable hypothesis given the data D is called the MAP hypothesis.
❖ Assuming that P(h) is the same for all hypotheses, P(D | h) is often called the likelihood of the data D given h, and
any hypothesis that maximizes P(D | h) is called a maximum likelihood (ML) hypothesis.
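In standard notation:
hMAP = argmax over h in H of P(h | D) = argmax over h in H of P(D | h) P(h)
(the denominator P(D) is dropped because it is constant across hypotheses)
and, when P(h) is the same for all hypotheses in H,
hML = argmax over h in H of P(D | h)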
5
Bayes Rule - Medical Diagnosis Problem
❖ Two alternative hypotheses:
(1) that the patient has a particular form of cancer. Hypothesis h = cancer
(2) that the patient does not have cancer. Hypothesis -h = No cancer
❖ The dataset contains patients' lab test results belonging to two classes: positive (+) and negative (−).
❖ Prior Knowledge: over the entire population of people only 0.8% have this disease.
P(h) = 0.008 P ( - h) = 0.992
❖ The test returns a correct positive result in only 98% of the cases in which the disease is actually present.
P( + | h) = 0.98 P( - | h) = 0.02
❖ A correct negative result in only 97% of the cases in which the disease is not present.
P( + | -h) = 0.03 P( - | -h) = 0.97
6
Bayes Rule - Medical Diagnosis Problem . . .
❖ Suppose we now observe a new patient for whom the lab test returns a positive result. Should we diagnose the
patient as having cancer or not? The maximum a posteriori hypothesis can be found as follows:
P(+ | h) P(h) = 0.98 × 0.008 = 0.0078
P(+ | -h) P(-h) = 0.03 × 0.992 = 0.0298
Hence hMAP = -h : the patient most probably does not have cancer; the positive lab test is more likely a false positive.
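A minimal Python sketch of this calculation, using only the numbers given above (variable names are illustrative):

# Posterior computation for the cancer-diagnosis example above.
p_cancer = 0.008              # P(h)    : prior probability of cancer
p_no_cancer = 0.992           # P(-h)   : prior probability of no cancer
p_pos_given_cancer = 0.98     # P(+ | h)
p_pos_given_no_cancer = 0.03  # P(+ | -h)

# Unnormalized posteriors for a positive test result
score_cancer = p_pos_given_cancer * p_cancer             # 0.0078
score_no_cancer = p_pos_given_no_cancer * p_no_cancer    # 0.0298

# Normalized posterior P(h | +)
p_cancer_given_pos = score_cancer / (score_cancer + score_no_cancer)
print(f"P(cancer | +) = {p_cancer_given_pos:.3f}")       # about 0.21
print("hMAP =", "cancer" if score_cancer > score_no_cancer else "no cancer")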
7
Bayes Theorem and Concept Learning
❖ Since Bayes theorem provides a principled way to calculate the posterior probability of each hypothesis given
the training data, we can use it as the basis for a straightforward learning algorithm that calculates the
probability for each possible hypothesis, then outputs the most probable.
8
Brute-Force Bayes Concept Learning
❖ D is the instance space (the set of training data samples). Each sample is represented as
<Xi , ti>, where Xi is a vector of independent variables and ti is the target variable.
❖ Let Hypothesis Space H be defined over Instance Space D.
❖ The task is to learn some target concept c : D → {0,1} i.e., to learn ti = c (Xi).
❖ Brute-Force MAP Learning Algorithm
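In its standard two-step form, the algorithm is:
1. For each hypothesis h in H, calculate the posterior probability P(h | D) = P(D | h) P(h) / P(D).
2. Output the hypothesis hMAP with the highest posterior probability: hMAP = argmax over h in H of P(h | D).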
9
Brute-Force Bayes Concept Learning . . . .
❖ Brute-Force MAP Learning Algorithm . . . .
• This algorithm may require significant computation, because it applies Bayes theorem to each
hypothesis in H to calculate P( h | D ). While this may prove impractical for large hypothesis
spaces, the algorithm is still of interest because it provides a standard against which we may judge
the performance of other concept learning algorithms.
• In order to specify a learning problem for the Brute-Force MAP Learning algorithm, we must specify values for
P(h) and P(D | h). We may choose the probability distributions P(h) and P(D | h) in any way in
order to describe our prior knowledge about the learning task. Here let us choose them to be consistent
with the following assumptions:
1. The training data D is noise free (i.e., ti = c(Xi)).
2. The target concept c is contained in the hypothesis space H.
3. We have no prior reason to believe that any hypothesis is more probable than any other.
10
Brute-Force Bayes Concept Learning . . . .
❖ Brute-Force MAP Learning Algorithm . . . .
• P(h) denotes the prior probability that hypothesis h holds, before training data is observed.
• How to choose value for P(h) –
o Given no prior knowledge that one hypothesis is more likely than another, it is reasonable to assign
the same prior probability to every hypothesis h in H.
o Furthermore, because we assume the target concept is contained in H we should require that these
prior probabilities sum to 1.
o Together these constraints imply that
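P(h) = 1 / |H|   for all h in H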
11
Brute-Force Bayes Concept Learning . . . .
❖ Brute-Force MAP Learning Algorithm . . . .
• P(D | h) denotes the probability of observing data D given hypothesis h holds.
• How to choose value for P(D | h) –
o Since noise-free training data is assumed, the probability of observing the target values in D given hypothesis h is:
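P(D | h) = 1   if ti = h(xi) for every <xi , ti> in D
P(D | h) = 0   otherwise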
12
Brute-Force Bayes Concept Learning . . . .
❖ Brute-Force MAP Learning Algorithm . . . .
Bayes theorem is used to compute the posterior probability P(h | D) of each hypothesis h given the observed
training data D, as follows.
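Case 1) Consider that h is inconsistent with the training data D. Then P(D | h) = 0, and hence P(h | D) = 0.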
13
Brute-Force Bayes Concept Learning . . . .
❖ Brute-Force MAP Learning Algorithm . . . .
Case 2) Consider that h is consistent with the training data D. Then P( D| h ) = 1.
Let VS_H,D be the subset of hypotheses from H that are consistent with D.
Then,
P(D) = | VS_H,D | / | H |
because the sum over all hypotheses of P(h | D) must be one, and the number of hypotheses from H consistent
with D is | VS_H,D |.
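Substituting these values into Bayes theorem, for every hypothesis h consistent with D:
P(h | D) = ( 1 × (1/|H|) ) / ( |VS_H,D| / |H| ) = 1 / |VS_H,D|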
14
Brute-Force Bayes Concept Learning . . . .
❖ Brute-Force MAP Learning Algorithm . . . .
15
MAP Hypotheses and Consistent Learners
16
MAP Hypotheses and Consistent Learners . . .
❖ For a given problem there can be multiple candidate hypotheses that are consistent with the data.
❖ The most (maximally) probable hypothesis given the data D is called the MAP (Maximum a Posteriori)
Hypothesis.
❖ A learning algorithm is a consistent learner provided it outputs a hypothesis that commits zero errors over the
training examples.
❖ Given the above analysis, it can be concluded that every consistent learner outputs a MAP hypothesis, if
• A uniform prior probability distribution over H is assumed and
• Deterministic, noise-free training data is assumed.
17
Normal (Gaussian) Distribution of Data
A Normal Distribution is a bell-shaped
distribution defined by the probability
density function
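In its standard form, with mean x̄ and standard deviation σ:
p(x) = ( 1 / (σ √(2π)) ) exp( −(x − x̄)² / (2σ²) )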
❖ About 68% of the values fall within one standard deviation of the mean, that is, between (x̄ − σ) and (x̄ + σ).
❖ About 95% of the values lie within two standard deviations of the mean, that is, between (x̄ − 2σ) and (x̄ + 2σ).
❖ About 99.7% of the values lie within three standard deviations of the mean, that is, between (x̄ − 3σ) and (x̄ + 3σ).
18
Maximum Likelihood and Least-squared Error Hypotheses
❖ Under certain assumptions any learning algorithm that minimizes the squared error between the output
hypothesis predictions and the training data will output a maximum likelihood hypothesis.
❖ Consider a set of training examples, where the target value of each example is corrupted by random noise
drawn according to a Normal probability distribution. More precisely, each training example is a pair of the form
(xi, ti), where ti = f(xi) + ei; here f(xi) is the noise-free value of the target function and ei is a random
variable representing the noise.
❖ The task of the learner is to output a maximum likelihood hypothesis, or, equivalently, a MAP hypothesis
assuming all hypotheses are equally probable a priori.
ti = f(xi)        (noise-free target value)
ti = f(xi) + ei   (observed, noise-corrupted target value)
19
Maximum Likelihood and Least-squared Error Hypotheses . . .
❖ In the case of continuous variables we cannot express P(D | h) by assigning a finite probability to each of the infinite
set of possible values for the random variable. Instead, we speak of a probability density for continuous
variables such as e and require that the integral of this probability density over all possible values be one.
❖ Lower case p is used to refer to the probability density function, to distinguish it from a finite probability P.
20
Maximum Likelihood and Least-squared Error Hypotheses . . .
❖ Given that the noise ei obeys a Normal distribution with zero mean, each ti must also obey a Normal distribution
centered around the true target value f(xi). Hence, under hypothesis h, the mean is µ = f(xi) = h(xi).
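A sketch of the standard derivation from this point:
hML = argmax over h in H of  p(D | h)
    = argmax over h in H of  Π i=1..m ( 1 / √(2πσ²) ) exp( −(ti − h(xi))² / (2σ²) )
    = argmax over h in H of  Σ i=1..m −(ti − h(xi))² / (2σ²)     (taking the natural log and dropping terms that do not depend on h)
    = argmin over h in H of  Σ i=1..m (ti − h(xi))²
That is, the maximum likelihood hypothesis is the one that minimizes the sum of squared errors between the observed
targets ti and the hypothesis predictions h(xi).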
21
Maximum Likelihood and Least-squared Error Hypotheses . . .
22
Maximum Likelihood and Least-squared Error Hypotheses . . .
23
Minimum Description Length Principle
❖ Recall Occam's razor, a popular inductive bias that can be summarized as "choose the shortest explanation for
the observed data."
24
Minimum Description Length Principle . . .
25
Minimum Description Length Principle . . .
❖ The Minimum Description Length (MDL) principle recommends choosing the hypothesis that minimizes the sum
of two description lengths: the description length of the hypothesis itself and the description length of the data
given the hypothesis.
❖ Assuming we use the codes C1 and C2 to represent the hypothesis and the data given the hypothesis, we can
state the MDL principle as:
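hMDL = argmin over h in H of  [ LC1(h) + LC2(D | h) ]
where LC(x) denotes the description length (in bits) of x under encoding C.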
26
Naïve Bayesian Classifier
27
Naïve Bayesian Classifier . . .
28
Naïve Bayesian Classifier . . .
29
Naïve Bayesian Classifier . . .
30
Naïve Bayesian Classifier . . .
31
Naïve Bayesian Classifier – Example
32
Naïve Bayesian Classifier – Example . . .
33
Naïve Bayesian Classifier – Example . . .
34
Case Study: Learning to Classify Text using Naïve Bayes
35
Naïve Bayes Algorithm for Learning and Classifying Text
36
Naïve Bayes Algorithm for Learning and Classifying Text . . .
37
Optimal Bayes Classifier
38
Optimal Bayes Classifier . . .
39
Optimal Bayes Classifier . . .
40
Optimal Bayes Classifier . . .
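The classification rule (standard form of Bayes optimal classification) is:
argmax over vj in V of  Σ over hi in H of  P(vj | hi) P(hi | D)
where V is the set of possible classification values and the new instance is assigned the value vj that maximizes this sum.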
Any system that classifies new instances according to the above equation is called a Bayes optimal classifier, or
Bayes optimal learner. No other classification method using the same hypothesis space and same prior knowledge
can outperform this method on average. This method maximizes the probability that the new instance is classified
correctly, given the available data, hypothesis space, and prior probabilities over the hypotheses.
41
Gibbs Algorithm
❖ Although the Bayes optimal classifier obtains the best performance that can be achieved from the given training
data, it can be quite costly to apply. The expense is due to the fact that it computes the posterior probability for
every hypothesis in H and then combines the predictions of each hypothesis to classify each new instance.
❖ Under certain conditions, classifying the next instance according to a hypothesis drawn at random from the
current version space (according to a uniform distribution) will have an expected error at most twice that of the
Bayes optimal classifier.
E[ error_Gibbs ] ≤ 2 E[ error_BayesOptimal ]
42
Bayes Classification Methods - Summary
43
Bayesian Belief Networks
❖ The naive Bayes classifier assumes conditional independence between variables given the value of the target
variable. This assumption dramatically reduces the complexity of learning the target function. However, in many
cases this conditional independence assumption is clearly overly restrictive.
❖ A Bayesian belief network describes the probability distribution governing a set of variables by specifying a set
of conditional independence assumptions along with a set of conditional probabilities.
❖ Bayesian belief networks allow stating conditional independence assumptions that apply to subsets of the
variables. Thus, Bayesian belief networks provide an intermediate approach that is less constraining than the
global assumption of conditional independence made by the naive Bayes classifier, but more tractable than
avoiding conditional independence assumptions altogether.
❖ In general, a Bayesian belief network describes the probability distribution over a set of variables.
44
Bayesian Belief Networks – Conditional Independence
❖ The Naive Bayes classifier assumes that the instance attribute A1 is conditionally independent of instance
attribute A2 given the target value V. This allows the naive Bayes classifier to calculate P(A1, A2 | V) as follows:
P(A1, A2 | V) = P(A1 | V) P(A2 | V)
(product rule of probability - A1 is conditionally independent of A2 given V)
❖ Let X, Y, and Z be three discrete-valued random variables. It can be said that X is conditionally independent of Y
given Z if the probability distribution governing X is independent of the value of Y given a value for Z; that is, if
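P(X | Y, Z) = P(X | Z)
(more precisely, P(X = xi | Y = yj, Z = zk) = P(X = xi | Z = zk) for all possible values xi, yj, zk)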
❖ This definition of conditional independence can be extended to sets of variables as well. It can be said that the
set of variables X1 . . . Xl is conditionally independent of the set of variables Y1 . . . Ym given the set of
variables Z1 . . . Zn, if
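P(X1 . . . Xl | Y1 . . . Ym, Z1 . . . Zn) = P(X1 . . . Xl | Z1 . . . Zn)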
45
Bayesian Belief Networks – Representation
❖ A Bayesian belief network (Bayesian network for short) represents the joint probability distribution for a set of
variables by specifying a set of conditional independence assumptions (represented by a directed acyclic graph),
together with sets of local conditional probabilities.
✓ Predecessors
✓ Immediate Predecessors
✓ Descendants
✓ Nondescendants
✓ A variable is conditionally independent of its nondescendants in the network given its immediate predecessors
in the network.
46
Bayesian Belief Networks – Representation . . .
47
Bayesian Belief Networks – Representation . . .
❖ A conditional probability table is given for each variable, describing the probability distribution for that variable
given the values of its immediate predecessors.
❖ The joint probability for any desired assignment of values (y1, . . . yn) to the tuple of network variables
(Y1 . . . Yn) can be computed by the formula:
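P(y1, . . . , yn) = Π i=1..n P(yi | Parents(Yi))
where Parents(Yi) denotes the set of immediate predecessors of Yi in the network.

As an illustration, a minimal Python sketch of this factorization for a hypothetical chain network
Storm → Lightning → Thunder (the CPT numbers below are made up for illustration, not taken from the slides):

# Hypothetical chain network: Storm -> Lightning -> Thunder.
# Each CPT gives P(child = True | parent value).
p_storm = 0.2                                         # P(Storm = True)
p_lightning_given_storm = {True: 0.7, False: 0.1}     # P(Lightning = True | Storm)
p_thunder_given_lightning = {True: 0.9, False: 0.05}  # P(Thunder = True | Lightning)

def joint(storm, lightning, thunder):
    """P(Storm, Lightning, Thunder) = P(Storm) * P(Lightning | Storm) * P(Thunder | Lightning)."""
    p = p_storm if storm else 1 - p_storm
    p *= p_lightning_given_storm[storm] if lightning else 1 - p_lightning_given_storm[storm]
    p *= p_thunder_given_lightning[lightning] if thunder else 1 - p_thunder_given_lightning[lightning]
    return p

print(joint(True, True, True))   # 0.2 * 0.7 * 0.9 = 0.126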
48
Bayesian Belief Networks – Example
Causal relations are captured by Bayesian Belief Networks
49
Bayesian Belief Networks – Problem
50
Bayesian Belief Networks – Problem . . .
51
Bayesian Belief Networks – Problem . . .
(Result of the worked computation: 0.00062)
52
Bayesian Belief Networks – Problem . . .
53
Learning in Bayesian Belief Networks
❖ Scenario 1: Given both the network structure and all variables observable: compute only
the CPT entries.
❖ Scenario 2: Network structure known, some variables hidden: gradient descent (greedy
hill-climbing) method, i.e., search for a solution along the steepest descent of a criterion
function.
• Weights are initialized to random probability values.
• At each iteration, it moves towards what appears to be the best solution at the
moment, without backtracking.
• Weights are updated at each iteration & converge to local optimum.
❖ Scenario 3: Network structure unknown, all variables observable: search through the
model space to reconstruct the network topology.
❖ Scenario 4: Unknown structure, all hidden variables: No good algorithms known for this
purpose.
54