ML - Unit 1 - Part Ii

Bayesian reasoning provides a probabilistic approach to inference based on probability distributions and optimal decisions. Bayesian learning methods calculate hypothesis probabilities and allow prior knowledge combination with observed data. Practical difficulties include estimating unknown probabilities and computational costs, but specialized situations can reduce costs. Naive Bayes classification predicts targets by combining attribute-value predictions weighted by class probabilities.

Uploaded by

devipriya konda

Machine

BAYESIAN LEARNING

Bayesian reasoning provides a probabilistic approach to inference. It is based on the
assumption that the quantities of interest are governed by probability distributions and that
optimal decisions can be made by reasoning about these probabilities together with observed
data.

INTRODUCTION

Bayesian learning methods are relevant to the study of machine learning for two different reasons.
1. First, Bayesian learning algorithms that calculate explicit probabilities for hypotheses,
such as the naive Bayes classifier, are among the most practical approaches to certain
types of learning problems.
2. The second reason is that they provide a useful perspective for understanding many
learning algorithms that do not explicitly manipulate probabilities.

Features of Bayesian Learning Methods

• Each observed training example can incrementally decrease or increase the estimated
probability that a hypothesis is correct. This provides a more flexible approach to
learning than algorithms that completely eliminate a hypothesis if it is found to be
inconsistent with any single example.
• Prior knowledge can be combined with observed data to determine the final
probability of a hypothesis. In Bayesian learning, prior knowledge is provided by
asserting (1) a prior probability for each candidate hypothesis, and (2) a probability
distribution over observed data for each possible hypothesis.
• Bayesian methods can accommodate hypotheses that make probabilistic predictions.
• New instances can be classified by combining the predictions of multiple hypotheses,
weighted by their probabilities.
• Even in cases where Bayesian methods prove computationally intractable, they can
provide a standard of optimal decision making against which other practical methods
can be measured.
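
The fourth feature above, combining the predictions of multiple hypotheses weighted by their probabilities, can be sketched in a few lines of Python. The three hypotheses, their posterior probabilities, and their predictions are made-up values chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical posteriors P(h|D) for three hypotheses (illustrative values).
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
# Hypothetical classification each hypothesis gives to one new instance.
predictions = {"h1": "+", "h2": "-", "h3": "-"}


def weighted_vote(posteriors, predictions):
    """Sum the posterior weight behind each label and return the totals."""
    totals = defaultdict(float)
    for h, p in posteriors.items():
        totals[predictions[h]] += p
    return dict(totals)


totals = weighted_vote(posteriors, predictions)
best_label = max(totals, key=totals.get)
print(totals, best_label)
```

With these numbers the single most probable hypothesis (h1) predicts "+", yet the weighted combination favours "-", because the other two hypotheses together carry more probability.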

Practical difficulty in applying Bayesian methods

1. One practical difficulty in applying Bayesian methods is that they typically require
initial knowledge of many probabilities. When these probabilities are not known in
advance, they are often estimated based on background knowledge, previously available
data, and assumptions about the form of the underlying distributions.
2. A second practical difficulty is the significant computational cost required to
determine the Bayes optimal hypothesis in the general case. In certain specialized
situations, this computational cost can be significantly reduced.

Topic: 8 - BAYES THEOREM

Bayes theorem provides a way to calculate the probability of a hypothesis based on its prior
probability, the probabilities of observing various data given the hypothesis, and the observed
data itself.
Notations
• P(h): prior probability of h; reflects any background knowledge about the chance that h
is correct
• P(D): prior probability of D; the probability that D will be observed
• P(D|h): probability of observing D given a world in which h holds
• P(h|D): posterior probability of h; reflects confidence that h holds after D has been
observed

Bayes theorem is the cornerstone of Bayesian learning methods because it provides a way to
calculate the posterior probability P(h|D), from the prior probability P(h), together with P(D)
and P(D|h):

     P(h|D) = P(D|h) * P(h) / P(D)

• P(h|D) increases with P(h) and with P(D|h) according to Bayes theorem.
• P(h|D) decreases as P(D) increases, because the more probable it is that D will be
observed independent of h, the less evidence D provides in support of h.

Maximum a Posteriori (MAP) Hypothesis

• In many learning scenarios, the learner considers some set of candidate hypotheses H
and is interested in finding the most probable hypothesis h ∈ H given the observed
data D. Any such maximally probable hypothesis is called a maximum a posteriori (MAP)
hypothesis.
• The MAP hypotheses can be determined by using Bayes theorem to calculate the posterior
probability of each candidate hypothesis. More precisely, hMAP is a MAP hypothesis provided

     hMAP = argmax_{h ∈ H} P(h|D)
          = argmax_{h ∈ H} P(D|h) * P(h) / P(D)
          = argmax_{h ∈ H} P(D|h) * P(h)

• In the final step P(D) can be dropped, because it is a constant independent of h.

Maximum Likelihood (ML) Hypothesis

• In some cases, it is assumed that every hypothesis in H is equally probable a priori
(P(hi) = P(hj) for all hi and hj in H).
• In this case the expression for hMAP can be simplified: we need only consider the term
P(D|h) to find the most probable hypothesis.

P(D|h) is often called the likelihood of the data D given h, and any hypothesis that maximizes
P(D|h) is called a maximum likelihood (ML) hypothesis:

     hML = argmax_{h ∈ H} P(D|h)
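
To see how the MAP and ML choices can differ, here is a minimal Python sketch. The hypothesis names, priors, and likelihoods are hypothetical numbers, not taken from the notes.

```python
# Hypothetical priors P(h) and likelihoods P(D|h) (illustrative values only).
prior = {"h1": 0.7, "h2": 0.2, "h3": 0.1}
likelihood = {"h1": 0.2, "h2": 0.5, "h3": 0.6}

# MAP: maximize P(D|h) * P(h); P(D) is the same for every h, so it is dropped.
h_map = max(prior, key=lambda h: likelihood[h] * prior[h])

# ML: maximize P(D|h) alone, which equals MAP under a uniform prior.
h_ml = max(prior, key=lambda h: likelihood[h])

print("MAP:", h_map, "ML:", h_ml)
```

Here the strong prior on h1 makes it the MAP hypothesis even though h3 explains the data best and is therefore the ML hypothesis.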

Example
• Consider a medical diagnosis problem in which there are two alternative hypotheses:
(1) that the patient has a particular form of cancer, and (2) that the patient does not. The
available data is from a particular laboratory test with two possible outcomes: +
(positive) and - (negative).
• We have prior knowledge that over the entire population of people only .008 have this
disease. Furthermore, the lab test is only an imperfect indicator of the disease.
• The test returns a correct positive result in only 98% of the cases in which the disease is
actually present and a correct negative result in only 97% of the cases in which the disease is
not present. In other cases, the test returns the opposite result.
• The above situation can be summarized by the following probabilities:

     P(cancer) = 0.008        P(¬cancer) = 0.992
     P(+|cancer) = 0.98       P(-|cancer) = 0.02
     P(+|¬cancer) = 0.03      P(-|¬cancer) = 0.97

Suppose a new patient is observed for whom the lab test returns a positive (+) result.
Should we diagnose the patient as having cancer or not? The MAP hypothesis is found by
comparing

     P(+|cancer) * P(cancer) = 0.98 * 0.008 = 0.0078
     P(+|¬cancer) * P(¬cancer) = 0.03 * 0.992 = 0.0298

Thus, hMAP = ¬cancer.

The exact posterior probabilities can also be determined by normalizing the above quantities
so that they sum to 1, e.g. P(cancer|+) = 0.0078 / (0.0078 + 0.0298) = 0.21.
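
The diagnosis example can be checked numerically. The sketch below uses only the probabilities stated in the example; the function and variable names are illustrative choices.

```python
def posteriors(prior, likelihood):
    """Return P(h|D) for each h from P(h) and P(D|h), via Bayes theorem."""
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(D), by total probability
    return {h: u / evidence for h, u in unnormalized.items()}


prior = {"cancer": 0.008, "no_cancer": 0.992}
# P(+|h): 98% true-positive rate, 3% false-positive rate (1 - 0.97).
likelihood_pos = {"cancer": 0.98, "no_cancer": 0.03}

post = posteriors(prior, likelihood_pos)
h_map = max(post, key=post.get)
print(h_map, round(post["cancer"], 2))
```

Even after a positive test, the posterior probability of cancer is only about 0.21, because the disease is rare in the population.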

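Basic formulas for calculating probabilities are summarized in the table below.

• Product rule: P(A ∧ B) = P(A|B) * P(B) = P(B|A) * P(A)
• Sum rule: P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
• Bayes theorem: P(h|D) = P(D|h) * P(h) / P(D)
• Theorem of total probability: if events A1, ..., An are mutually exclusive with
Σ_i P(Ai) = 1, then P(B) = Σ_i P(B|Ai) * P(Ai)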

Topic: 9 - MAXIMUM LIKELIHOOD AND LEAST-SQUARED ERROR HYPOTHESES

Consider the problem of learning a continuous-valued target function, a problem faced by
approaches such as neural network learning, linear regression, and polynomial curve fitting.

A straightforward Bayesian analysis will show that under certain assumptions any learning
algorithm that minimizes the squared error between the output hypothesis predictions and the
training data will output a maximum likelihood (ML) hypothesis.

• Learner L considers an instance space X and a hypothesis space H consisting of some
class of real-valued functions defined over X, i.e., (∀ h ∈ H)[ h : X → R], and training
examples of the form <xi, di>.
• The problem faced by L is to learn an unknown target function f : X → R.
• A set of m training examples is provided, where the target value of each example is
corrupted by random noise drawn according to a Normal probability distribution with
zero mean.
• Each training example is a pair of the form (xi, di) where di = f(xi) + ei.
  - Here f(xi) is the noise-free value of the target function and ei is a random variable
  representing the noise.
  - It is assumed that the values of the ei are drawn independently and that they are
  distributed according to a Normal distribution with zero mean.
• The task of the learner is to output a maximum likelihood hypothesis or, equivalently, a MAP
hypothesis assuming all hypotheses are equally probable a priori.

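Using the definition of hML we have (lowercase p denotes a probability density, since the
target values di are continuous)

     hML = argmax_{h ∈ H} p(D|h)

Assuming the training examples are mutually independent given h, we can write p(D|h) as the
product of the various p(di|h):

     hML = argmax_{h ∈ H} Π_{i=1..m} p(di|h)

Given that the noise ei obeys a Normal distribution with zero mean and unknown variance σ^2,
each di must also obey a Normal distribution with variance σ^2 centered around the true target
value f(xi). Because we are writing the expression for p(D|h), we assume h is the correct
description of f. Hence, µ = f(xi) = h(xi), and

     hML = argmax_{h ∈ H} Π_{i=1..m} (1 / sqrt(2πσ^2)) * exp(-(di - h(xi))^2 / (2σ^2))

Rather than maximizing this expression directly, we maximize its less complicated logarithm,
which is justified because ln p is a monotonic function of p:

     hML = argmax_{h ∈ H} Σ_{i=1..m} [ ln(1 / sqrt(2πσ^2)) - (di - h(xi))^2 / (2σ^2) ]

The first term in this expression is a constant independent of h, and can therefore be
discarded, yielding

     hML = argmax_{h ∈ H} Σ_{i=1..m} -(di - h(xi))^2 / (2σ^2)

Maximizing this negative quantity is equivalent to minimizing the corresponding positive
quantity:

     hML = argmin_{h ∈ H} Σ_{i=1..m} (di - h(xi))^2 / (2σ^2)

Finally, discard constants that are independent of h:

     hML = argmin_{h ∈ H} Σ_{i=1..m} (di - h(xi))^2

Thus, the above equation shows that the maximum likelihood hypothesis hML is the one that
minimizes the sum of the squared errors between the observed training values di and the
hypothesis predictions h(xi).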
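
The equivalence between maximum likelihood and least squares can be checked numerically. The sketch below is only an illustration under stated assumptions: the target function f(x) = 2x + 1, the noise level, and the small grid of candidate slopes are all hypothetical choices, not part of the notes.

```python
import math
import random

random.seed(0)
SIGMA = 0.5  # assumed noise standard deviation


def f(x):
    """Hypothetical target function, unknown to the learner."""
    return 2.0 * x + 1.0


xs = [i / 10 for i in range(20)]
ds = [f(x) + random.gauss(0.0, SIGMA) for x in xs]  # di = f(xi) + ei

# Candidate hypotheses h_w(x) = w * x + 1 for slopes w on a small grid.
slopes = [w / 10 for w in range(0, 41)]


def sse(w):
    """Sum of squared errors between di and h_w(xi)."""
    return sum((d - (w * x + 1.0)) ** 2 for x, d in zip(xs, ds))


def log_likelihood(w):
    """ln of the product of Normal densities centred on h_w(xi)."""
    const = -0.5 * math.log(2 * math.pi * SIGMA ** 2)
    return sum(const - (d - (w * x + 1.0)) ** 2 / (2 * SIGMA ** 2)
               for x, d in zip(xs, ds))


w_ml = max(slopes, key=log_likelihood)  # maximum likelihood hypothesis
w_ls = min(slopes, key=sse)             # least-squared-error hypothesis
print(w_ml, w_ls)
```

Both searches select the same slope, because the log-likelihood is a constant minus the squared error scaled by 1 / (2σ^2).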

Note:
Why is it reasonable to choose the Normal distribution to characterize noise?
• It is a good approximation of many types of noise in physical systems.
• The Central Limit Theorem shows that the sum of a sufficiently large number of
independent, identically distributed random variables itself approximately obeys a Normal
distribution.

Note that this analysis considers noise only in the target value, not in the attributes
describing the instances themselves.

Topic: 10 - NAIVE BAYES CLASSIFIER

• The naive Bayes classifier applies to learning tasks where each instance x is described
by a conjunction of attribute values and where the target function f(x) can take on any
value from some finite set V.
• A set of training examples of the target function is provided, and a new instance is
presented, described by the tuple of attribute values (a1, a2, ..., am).
• The learner is asked to predict the target value, or classification, for this new instance.

The Bayesian approach to classifying the new instance is to assign the most probable target
value, VMAP, given the attribute values (a1, a2, ..., am) that describe the instance:

     VMAP = argmax_{vj ∈ V} P(vj | a1, a2, ..., am)

Use Bayes theorem to rewrite this expression as

     VMAP = argmax_{vj ∈ V} P(a1, a2, ..., am | vj) * P(vj) / P(a1, a2, ..., am)
          = argmax_{vj ∈ V} P(a1, a2, ..., am | vj) * P(vj)          (1)

• The naive Bayes classifier is based on the assumption that the attribute values are
conditionally independent given the target value. That is, the assumption is that given
the target value of the instance, the probability of observing the conjunction
(a1, a2, ..., am) is just the product of the probabilities for the individual attributes:

     P(a1, a2, ..., am | vj) = Π_i P(ai | vj)

Substituting this into Equation (1),

Naive Bayes classifier:

     VNB = argmax_{vj ∈ V} P(vj) * Π_i P(ai | vj)

where VNB denotes the target value output by the naive Bayes classifier.

An Illustrative Example
• Let us apply the naive Bayes classifier to a concept learning problem, i.e., classifying
days according to whether someone will play tennis.
• The table below provides a set of 14 training examples of the target concept PlayTennis,
where each day is described by the attributes Outlook, Temperature, Humidity, and Wind.

Day   Outlook    Temperature   Humidity   Wind     PlayTennis
D1    Sunny      Hot           High       Weak     No
D2    Sunny      Hot           High       Strong   No
D3    Overcast   Hot           High       Weak     Yes
D4    Rain       Mild          High       Weak     Yes
D5    Rain       Cool          Normal     Weak     Yes
D6    Rain       Cool          Normal     Strong   No
D7    Overcast   Cool          Normal     Strong   Yes
D8    Sunny      Mild          High       Weak     No
D9    Sunny      Cool          Normal     Weak     Yes
D10   Rain       Mild          Normal     Weak     Yes
D11   Sunny      Mild          Normal     Strong   Yes
D12   Overcast   Mild          High       Strong   Yes
D13   Overcast   Hot           Normal     Weak     Yes
D14   Rain       Mild          High       Strong   No

• Use the naive Bayes classifier and the training data from this table to classify the
following novel instance:
< Outlook = sunny, Temperature = cool, Humidity = high, Wind = strong >

• Our task is to predict the target value (yes or no) of the target concept PlayTennis
for this new instance.

The probabilities of the different target values can easily be estimated based on their
frequencies over the 14 training examples:
• P(PlayTennis = yes) = 9/14 = 0.64
• P(PlayTennis = no) = 5/14 = 0.36

Similarly, estimate the conditional probabilities. For example, those for Wind = strong are
• P(Wind = strong | PlayTennis = yes) = 3/9 = 0.33
• P(Wind = strong | PlayTennis = no) = 3/5 = 0.60

Calculate VNB according to the naive Bayes classifier equation:

     P(yes) * P(sunny|yes) * P(cool|yes) * P(high|yes) * P(strong|yes) = 0.0053
     P(no) * P(sunny|no) * P(cool|no) * P(high|no) * P(strong|no) = 0.0206

Thus, the naive Bayes classifier assigns the target value PlayTennis = no to this
new instance, based on the probability estimates learned from the training data.

By normalizing the above quantities to sum to one, we can calculate the conditional
probability that the target value is no, given the observed attribute values:
0.0206 / (0.0206 + 0.0053) = 0.795.
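
The PlayTennis calculation can be reproduced with a short Python sketch. It estimates every probability as a relative frequency over the 14 training examples from the table; the function and variable names are illustrative choices.

```python
from collections import Counter

# The 14 training examples: (Outlook, Temperature, Humidity, Wind, PlayTennis)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def naive_bayes_scores(data, instance):
    """Return P(v) * prod_i P(a_i | v) for each target value v."""
    class_counts = Counter(row[-1] for row in data)
    scores = {}
    for v, n_v in class_counts.items():
        score = n_v / len(data)  # P(v)
        for i, a in enumerate(instance):
            n_c = sum(1 for row in data if row[-1] == v and row[i] == a)
            score *= n_c / n_v  # P(a_i | v) as a relative frequency
        scores[v] = score
    return scores


scores = naive_bayes_scores(DATA, ("Sunny", "Cool", "High", "Strong"))
v_nb = max(scores, key=scores.get)
p_no = scores["No"] / sum(scores.values())
print(v_nb, round(scores["Yes"], 4), round(scores["No"], 4), round(p_no, 3))
# prints: No 0.0053 0.0206 0.795
```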

Estimating Probabilities

• We have estimated probabilities by the fraction of times the event is observed to occur
over the total number of opportunities.
• For example, in the above case we estimated P(Wind = strong | PlayTennis = no)
by the fraction nc / n where n = 5 is the total number of training examples for which
PlayTennis = no, and nc = 3 is the number of these for which Wind = strong.
• This fraction is a poor estimate when nc is very small. In particular, if nc = 0 the
estimate is zero, and this zero will dominate the naive Bayes product for any instance
containing that attribute value.
• To avoid this difficulty we can adopt a Bayesian approach to estimating the probability,
using the m-estimate defined as follows
m-estimate of probability:

     (nc + m * p) / (n + m)

• p is our prior estimate of the probability we wish to determine, and m is a constant called the
equivalent sample size, which determines how heavily to weight p relative to the observed
data.
• A typical method for choosing p in the absence of other information is to assume uniform
priors; that is, if an attribute has k possible values we set p = 1/k.
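
A minimal sketch of the m-estimate follows. The counts n = 5 and nc = 3 come from the Wind = strong example; the equivalent sample size m = 2 is an arbitrary illustrative choice, and p = 1/2 assumes a uniform prior over the two Wind values.

```python
def m_estimate(n_c, n, p, m):
    """m-estimate of probability: (n_c + m * p) / (n + m)."""
    return (n_c + m * p) / (n + m)


# Wind has k = 2 values (strong, weak), so the uniform prior is p = 1/2.
print(m_estimate(3, 5, 0.5, 2))  # (3 + 1) / 7
print(m_estimate(0, 5, 0.5, 2))  # (0 + 1) / 7, no longer zero
print(m_estimate(3, 5, 0.5, 0))  # m = 0 recovers the plain fraction n_c / n
```

One way to read m is as a number of virtual examples, distributed according to p, that are added to the n observed ones.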

Topic: 11 - BAYESIAN BELIEF NETWORKS

• The naive Bayes classifier makes significant use of the assumption that the values of the
attributes a1 . . . an are conditionally independent given the target value v.
• This assumption dramatically reduces the complexity of learning the target function.

A Bayesian belief network describes the probability distribution governing a set of variables
by specifying a set of conditional independence assumptions along with a set of conditional
probabilities.
Bayesian belief networks allow stating conditional independence assumptions that apply to
subsets of the variables.

Notation
• Consider an arbitrary set of random variables Y1 . . . Yn, where each variable Yi can
take on the set of possible values V(Yi).
• The joint space of the set of variables Y is defined to be the cross product
V(Y1) x V(Y2) x . . . x V(Yn).
• In other words, each item in the joint space corresponds to one of the possible
assignments of values to the tuple of variables (Y1 . . . Yn). The probability
distribution over this joint space is called the joint probability distribution.
• The joint probability distribution specifies the probability for each of the possible
variable bindings for the tuple (Y1 . . . Yn).
• A Bayesian belief network describes the joint probability distribution for a set of
variables.

Conditional Independence

Let X, Y, and Z be three discrete-valued random variables. X is conditionally independent of
Y given Z if the probability distribution governing X is independent of the value of Y given a
value for Z, that is, if

     (∀ xi, yj, zk) P(X = xi | Y = yj, Z = zk) = P(X = xi | Z = zk)

where xi ∈ V(X), yj ∈ V(Y), and zk ∈ V(Z).

The above expression is written in abbreviated form as

P(X | Y, Z) = P(X | Z)

Conditional independence can be extended to sets of variables. The set of variables X1 . . . Xl
is conditionally independent of the set of variables Y1 . . . Ym given the set of variables
Z1 . . . Zn if

     P(X1 . . . Xl | Y1 . . . Ym, Z1 . . . Zn) = P(X1 . . . Xl | Z1 . . . Zn)

The naive Bayes classifier assumes that the instance attribute A1 is conditionally independent
of instance attribute A2 given the target value V. This allows the naive Bayes classifier to
calculate P(A1, A2 | V) as follows,

     P(A1, A2 | V) = P(A1 | A2, V) * P(A2 | V)
                   = P(A1 | V) * P(A2 | V)

Representation

A Bayesian belief network represents the joint probability distribution for a set of variables.
Bayesian networks (BN) are represented by directed acyclic graphs.

The Bayesian network in the above figure represents the joint probability distribution over the
boolean variables Storm, Lightning, Thunder, ForestFire, Campfire, and BusTourGroup.

A Bayesian network (BN) represents the joint probability distribution by specifying a set of
conditional independence assumptions:
• A BN is represented by a directed acyclic graph, together with sets of local conditional
probabilities.
• Each variable in the joint space is represented by a node in the Bayesian network.
• The network arcs represent the assertion that each variable is conditionally independent
of its non-descendants in the network given its immediate predecessors in the network.
• A conditional probability table (CPT) is given for each variable, describing the
probability distribution for that variable given the values of its immediate
predecessors.

The joint probability for any desired assignment of values (y1, . . . , yn) to the tuple of
network variables (Y1 . . . Yn) can be computed by the formula

     P(y1, . . . , yn) = Π_{i=1..n} P(yi | Parents(Yi))

where Parents(Yi) denotes the set of immediate predecessors of Yi in the network.

Example:
Consider the node Campfire. The network nodes and arcs represent the assertion that Campfire is
conditionally independent of its non-descendants Lightning and Thunder, given its
immediate parents Storm and BusTourGroup.

This means that once we know the value of the variables Storm and BusTourGroup, the
variables Lightning and Thunder provide no additional information about Campfire.
The conditional probability table associated with the variable Campfire gives its distribution
for each combination of parent values. One of its entries, for example, expresses the assertion

P(Campfire = True | Storm = True, BusTourGroup = True) = 0.4
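
The product formula for the joint probability can be sketched for a three-node fragment of the network (Storm, BusTourGroup, Campfire). Only the 0.4 entry is taken from the text above; the priors for Storm and BusTourGroup and the other Campfire entries are made-up placeholders, and the dictionary layout is just one possible representation.

```python
from itertools import product

# Parents of each variable in a three-node fragment of the network.
PARENTS = {
    "Storm": (),
    "BusTourGroup": (),
    "Campfire": ("Storm", "BusTourGroup"),
}

# P(variable = True | parent values). Only the 0.4 entry comes from the text;
# every other number is a made-up placeholder.
CPT = {
    "Storm": {(): 0.2},
    "BusTourGroup": {(): 0.5},
    "Campfire": {(True, True): 0.4, (True, False): 0.1,
                 (False, True): 0.8, (False, False): 0.2},
}


def joint(assignment):
    """P(y1..yn) = product over i of P(yi | Parents(Yi))."""
    prob = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[p] for p in PARENTS[var])
        p_true = CPT[var][parent_values]
        prob *= p_true if value else 1.0 - p_true
    return prob


p = joint({"Storm": True, "BusTourGroup": True, "Campfire": True})
# Sanity check: the joint distribution sums to 1 over all assignments.
total = sum(joint(dict(zip(PARENTS, values)))
            for values in product([True, False], repeat=len(PARENTS)))
print(p, total)
```

With these placeholder numbers, P(Storm, BusTourGroup, Campfire) = 0.2 * 0.5 * 0.4 = 0.04.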

Inference

• A Bayesian network can be used to infer the value of some target variable (e.g., ForestFire)
given the observed values of the other variables.
• Inference can be straightforward if values for all of the other variables in the network
are known exactly.
• More generally, a Bayesian network can be used to compute the probability distribution for
any subset of network variables given the values or distributions for any subset of the
remaining variables.
• Exact inference of probabilities in general for an arbitrary Bayesian network is known to be
NP-hard.
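
As a rough illustration of inference, the sketch below computes P(Storm = True | Campfire = True) by brute-force enumeration over the same three-node fragment used in the Campfire example. P(Storm), P(BusTourGroup), and all Campfire entries except 0.4 are made-up placeholders, and enumeration is only one simple inference strategy.

```python
from itertools import product

# P(Storm), P(BusTourGroup), and every Campfire entry except 0.4 are
# made-up placeholders for illustration.
P_STORM, P_BUS = 0.2, 0.5
P_CAMPFIRE = {(True, True): 0.4, (True, False): 0.1,
              (False, True): 0.8, (False, False): 0.2}


def joint(storm, bus, campfire):
    """Joint probability as the product of each node's conditional probability."""
    p = (P_STORM if storm else 1 - P_STORM) * (P_BUS if bus else 1 - P_BUS)
    pc = P_CAMPFIRE[(storm, bus)]
    return p * (pc if campfire else 1 - pc)


def storm_given_campfire(campfire=True):
    """P(Storm=True | Campfire=campfire), summing out BusTourGroup."""
    numer = sum(joint(True, b, campfire) for b in (True, False))
    denom = sum(joint(s, b, campfire)
                for s, b in product((True, False), repeat=2))
    return numer / denom


print(round(storm_given_campfire(True), 4))
```

Enumeration visits every assignment of the unobserved variables, so its cost grows exponentially with the number of variables, which is consistent with the NP-hardness noted above.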

Learning Bayesian Belief Networks

Effective algorithms for learning Bayesian belief networks from training data can be devised
by considering several different settings for the learning problem:
• First, the network structure might be given in advance, or it might have to be inferred from
the training data.
• Second, all the network variables might be directly observable in each training example,
or some might be unobservable.
• In the case where the network structure is given in advance and the variables are fully
observable in the training examples, learning the conditional probability tables is
straightforward: we simply estimate the conditional probability table entries from the data.
• In the case where the network structure is given but only some of the variable values
are observable in the training data, the learning problem is more difficult. The learning
problem can be compared to learning weights for an ANN.

Gradient Ascent Training of Bayesian Network

The gradient ascent rule maximizes P(D|h) by following the gradient of ln P(D|h) with
respect to the parameters that define the conditional probability tables of the Bayesian
network.

Let wijk denote a single entry in one of the conditional probability tables. In particular, let
wijk denote the conditional probability that the network variable Yi will take on the value yij,
given that its immediate parents Ui take on the values given by uik.

We write the abbreviation Ph(D) to represent P(D|h).

The gradient of ln Ph(D) is given by the derivatives ∂ ln Ph(D) / ∂wijk for each of the
wijk. Each of these derivatives can be calculated as

     ∂ ln Ph(D) / ∂wijk = Σ_{d ∈ D} Ph(Yi = yij, Ui = uik | d) / wijk

To derive the gradient defined by the set of derivatives for all i, j, and k, assume the
training examples d in the data set D are drawn independently. We can then write this
derivative as

     ∂ ln Ph(D) / ∂wijk = ∂/∂wijk ln Π_{d ∈ D} Ph(d)
                        = Σ_{d ∈ D} ∂ ln Ph(d) / ∂wijk
                        = Σ_{d ∈ D} (1 / Ph(d)) * ∂Ph(d) / ∂wijk

Expanding ∂Ph(d) / ∂wijk over the possible values of Yi and Ui then yields the result stated
above.

Topic: 12 - BAYESIAN OPTIMIZATION

Bayesian Optimization is an approach that uses Bayes Theorem to direct the search in order to find
the minimum or maximum of an objective function.
It is an approach that is most useful for objective functions that are complex, noisy, and/or
expensive to evaluate.

Bayesian optimization is a powerful strategy for finding the extrema of objective functions that
are expensive to evaluate. […] It is particularly useful when these evaluations are costly, when one
does not have access to derivatives, or when the problem at hand is non-convex.

Recall that Bayes Theorem is an approach for calculating the conditional probability of an event:

     P(A|B) = P(B|A) * P(A) / P(B)

We can simplify this calculation by removing the normalizing value of P(B) and describe the
conditional probability as a proportional quantity. This is useful as we are not interested in
calculating a specific conditional probability, but instead in optimizing a quantity.

     P(A|B) ∝ P(B|A) * P(A)

The conditional probability that we are calculating is referred to generally as
the posterior probability; the reverse conditional probability is sometimes referred to as the
likelihood, and the marginal probability is referred to as the prior probability.

This provides a framework that can be used to quantify the beliefs about an unknown objective
function given samples from the domain and their evaluation via the objective function.

We can devise specific samples (x1, x2, …, xn) and evaluate them using the objective
function f(xi) that returns the cost or outcome for the sample xi. Samples and their outcomes are
collected sequentially and define our data D, e.g. D = {(x1, f(x1)), …, (xn, f(xn))}, which is
used to define the prior. The likelihood function is defined as the probability of observing the
data given the function, P(D | f). This likelihood function will change as more observations are
collected.

     P(f|D) ∝ P(D|f) * P(f)

The posterior represents everything we know about the objective function. It is an approximation of
the objective function and can be used to estimate the cost of different candidate samples that we
may want to evaluate.

In this way, the posterior probability is a surrogate objective function.

How to Perform Bayesian Optimization

In this section, we will explore how Bayesian Optimization works by developing an implementation
from scratch for a simple one-dimensional test function.

First, we will define the test problem, then how to model the mapping of inputs to outputs with a
surrogate function. Next, we will see how the surrogate function can be searched efficiently with an
acquisition function before tying all of these elements together into the Bayesian Optimization
procedure.

Test Problem

The first step is to define a test problem.

We will use a multimodal problem with five peaks, calculated as:

     y = x^2 * sin(5 * PI * x)^6

where x is a real value in the range [0,1] and PI is the value of pi.
We will augment this function by adding Gaussian noise with a mean of zero and a standard
deviation of 0.1. This means that each real evaluation will have a positive or negative random
value added to it, making the function challenging to optimize.
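
A minimal sketch of this test problem in Python, using only the standard library. The function name objective, the noise parameter, and the reference grid search are illustrative choices; the grid search is included only to locate the noise-free optimum and is not part of Bayesian optimization itself.

```python
import math
import random


def objective(x, noise=0.1):
    """Noisy test function y = x^2 * sin(5*pi*x)^6 plus Gaussian noise."""
    return x ** 2 * math.sin(5 * math.pi * x) ** 6 + random.gauss(0.0, noise)


# Noise-free grid search over [0, 1], only to locate the true optimum.
grid = [i / 1000 for i in range(1001)]
best_x = max(grid, key=lambda x: objective(x, noise=0.0))
best_y = objective(best_x, noise=0.0)
print(best_x, best_y)

# A noisy evaluation at the same point differs from call to call.
print(objective(best_x))
```

Without noise, the five peaks lie near x = 0.1, 0.3, 0.5, 0.7, and 0.9, and the highest one is near x = 0.9 with y close to 0.81.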
