Unit 3: Bayesian Concept Learning
• Let us assume that we have a training data set D in which we have recorded some observed data. Our task is to determine the best hypothesis in the hypothesis space H by using the knowledge of D.
Prior (knowledge)
• The prior knowledge or belief about the probabilities of various hypotheses in H
is called Prior in context of Bayes’ theorem.
• For example, if we have to determine whether a particular type of tumour is
malignant for a patient, the prior knowledge of such tumours becoming
malignant can be used to validate our current hypothesis and is a prior
probability or simply called Prior.
• We will assume that P(h) is the initial probability of the hypothesis ‘h’ that the patient has a malignant tumour, based only on background knowledge of such tumours and before considering the observed test data.
• P(T) is the prior probability that the training data will be observed or, in this case,
the probability of positive malignancy test results.
• We will denote P(T|h) as the probability of observing data T in a space where ‘h’
holds true, which means the probability of the test results showing a positive
value when the tumour is actually malignant.
Posterior
• The probability that a particular hypothesis holds for a data set based on
the Prior is called the posterior probability or simply Posterior.
• In the above example, the probability of the hypothesis that the patient
has a malignant tumour considering the Prior of correctness of the
malignancy test is a posterior probability.
• In our notation, we are interested in finding out P(h|T), the probability that the hypothesis holds true given the observed training data T. This is called the posterior probability, or simply the Posterior, in machine learning language.
• So, the prior probability P(h), which represents the probability of the
hypothesis independent of the training data (Prior), now gets refined with
the introduction of influence of the training data as P(h|T).
According to Bayes’ theorem
• The equation below combines the prior and the likelihood to give the posterior probability:
P(h|T) = P(T|h) P(h) / P(T)
• We can deduce that P(h|T) increases as P(h) and P(T|h) increase, and also as P(T) decreases.
• The simple explanation is that the more probable it is that T occurs independently of h, the less support T provides for h.
Bayes’ Theorem
• Goal: To determine the most probable hypothesis, given the data T plus
any initial knowledge about the prior probabilities of the various
hypotheses in H.
• Prior probability of h, P(h): it reflects any background knowledge we have
about the chance that h is a correct hypothesis (before having observed
the data).
• Prior probability of T, P(T): it reflects the probability that training data T
will be observed given no knowledge about which hypothesis h holds.
• Conditional Probability of observation T, P(T|h): it denotes the probability
of observing data T given some world in which hypothesis h holds.
Bayes’ Theorem
• Posterior probability of h, P(h|T): it represents the probability that h
holds given the observed training data T. It reflects our confidence
that h holds after we have seen the training data T and it is the
quantity that Machine Learning researchers are interested in.
• Bayes Theorem allows us to compute P(h|T):
P(h|T)=P(T|h)P(h)/P(T)
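• As a quick illustration, the same computation in Python (the numbers here are invented purely for the example):

def posterior(p_T_given_h, p_h, p_T):
    # Bayes' theorem: P(h|T) = P(T|h) * P(h) / P(T)
    return p_T_given_h * p_h / p_T

print(posterior(0.9, 0.1, 0.2))  # 0.45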
Maximum A Posteriori (MAP)
Hypothesis and Maximum Likelihood
• Goal: To find the most probable hypothesis h from a set of candidate
hypotheses H given the observed data T. This maximally probable
hypothesis is called the maximum a posteriori (MAP) hypothesis.
• MAP Hypothesis, hMAP = argmax h∈H P(h|T)
= argmax h∈H P(T|h) P(h) / P(T)
= argmax h∈H P(T|h) P(h)
• If every hypothesis in H is equally probable a priori, we only need to consider the likelihood of the data T given h, P(T|h). Then hMAP reduces to the Maximum Likelihood hypothesis,
hML = argmax h∈H P(T|h)
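• A minimal sketch of selecting hMAP and hML over a toy hypothesis space (the priors and likelihoods below are invented for illustration):

# Hypothetical priors P(h) and likelihoods P(T|h) for three candidate hypotheses
priors      = {"h1": 0.7, "h2": 0.2, "h3": 0.1}
likelihoods = {"h1": 0.1, "h2": 0.6, "h3": 0.9}

# hMAP maximizes P(T|h) * P(h); P(T) is the same for every h, so it is dropped
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])
# hML maximizes the likelihood P(T|h) alone (the uniform-prior special case)
h_ml = max(priors, key=lambda h: likelihoods[h])
print(h_map, h_ml)  # h2 h3 -- a non-uniform prior can change the winner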
Some Results from the Analysis of Learners in
a Bayesian Framework
• If P(h)=1/|H| and if P(T|h)=1 if T is consistent with h, and 0
otherwise, then every hypothesis in the version space resulting from
T is a MAP hypothesis.
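• To see why: P(T) = Σ h∈H P(T|h)P(h) = |VS|/|H|, where |VS| is the number of hypotheses consistent with T. So for any consistent h, P(h|T) = (1 × 1/|H|) / (|VS|/|H|) = 1/|VS|, the same maximal value for every member of the version space.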
• Under certain assumptions regarding noise in the data, minimizing
the mean squared error (what common neural nets do) corresponds
to computing the maximum likelihood hypothesis.
• When using a certain representation for hypotheses, choosing the
smallest hypotheses corresponds to choosing MAP hypotheses (An
attempt at justifying Occam’s razor)
Example
• We will calculate how the prior knowledge of the percentage of cancer
cases in a sample population and probability of the test result being correct
influence the probability outcome of the correct diagnosis.
• We have two alternative hypotheses:
• (1) a particular tumour is of malignant type and
• (2) a particular tumour is non-malignant type.
• The prior knowledge available is:
• only 0.5% of the population has this kind of tumour which is malignant,
• the laboratory report has some amount of incorrectness: it detects malignancy, when actually present, in only 98% of cases, and correctly reports that malignancy is absent in only 97% of cases.
• This means the test raises a false alarm, predicting malignancy that is not actually there, in 3% of the cases, and misses detecting a real malignant tumour in 2% of the cases.
Solution
• Let us denote Malignant Tumour = MT, Positive Lab Test = PT,
Negative Lab Test = NT
• h1 = the particular tumour is of malignant type = MT in our example
• h2 = the particular tumour is not malignant type = !MT in our example
• P(MT) = 0.005      P(!MT) = 0.995
• P(PT|MT) = 0.98    P(PT|!MT) = 0.03
• P(NT|MT) = 0.02    P(NT|!MT) = 0.97
Solution
• For the new patient, the laboratory test report shows a positive result. Let us see whether we should declare this a malignancy case or not.
Solution
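• Applying Bayes' theorem with the priors listed above (the two products are unnormalized posteriors; the common denominator P(PT) cancels when comparing them):
P(h1|PT) ∝ P(PT|MT) P(MT) = 0.98 × 0.005 = 0.0049
P(h2|PT) ∝ P(PT|!MT) P(!MT) = 0.03 × 0.995 = 0.0298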
• As P(h2|PT) is higher than P(h1|PT), the hypothesis h2 is more probable. So, hMAP = h2 = !MT.
• This indicates that even though the likelihood of a positive test result is much higher when the tumour is malignant, the probability that this patient does not have a malignant tumour is still higher, on the basis of the prior knowledge that such tumours are rare.
Naïve Bayesian Classification
• It is based on Bayes' theorem and is particularly suited when the dimensionality of the inputs is high. Parameter estimation for naïve Bayes models uses the method of maximum likelihood. In spite of its over-simplified assumptions, it often performs well in many complex real-world situations.
• Advantage: Requires a small amount of training data to estimate the
parameters
Naïve Bayesian Classification
• Derivation:
• D: a set of training tuples, where each tuple is an n-dimensional attribute vector X = (x1, x2, x3, …, xn)
• Let there be m classes: C1, C2, C3, …, Cm
• The naïve Bayes classifier predicts that X belongs to class Ci iff
P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i
• Maximum posteriori hypothesis:
P(Ci|X) = P(X|Ci) P(Ci) / P(X)
• Maximize P(X|Ci) P(Ci), as P(X) is constant across classes
Naïve Bayesian Classification
• Bayes classification:
P(C|X) ∝ P(X|C) P(C) = P(X1, …, Xn|C) P(C)
• Difficulty: learning the joint probability P(X1, …, Xn|C)
• MAP classification rule: assign x = (x1, …, xn) to class c* if
[P(x1|c*) ⋯ P(xn|c*)] P(c*) > [P(x1|c) ⋯ P(xn|c)] P(c), for all c ≠ c*, c = c1, …, cL
Naïve Bayesian Classification
• As the combined probability of the attributes that fully define the new instance, P(X), is the same for every class, it can be dropped when comparing classes.
• The naïve Bayes classifier makes a simple assumption that the attribute values are conditionally independent of each other given the target value. Applying this simplification, for a target value cj of an instance, the probability of observing the combination a1, a2, …, an is the product of the probabilities of the individual attributes:
P(a1, a2, …, an | cj) = P(a1|cj) P(a2|cj) ⋯ P(an|cj)
Example
• Learning phase: conditional probabilities

Outlook      Play=Yes   Play=No
Sunny        2/9        3/5
Overcast     4/9        0/5
Rain         3/9        2/5

Temperature  Play=Yes   Play=No
Hot          2/9        2/5
Mild         4/9        2/5
Cool         3/9        1/5

Humidity     Play=Yes   Play=No
High         3/9        4/5
Normal       6/9        1/5

Wind         Play=Yes   Play=No
Strong       3/9        3/5
Weak         6/9        2/5
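• These fractions are simple relative counts. As a sketch, the first entry can be reproduced in Python, assuming the classic 14-example PlayTennis dataset that these counts appear to come from (an assumption, since the slide shows only the resulting fractions):

# The classic 14-example PlayTennis dataset, as (Outlook, Temperature, Humidity, Wind, Play)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]

# P(Outlook=Sunny | Play=Yes) = count(Sunny and Yes) / count(Yes)
yes_rows = [row for row in data if row[4] == "Yes"]
sunny_yes = sum(row[0] == "Sunny" for row in yes_rows)
print(sunny_yes, "/", len(yes_rows))  # 2 / 9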
Example
• Test Phase
– Given a new instance,
x’=(Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up tables
P(Outlook=Sunny|Play=Yes) = 2/9        P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=Yes) = 3/9     P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=Yes) = 3/9        P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=Yes) = 3/9          P(Wind=Strong|Play=No) = 3/5
P(Play=Yes) = 9/14                     P(Play=No) = 5/14
– MAP rule
P(Yes|x’): [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
P(No|x’): [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206
Given the fact P(Yes|x’) < P(No|x’), we label x’ to be “No”.
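• The same arithmetic as a small Python check, with the probabilities hard-coded from the lookup tables above:

# Unnormalized posterior scores for x' = (Sunny, Cool, High, Strong)
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)
p_no = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)
print(round(p_yes, 4), round(p_no, 4))  # 0.0053 0.0206
print("Play =", "Yes" if p_yes > p_no else "No")  # Play = No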
Previous example
• X = (age = youth, income = medium, student = yes, credit_rating = fair)
• Will a person described by tuple X buy a computer?
Naïve Bayes algorithm
• Steps to implement:
1. Data Pre-processing step
2. Fitting Naive Bayes to the Training set
3. Predicting the test result
4. Test accuracy of the result (creation of confusion matrix)
5. Predict class for unknown data
1) Data Pre-processing step
Importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
# Importing the dataset; selecting feature columns and labels by position (.iloc)
dataset = pd.read_csv('user_data.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
2) Fitting Naive Bayes to the Training Set
# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)
classifier.score(x_test, y_test)
• We have used the GaussianNB classifier to fit the training dataset. We can also use other naïve Bayes classifiers as per our requirement (MultinomialNB / BernoulliNB).
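• For instance, a minimal sketch of swapping in BernoulliNB (assuming the same x_train/y_train as above; BernoulliNB expects binary features, so the scaled inputs are thresholded via binarize):

from sklearn.naive_bayes import BernoulliNB
classifier = BernoulliNB(binarize=0.0)  # binarize=0.0 turns each scaled feature into 0/1
classifier.fit(x_train, y_train)
classifier.score(x_test, y_test)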
3) Prediction of the test set result:
# Predicting the Test set results
y_pred = classifier.predict(x_test)
4) Creating Confusion Matrix:
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
(Figure: confusion matrix, with actual classes along the rows and predicted classes along the columns.)
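• A quick way to read the matrix (a sketch, assuming the binary labels used above; sklearn's confusion_matrix puts actual classes in rows and predicted classes in columns):

print(cm)  # rows = actual, columns = predicted
accuracy = (cm[0, 0] + cm[1, 1]) / cm.sum()  # correct predictions / all predictions
print(accuracy)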
Bayesian Networks
• A Bayesian network specifies a joint distribution in a structured form
• General form:
P(A, B, C) = P(C|A, B) P(A) P(B)
(Diagram: nodes A and B each with a directed edge into C.)
• Absolute independence (no edges between A, B, C): p(A,B,C) = p(A) p(B) p(C)
Examples of 3-way Bayesian Networks
• Conditionally independent effects (diagram: A → B, A → C):
p(A,B,C) = p(B|A) p(C|A) p(A)
B and C are conditionally independent given A
• Markov dependence (diagram: A → B → C):
p(A,B,C) = p(C|B) p(B|A) p(A)
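• A minimal sketch of using the Markov-dependence factorization numerically (the probability tables below are made up purely for illustration):

# Hypothetical tables for the chain A -> B -> C
p_A = {True: 0.3, False: 0.7}
p_B_given_A = {(True, True): 0.8, (True, False): 0.1,   # keyed (b, a)
               (False, True): 0.2, (False, False): 0.9}
p_C_given_B = {(True, True): 0.6, (True, False): 0.4,   # keyed (c, b)
               (False, True): 0.4, (False, False): 0.6}

def joint(a, b, c):
    # p(A,B,C) = p(C|B) p(B|A) p(A)
    return p_C_given_B[(c, b)] * p_B_given_A[(b, a)] * p_A[a]

# Sanity check: the eight joint entries sum to 1
print(sum(joint(a, b, c) for a in (True, False)
          for b in (True, False) for c in (True, False)))  # 1.0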
The Alarm Example
• What is P(B | M, J)?
• We can use the full joint distribution to answer this question
• Requires 2^5 = 32 probabilities (five binary variables)
• Can we use prior domain knowledge to come up with a Bayesian network that requires fewer probabilities?
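• As a sketch of the payoff, P(B | J, M) can be computed by enumerating the hidden variables. The CPT values below are the ones commonly used in textbook versions of this example (an assumption, since the slide does not list them):

# Conditional probability tables (B = Burglary, E = Earthquake, A = Alarm,
# J = JohnCalls, M = MaryCalls); 10 numbers instead of a 32-entry joint table.
# These values are assumed from the standard textbook example.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,   # keyed (b, e)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                   # keyed by a
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return pb * pe * pa * pj * pm

# P(B=true | J=true, M=true): enumerate E and A, then normalize
num = sum(joint(True, e, a, True, True) for e in (True, False) for a in (True, False))
den = sum(joint(b, e, a, True, True)
          for b in (True, False) for e in (True, False) for a in (True, False))
print(num / den)  # ~0.284 with these numbers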
Constructing a Bayesian Network: Step 1
• Order the variables in terms of causality (may be a
partial order)
• e.g., {E, B} -> {A} -> {J, M}
Bayesian learning
• Prior knowledge of the candidate hypotheses is combined with the observed data to arrive at the final probability of a hypothesis
• More flexible than other approaches, because each observed training example can influence the outcome of the hypothesis by increasing or decreasing its estimated probability
• Performs better than other methods when validating hypotheses that make probabilistic predictions
• It is possible to classify new instances by combining the predictions of
multiple hypotheses, weighted by their respective probabilities.
• They can be used to create a standard for the optimal decision against
which the performance of other methods can be measured