Unit 6 Neural Network Part 2
Machine Learning
INTRODUCTION
• The technique was derived from the work of the 18th century
mathematician Thomas Bayes.
• He developed the foundational mathematical principles, known as
Bayesian methods, which describe the probability of events, and
more importantly, how probabilities should be revised when there is
additional information available.
• Bayesian learning algorithms, like the naive Bayes classifier, are highly
practical approaches to certain types of learning problems as they can
calculate explicit probabilities for hypotheses.
APPLICATIONS
• Text-based classification, such as spam or junk mail filtering, author
identification, or topic categorization (a small illustrative sketch follows
this list).
• Medical diagnosis, such as identifying the probability that a new patient
has a disease, given the set of symptoms observed during that disease.
• Network security, such as detecting illegal intrusions or anomalies in
computer networks.
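As a flavour of the spam-filtering application mentioned in the first item, here is a minimal naive-Bayes style sketch in Python. The word counts, class priors, vocabulary, and test message are all invented purely for illustration; this is only a sketch of the idea, not a complete or tuned classifier.

```python
# A minimal sketch of a naive-Bayes style spam score.
# All counts, priors, and the test message are made up for illustration.

from math import log

# Assumed word frequencies observed in spam and non-spam (ham) messages.
word_counts = {
    "spam": {"offer": 30, "win": 25, "meeting": 2, "report": 1},
    "ham":  {"offer": 3,  "win": 2,  "meeting": 40, "report": 35},
}
class_prior = {"spam": 0.4, "ham": 0.6}   # assumed prior class probabilities

def log_score(message_words, label):
    """log P(label) + sum of log P(word|label), with add-one smoothing."""
    counts = word_counts[label]
    total = sum(counts.values())
    vocab = {w for c in word_counts.values() for w in c}
    score = log(class_prior[label])
    for w in message_words:
        score += log((counts.get(w, 0) + 1) / (total + len(vocab)))
    return score

message = ["win", "offer"]
label = max(("spam", "ham"), key=lambda c: log_score(message, c))
print(label)   # 'spam' for this made-up example
```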
BAYES’ THEOREM
• What is concept learning?
• Let us take an example of how a child starts to learn the meaning of a new
word, e.g. ‘ball’.
• Positive examples: objects that are instances of the concept (things that are
called a ball).
• Negative examples: objects that are not instances of the concept.
• Let us define a concept set C and a corresponding indicator function f(k). We
define f(k) = 1 when k is within the set C, and f(k) = 0 otherwise. Our aim is
to learn the indicator function f that defines which elements are within the
set C (a short sketch of f follows this list).
• Through Bayes’ theorem, we will see how standard probability calculus can be
used to express the uncertainty about the function f, and how the
classification can be refined as positive examples are observed.
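As a concrete illustration of the indicator function f, here is a minimal Python sketch. The concept set C and the example objects are hypothetical choices made purely for this example.

```python
# A minimal sketch of the indicator function f for a concept set C.
# The concept and the example objects are hypothetical, for illustration only.

C = {"football", "tennis ball", "basketball"}   # the concept set C

def f(k):
    """Indicator function: 1 if k belongs to the concept C, 0 otherwise."""
    return 1 if k in C else 0

print(f("tennis ball"))  # 1  (positive example)
print(f("banana"))       # 0  (negative example)
```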
• Bayes’ probability rule is given as:
P(A|B) = P(B|A) P(A) / P(B)      ... (Eq. 1)
where A and B are conditionally related events and P(A|B) denotes the
probability of event A occurring when event B has already occurred.
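A tiny numeric sketch of this rule in Python; the probability values below are arbitrary, made-up numbers chosen only to show the arithmetic.

```python
# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
# The numbers here are arbitrary, chosen only to illustrate the formula.

def bayes_rule(p_b_given_a, p_a, p_b):
    """Return P(A|B) computed from P(B|A), P(A) and P(B)."""
    return p_b_given_a * p_a / p_b

p_a = 0.01          # P(A): prior probability of event A
p_b_given_a = 0.9   # P(B|A): probability of observing B when A holds
p_b = 0.05          # P(B): overall probability of observing B

print(bayes_rule(p_b_given_a, p_a, p_b))  # P(A|B) = 0.9 * 0.01 / 0.05 = 0.18
```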
• Let us assume that we have a training data set T in which we have
recorded some observed data. Our task is to determine the best
hypothesis in the hypothesis space H by using the knowledge of T.
PRIOR (the probability before the evidence is considered)
• The prior knowledge or belief about the probabilities of various
hypotheses in H is called the Prior in the context of Bayes’ theorem.
• For example, if we have to determine whether a particular type of
tumour is malignant for a patient, the prior knowledge of how often such
tumours turn out to be malignant can be used to inform our current
hypothesis; this is a prior probability, or simply the Prior.
• We will assume that P(h) is the initial probability of the hypothesis ‘h’
that the patient has a malignant tumour, held before observing the
so-called training data, i.e. without considering the outcome of the
malignancy test or the correctness of the test process.
POSTERIOR (updated probability after the evidence is considered)
• The probability that a particular hypothesis holds for a data set, obtained by
updating the Prior with that data, is called the posterior probability, or
simply the Posterior.
• In the above example, the probability that the patient has a malignant
tumour, computed by combining the Prior with the observed result of the
malignancy test, is a posterior probability.
• In our notation, we will say that we are interested in finding out P(h|T),
i.e. the probability that the hypothesis h holds true given the observed
training data T.
• So, the prior probability P(h), which represents the probability of the
hypothesis independent of the training data (the Prior), now gets refined
into P(h|T) once the influence of the training data T is introduced.
• According to Bayes’ theorem,
P(h|T) = P(T|h) P(h) / P(T)
where P(T|h) is the probability of observing the training data T given that
hypothesis h holds, and P(T) is the probability of observing T irrespective of
any particular hypothesis.
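To make the refinement of the Prior into the Posterior concrete, here is a small worked sketch of the malignant-tumour example in Python. All of the numbers (the prior rate of malignancy and the test accuracies) are hypothetical values assumed only for illustration.

```python
# A worked sketch of the malignant-tumour example using Bayes' theorem.
# All numbers below are hypothetical, chosen only to show how the prior
# P(h) is refined into the posterior P(h|T) by a positive test result T.

p_h = 0.008            # Prior: assumed rate of malignant tumours
p_t_given_h = 0.98     # Assumed P(positive test | tumour is malignant)
p_t_given_not_h = 0.03 # Assumed P(positive test | tumour is not malignant)

# P(T): total probability of observing a positive test result
p_t = p_t_given_h * p_h + p_t_given_not_h * (1 - p_h)

# Posterior P(h|T) by Bayes' theorem
p_h_given_t = p_t_given_h * p_h / p_t
print(round(p_h_given_t, 3))  # approx 0.209 under these assumed numbers
```

Note how, under these assumed numbers, a positive test raises the probability of malignancy from 0.008 (the Prior) to roughly 0.21 (the Posterior), but not to certainty.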
• Let us try to connect the concept learning problem with the problem
of identifying h_map, the maximum a posteriori hypothesis, i.e. the
hypothesis in H that maximizes the posterior probability P(h|T).
• We can encode the prior knowledge of the learning task by specifying
the probability distributions P(h) and P(T|h).
• There are a few important assumptions to be made, as follows:
• The training data or target sequence T is noise free, which means that each
target value is a direct function of the corresponding instance only
(i.e. t_i = c(x_i)).
• The concept c lies within the hypothesis space H.
• Each hypothesis is equally probable a priori and independent of the others.
• On the basis of assumption 3, we can say that each hypothesis h
within the space H has equal prior probability, and also because of
assumption 2, we can say that these prior probabilities sum up to 1.
So, we can write
P(h) = 1 / |H|   for every h in H
where |H| is the total number of hypotheses in the space H.
• For the cases when h is inconsistent with the training data T, we have
P(T|h) = 0, so using Eq. 1 we get
P(h|T) = (0 · P(h)) / P(T) = 0
and when h is consistent with T, we have P(T|h) = 1, so
P(h|T) = (1 · (1/|H|)) / P(T) = 1 / |VS_H,T|
where VS_H,T, called the version space, is the subset of hypotheses in H that
are consistent with T, and P(T) = |VS_H,T| / |H|.
• So, with our set of assumptions about P(h) and P(T|h), we get the
posterior probability P(h|T) as
P(h|T) = 1 / |VS_H,T|   if h is consistent with T, and
P(h|T) = 0              otherwise.
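To tie the pieces together, here is a minimal Python sketch of brute-force Bayesian concept learning under the three assumptions above. The hypothesis space (simple threshold rules), the training pairs, and the instances are all invented for illustration; the point is only that every consistent hypothesis receives posterior 1/|VS_H,T| and every inconsistent one receives 0.

```python
# A minimal sketch of brute-force Bayesian concept learning over a toy,
# finite hypothesis space. The hypotheses (threshold rules on a number)
# and the training pairs below are invented purely for illustration.

# Hypothesis space H: "x is in the concept iff x >= k", for k = 0..5.
H = [lambda x, k=k: int(x >= k) for k in range(0, 6)]   # |H| = 6 hypotheses

# Training data T: pairs (x_i, t_i), assumed noise free (t_i = c(x_i)).
T = [(1, 0), (5, 1)]            # consistent with any threshold k in {2,3,4,5}

prior = 1.0 / len(H)            # P(h) = 1/|H| for every h (assumption 3)

def consistent(h, data):
    """P(T|h) under the noise-free assumption: 1 if h matches every example."""
    return all(h(x) == t for x, t in data)

likelihood = [1.0 if consistent(h, T) else 0.0 for h in H]   # P(T|h)
p_T = sum(lik * prior for lik in likelihood)                 # P(T) = |VS|/|H|

# Posterior P(h|T): 1/|VS_H,T| for consistent hypotheses, 0 otherwise.
posterior = [lik * prior / p_T for lik in likelihood]
for k, p in zip(range(0, 6), posterior):
    print(f"h: x >= {k}   P(h|T) = {p:.3f}")
```

Running this prints a posterior of 0.250 for each of the four consistent threshold hypotheses and 0.000 for the two inconsistent ones, matching the 1/|VS_H,T| result derived above.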