
M.S. Ramaiah Institute of Technology
(Autonomous Institute, Affiliated to VTU)
Department of Computer Science and Engineering

Artificial Intelligence and Machine Learning (CS52)

UNIT - 4
OUTLINE
Artificial Neural Networks - Introduction, Neural Network
Representation, Appropriate problems for Neural Network Learning,
Perceptrons, Multilayer Networks and the Backpropagation algorithm.
Bayesian Learning - Introduction, Bayes theorem, Naive Bayes
Classifier, The EM Algorithm.
Chapters 4 and 6 (6.1, 6.2, 6.9, 6.12) of Textbook 2



Introduction
Bayesian learning methods are relevant to the study of machine learning for two reasons.
•First, Bayesian learning algorithms that calculate explicit probabilities for hypotheses, such as the naive Bayes classifier, are among the most practical approaches to certain types of learning problems.
•Second, they provide a useful perspective for understanding many learning algorithms that do not explicitly manipulate probabilities.



Introduction
Features of Bayesian Learning Methods
•Each observed training example can incrementally decrease or increase the estimated probability that a hypothesis is correct.
•Prior knowledge can be combined with observed data to determine the final probability of a hypothesis.
•Bayesian methods can accommodate hypotheses that make probabilistic predictions.
•New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities.
•Even in cases where Bayesian methods prove computationally intractable, they can provide a standard of optimal decision making against which other practical methods can be measured.



Introduction
Practical difficulties in applying Bayesian methods
•One practical difficulty is that they typically require initial knowledge of many probabilities. When these probabilities are not known in advance, they are often estimated based on background knowledge, previously available data, and assumptions about the form of the underlying distributions.
•A second practical difficulty is the significant computational cost required to determine the Bayes optimal hypothesis in the general case. In certain specialized situations, this computational cost can be significantly reduced.



Conditional Probability
The conditional probability of an event A, given that another event B has already occurred, is the probability of A conditioned on B. It is denoted P(A|B) and defined as:

P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0


Consider that A and B are any two events from a sample space S

Using our understanding of conditional probability, we have:

P(A|B) = P(A ∩ B) / P(B)

P(B|A) = P(A ∩ B) / P(A)

It follows that P(A ∩ B) = P(A|B) * P(B) = P(B|A) * P(A)

Thus, P(A|B) = P(B|A) * P(A) / P(B)

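As a quick numeric check of this identity, here is a minimal Python sketch (not from the slides; the event probabilities are made-up illustration values) that computes P(A|B) from P(B|A), P(A), and P(B):

# Bayes rule on illustrative numbers: P(A|B) = P(B|A) * P(A) / P(B)
p_a = 0.3              # P(A), assumed for illustration
p_b_given_a = 0.8      # P(B|A)
p_b_given_not_a = 0.2  # P(B|~A)

# Total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # 0.38

# Bayes theorem
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # 0.632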


BAYES THEOREM
Bayes theorem provides a way to calculate the probability of a hypothesis based on its prior probability, the probabilities of observing various data given the hypothesis, and the observed data itself.
Notations
•P(h|D): posterior probability of h; reflects confidence that h holds after D has been observed
•P(h): initial or prior probability that hypothesis h holds, before we have observed the training data
•P(D): prior probability that training data D will be observed
•P(D|h): probability of observing data D given a world in which hypothesis h holds
BAYES THEOREM

P(h|D) = P(D|h) * P(h) / P(D)

P(h|D) increases with P(h) and with P(D|h), according to Bayes theorem. P(h|D) decreases as P(D) increases, because the more probable it is that D will be observed independent of h, the less evidence D provides in support of h.
BAYES THEOREM
Maximum a posteriori (MAP) hypothesis
The learner considers some set of candidate hypotheses H and is interested in finding the most probable hypothesis h ∈ H given the observed data D. Any such maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis.
Using Bayes theorem to calculate the posterior probability of each candidate hypothesis, hMAP is a MAP hypothesis provided:

hMAP = argmax_{h ∈ H} P(h|D)
     = argmax_{h ∈ H} P(D|h) * P(h) / P(D)
     = argmax_{h ∈ H} P(D|h) * P(h)

In the final step, P(D) can be dropped because it is a constant independent of h.


BAYES THEOREM
Maximum Likelihood (ML) Hypothesis
In some cases, it is assumed that every hypothesis in H is equally probable a priori (P(hi) = P(hj) for all hi and hj in H). In this case the equation above simplifies further: we need only consider the term P(D|h) to find the most probable hypothesis:

hML = argmax_{h ∈ H} P(D|h)

P(D|h) is often called the likelihood of the data D given h, and any hypothesis that maximizes P(D|h) is called a maximum likelihood (ML) hypothesis.
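To make the distinction concrete, here is a minimal Python sketch, assuming a hypothetical three-hypothesis space with made-up priors and likelihoods, that selects both hMAP and hML:

# Each hypothesis maps to (P(h), P(D|h)) -- illustration values only
hypotheses = {
    "h1": (0.7, 0.1),
    "h2": (0.2, 0.5),
    "h3": (0.1, 0.9),
}

# MAP: weight each likelihood by the hypothesis prior
h_map = max(hypotheses, key=lambda h: hypotheses[h][1] * hypotheses[h][0])

# ML: ignore the priors (equivalent to assuming a uniform prior over H)
h_ml = max(hypotheses, key=lambda h: hypotheses[h][1])

print(h_map)  # h2 (0.5 * 0.2 = 0.10 beats 0.07 and 0.09)
print(h_ml)   # h3 (largest likelihood, 0.9)

Note how a strong prior can make the MAP and ML choices differ.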
BAYES THEOREM - Example
Consider a medical diagnosis problem in which there are two alternative hypotheses:
(1) The patient has a particular form of cancer (+)
(2) The patient does not have any form of cancer (-)
A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, only 0.008 of the entire population has this cancer.
Determine whether the patient has cancer or not using the MAP hypothesis.


BAYES THEOREM - Example
Two alternative hypotheses:
•The patient has a particular form of cancer (denoted by cancer)
•The patient does not (denoted by ¬cancer)

The available data is from a particular laboratory test with two possible outcomes: + (positive) and - (negative). The given figures translate into the following probabilities:

P(cancer) = 0.008               P(¬cancer) = 1 - 0.008 = 0.992
P(+|cancer) = 98/100 = 0.98     P(-|cancer) = 2/100 = 0.02
P(+|¬cancer) = 3/100 = 0.03     P(-|¬cancer) = 97/100 = 0.97
BAYES THEOREM - Example
Suppose a new patient is observed for whom the lab test returns a positive (+) result. Should we diagnose the patient as having cancer or not?

P(+|cancer) P(cancer) = 0.98 × 0.008 = 0.0078
P(+|¬cancer) P(¬cancer) = 0.03 × 0.992 = 0.0298

Since 0.0298 > 0.0078, hMAP = ¬cancer. Hence, the new patient with a positive lab test is not diagnosed as having cancer.
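The same calculation can be scripted. A short Python sketch reproducing the numbers above, and additionally normalizing to get the exact posterior probability of cancer given a positive result:

# MAP diagnosis for the cancer example above
p_cancer, p_not_cancer = 0.008, 0.992
p_pos_given_cancer = 0.98      # test sensitivity
p_pos_given_not_cancer = 0.03  # false positive rate

score_cancer = p_pos_given_cancer * p_cancer              # ~0.0078
score_not_cancer = p_pos_given_not_cancer * p_not_cancer  # ~0.0298

print("cancer" if score_cancer > score_not_cancer else "not cancer")  # not cancer

# Normalizing the two scores gives P(cancer|+)
print(round(score_cancer / (score_cancer + score_not_cancer), 3))  # 0.209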


BAYES THEOREM
For any propositions a and b, we have:

P(a|b) = P(a ∧ b) / P(b)

Conditional probability can be written in a different form called the Product rule:

P(a ∧ b) = P(a|b) * P(b) = P(b|a) * P(a)




NAIVE BAYES CLASSIFIER
One highly practical Bayesian learning method is the naive Bayes learner, often called the naive Bayes classifier.

The naive Bayes classifier applies to learning tasks where each instance x is described by a conjunction of attribute values and where the target function f(x) can take on any value from some finite set V.


NAIVE BAYES CLASSIFIER
A set of training examples of the target function is provided, and a new instance is presented, described by the tuple of attribute values (a1, a2, ..., an). The learner is asked to predict the target value, or classification, for this new instance.


NAIVE BAYES CLASSIFIER
The Bayesian approach to classifying the new instance is to assign the most probable target value, vMAP, given the attribute values (a1, a2, ..., an) that describe the instance:

vMAP = argmax_{vj ∈ V} P(vj | a1, a2, ..., an)

Rewriting this expression using Bayes theorem (and dropping the constant denominator):

vMAP = argmax_{vj ∈ V} P(a1, a2, ..., an | vj) * P(vj)


NAIVE BAYES CLASSIFIER
The naive Bayes classifier is based on the simplifying assumption that the attribute values are conditionally independent given the target value, so that P(a1, a2, ..., an | vj) = Πi P(ai | vj). Substituting this into the expression above gives the rule used by the naive Bayes classifier:

vNB = argmax_{vj ∈ V} P(vj) Πi P(ai | vj)
NAIVE BAYES CLASSIFIER - Example
The following table gives the standard training set for the target concept PlayTennis. Using the naive Bayes classifier, classify the following novel instance:

D15: (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong)

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
D15  Sunny     Cool         High      Strong  ?


NAIVE BAYES CLASSIFIER - Example
Our task is to predict the target value (yes or no) of PlayTennis for this new instance:

vNB = argmax_{vj ∈ {yes, no}} P(vj) P(Outlook = Sunny | vj) P(Temperature = Cool | vj) P(Humidity = High | vj) P(Wind = Strong | vj)

Step 1: Estimate the prior probabilities of the target values from their frequencies over the 14 training examples:
P(PlayTennis = yes) = 9/14 = 0.64
P(PlayTennis = no) = 5/14 = 0.36

Step 2: Similarly estimate the conditional probabilities of the observed attribute values:
P(Outlook = Sunny | yes) = 2/9       P(Outlook = Sunny | no) = 3/5
P(Temperature = Cool | yes) = 3/9    P(Temperature = Cool | no) = 1/5
P(Humidity = High | yes) = 3/9       P(Humidity = High | no) = 4/5
P(Wind = Strong | yes) = 3/9         P(Wind = Strong | no) = 3/5

Step 3: Compute the unnormalized posterior for each target value:
P(yes) P(Sunny|yes) P(Cool|yes) P(High|yes) P(Strong|yes) = 0.64 × 2/9 × 3/9 × 3/9 × 3/9 ≈ 0.0053
P(no) P(Sunny|no) P(Cool|no) P(High|no) P(Strong|no) = 0.36 × 3/5 × 1/5 × 4/5 × 3/5 ≈ 0.0206

Thus, the naive Bayes classifier assigns the target value PlayTennis = no to this new instance, based on the probability estimates learned from the training data.

Furthermore, by normalizing the above quantities to sum to one, we can calculate the conditional probability that the target value is no, given the observed attribute values:
0.0206 / (0.0206 + 0.0053) = 0.795
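A compact Python sketch of this calculation (relative-frequency estimates with no smoothing, matching the hand computation above):

from collections import Counter

# PlayTennis training data: (Outlook, Temperature, Humidity, Wind, PlayTennis)
data = [
    ("Sunny","Hot","High","Weak","No"),          ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"),      ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"),       ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"),      ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"),    ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"),    ("Rain","Mild","High","Strong","No"),
]

label_counts = Counter(row[-1] for row in data)

def p_attr(value, col, label):
    # P(a_i = value | v_j = label), estimated by relative frequency
    match = sum(1 for row in data if row[col] == value and row[-1] == label)
    return match / label_counts[label]

def classify(instance):
    # v_NB = argmax_v P(v) * prod_i P(a_i | v)
    scores = {}
    for label, n in label_counts.items():
        score = n / len(data)  # prior P(v_j)
        for col, value in enumerate(instance):
            score *= p_attr(value, col, label)
        scores[label] = score
    return max(scores, key=scores.get), scores

print(classify(("Sunny", "Cool", "High", "Strong")))
# ('No', {'No': ~0.0206, 'Yes': ~0.0053})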




NAIVE BAYES CLASSIFIER - Example
The following table gives a data set about stolen vehicles. Using the naive Bayes classifier, classify the new instance (Color = Red, Type = SUV, Origin = Domestic).

No.  Color   Type    Origin    Stolen?
1    Red     Sports  Domestic  Yes
2    Red     Sports  Domestic  No
3    Red     Sports  Domestic  Yes
4    Yellow  Sports  Domestic  No
5    Yellow  Sports  Imported  Yes
6    Yellow  SUV     Imported  No
7    Yellow  SUV     Imported  Yes
8    Yellow  SUV     Domestic  No
9    Red     SUV     Imported  No
10   Red     Sports  Imported  Yes


Working through the same three steps as before:

Step 1: Estimate the priors: P(Stolen = Yes) = 5/10 = 0.5, P(Stolen = No) = 5/10 = 0.5
Step 2: Estimate the conditional probabilities of the observed attribute values:
P(Red | Yes) = 3/5        P(Red | No) = 2/5
P(SUV | Yes) = 1/5        P(SUV | No) = 3/5
P(Domestic | Yes) = 2/5   P(Domestic | No) = 3/5
Step 3: Compute the unnormalized posteriors for the new instance (Red, SUV, Domestic):
P(Yes) P(Red|Yes) P(SUV|Yes) P(Domestic|Yes) = 0.5 × 0.6 × 0.2 × 0.4 = 0.024
P(No) P(Red|No) P(SUV|No) P(Domestic|No) = 0.5 × 0.4 × 0.6 × 0.6 = 0.072

Since 0.072 > 0.024, the naive Bayes classifier assigns Stolen = No: the vehicle is classified as not stolen.
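The same classification can also be reproduced with a library implementation. A sketch assuming scikit-learn is available; its CategoricalNB applies Laplace smoothing by default, so a tiny alpha is used here to approximate the unsmoothed frequency estimates of the hand calculation:

from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

X = [["Red","Sports","Domestic"], ["Red","Sports","Domestic"],
     ["Red","Sports","Domestic"], ["Yellow","Sports","Domestic"],
     ["Yellow","Sports","Imported"], ["Yellow","SUV","Imported"],
     ["Yellow","SUV","Imported"], ["Yellow","SUV","Domestic"],
     ["Red","SUV","Imported"], ["Red","Sports","Imported"]]
y = ["Yes","No","Yes","No","Yes","No","Yes","No","No","Yes"]

# Encode the categorical attributes as integer codes for CategoricalNB
enc = OrdinalEncoder()
X_enc = enc.fit_transform(X)

clf = CategoricalNB(alpha=1e-10)  # near-zero smoothing
clf.fit(X_enc, y)

new = enc.transform([["Red","SUV","Domestic"]])
print(clf.predict(new))        # ['No']
print(clf.predict_proba(new))  # ~[[0.75 0.25]], i.e. P(No), P(Yes)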
NAIVE BAYES CLASSIFIER -
Example
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING 38
The EM Algorithm
The EM (Expectation-Maximization) algorithm provides a general approach to learning in the presence of unobserved variables. It can be used even for variables whose values are never directly observed, provided the general form of the probability distribution governing them is known. EM is the basis for many unsupervised clustering algorithms and for learning in other settings with hidden variables.

The EM Algorithm - Estimating Means of k Gaussians
Consider data D = {x1, ..., xm} generated by a mixture of k Normal distributions with known, equal variance σ²: each instance is produced by first selecting one of the k Gaussians at random and then drawing xi from it. The learning task is to find a maximum likelihood hypothesis h = <μ1, ..., μk> for the k means. The hidden variables zi1, ..., zik indicate which Gaussian generated instance xi.

EM begins with an arbitrary initial hypothesis and repeats the following two steps until convergence:

Step 1 (Estimation): Calculate the expected value E[zij] of each hidden variable, assuming the current hypothesis h = <μ1, ..., μk> holds:

E[zij] = p(x = xi | μ = μj) / Σ_{n=1..k} p(x = xi | μ = μn)
       = e^(-(xi - μj)² / 2σ²) / Σ_{n=1..k} e^(-(xi - μn)² / 2σ²)

Step 2 (Maximization): Calculate a new maximum likelihood hypothesis, assuming each hidden variable zij takes on the expected value computed in Step 1, and replace h by the new hypothesis:

μj ← Σ_{i=1..m} E[zij] xi / Σ_{i=1..m} E[zij]
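A minimal Python sketch of this procedure for k = 2 (the sample data, true means, and starting points are illustrative assumptions, not from the slides):

import math
import random

random.seed(0)
sigma = 1.0
# Generate data from two hidden Gaussians (means 0 and 5, known sigma)
xs = [random.gauss(0, sigma) for _ in range(100)] + \
     [random.gauss(5, sigma) for _ in range(100)]

mu = [min(xs), max(xs)]  # arbitrary initial hypothesis h = <mu1, mu2>

for _ in range(50):
    # E step: E[z_ij] = p(x_i | mu_j) / sum_n p(x_i | mu_n)
    resp = []
    for x in xs:
        w = [math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) for m in mu]
        resp.append([wj / sum(w) for wj in w])
    # M step: mu_j <- sum_i E[z_ij] * x_i / sum_i E[z_ij]
    for j in range(2):
        mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / sum(r[j] for r in resp)

print([round(m, 2) for m in mu])  # converges near the true means 0 and 5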
General EM Problem
More generally, let X = {x1, ..., xm} denote the observed data, Z = {z1, ..., zm} the unobserved data, and Y = X ∪ Z the full data, parameterized by hypothesis h. The problem is to find the hypothesis h' that (locally) maximizes E[ln P(Y|h')], where the expectation is taken over the distribution of Y determined by the current hypothesis h and the observed data X.

General EM Algorithm
Define Q(h'|h) = E[ln P(Y|h') | h, X]. The general EM algorithm repeats the following two steps until convergence:
•Estimation (E) step: Calculate Q(h'|h) using the current hypothesis h and the observed data X to estimate the probability distribution over Y.
•Maximization (M) step: Replace hypothesis h by the hypothesis h' that maximizes this Q function: h ← argmax_{h'} Q(h'|h).


Thank you

