Practical Exam Aug 2021

Q.1) Write in detail about the Bayesian classifier.

Bayesian Classifiers

A Bayesian classifier is based on the idea that the role of a (natural) class is to predict the values of features for members of that class. Examples are grouped in classes because they have common values for the features. Such classes are often called natural kinds. In this section, the target feature corresponds to a discrete class, which is not necessarily binary.

The idea behind a Bayesian classifier is that, if an agent knows the class, it can
predict the values of the other features. If it does not know the class, Bayes'
rule can be used to predict the class given (some of) the feature values. In a
Bayesian classifier, the learning agent builds a probabilistic model of the
features and uses that model to predict the classification of a new example.

A latent variable is a probabilistic variable that is not observed. A Bayesian classifier is a probabilistic model where the classification is a latent variable that is probabilistically related to the observed variables. Classification then becomes inference in the probabilistic model.

The simplest case is the naive Bayesian classifier, which makes the independence assumption that the input features are conditionally independent of each other given the classification. The independence of the naive Bayesian classifier is embodied in a particular belief network where the features are the nodes, the target variable (the classification) has no parents, and the classification is the only parent of each input feature. This belief network requires the probability distributions P(Y) for the target feature Y and P(Xi|Y) for each input feature Xi. For each example, the prediction can be computed by conditioning on observed values for the input features and by querying the classification.
Given an example with inputs X1=v1,...,Xk=vk, Bayes' rule is used to compute the
posterior probability distribution of the example's classification, Y:

P(Y | X1=v1, ..., Xk=vk)

= P(X1=v1, ..., Xk=vk | Y) × P(Y) / P(X1=v1, ..., Xk=vk)

= P(X1=v1|Y) × ··· × P(Xk=vk|Y) × P(Y) / ( ∑Y P(X1=v1|Y) × ··· × P(Xk=vk|Y) × P(Y) )

where the denominator is a normalizing constant to ensure the probabilities sum to 1. The denominator does not depend on the class and, therefore, it is not needed to determine the most likely class.
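For illustration only, the following Python code is a minimal sketch of this computation, assuming the probability tables are already known; the function name, dictionary layout, and numeric values are choices made for the sketch, not part of the original text.

def naive_bayes_posterior(prior, likelihood, observation):
    # prior: {class: P(Y=class)}
    # likelihood: {class: {feature: {value: P(feature=value | Y=class)}}}
    # observation: {feature: value} for the observed input features
    scores = {}
    for y, p_y in prior.items():
        score = p_y
        for feature, value in observation.items():
            score *= likelihood[y][feature][value]
        scores[y] = score
    total = sum(scores.values())          # the normalizing denominator
    return {y: s / total for y, s in scores.items()}

# Hypothetical two-class, one-feature table:
prior = {"reads": 0.5, "skips": 0.5}
likelihood = {"reads": {"Thread": {"new": 7/9, "followUp": 2/9}},
              "skips": {"Thread": {"new": 1/3, "followUp": 2/3}}}
print(naive_bayes_posterior(prior, likelihood, {"Thread": "followUp"}))  # reads: 0.25, skips: 0.75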

To learn a classifier, the distributions P(Y) and P(Xi|Y) for each input feature can be learned from the data. The simplest case is to use the empirical frequency in the training data as the probability (i.e., use the proportion in the training data as the probability). However, as shown below, this approach is often not a good idea when it results in zero probabilities.
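As a rough sketch of this learning step (the training examples and variable names below are hypothetical, chosen only to show the counting):

from collections import Counter, defaultdict

# Hypothetical training examples: (feature dict, class label)
examples = [({"Author": "known", "Thread": "new"}, "reads"),
            ({"Author": "known", "Thread": "followUp"}, "skips"),
            ({"Author": "unknown", "Thread": "new"}, "reads")]

class_counts = Counter(y for _, y in examples)
value_counts = defaultdict(Counter)            # (class, feature) -> counts of feature values
for features, y in examples:
    for feature, value in features.items():
        value_counts[(y, feature)][value] += 1

n = len(examples)
prior = {y: c / n for y, c in class_counts.items()}     # empirical P(Y)

def likelihood(feature, value, y):                      # empirical P(Xi=value | Y=y)
    return value_counts[(y, feature)][value] / class_counts[y]

print(prior)                                  # P(reads) = 2/3, P(skips) = 1/3 on this toy data
print(likelihood("Thread", "new", "reads"))   # 1.0 on this toy data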

Figure: Belief network corresponding to a naive Bayesian classifier

Example Suppose an agent wants to predict the user action given the data
of Figure. For this example, the user action is the classification. The naive
Bayesian classifier for this example corresponds to the belief network of Figure.
The training examples are used to determine the probabilities required for the
belief network.
Suppose the agent uses the empirical frequencies as the probabilities for this
example. The probabilities that can be derived from these data are

P(User Action=reads) = (9)/(18) = 0.5

P(Author=known|User Action=reads) = (2)/(3)

P(Author=known|User Action=skips) = (2)/(3)

P(Thread=new|User Action=reads) = (7)/(9)

P(Thread=new|User Action=skips) = (1)/(3)

P(Length=long|User Action=reads) = 0

P(Length=long|User Action=skips) = (7)/(9)

P(Where Read=home|User Action=reads) = (4)/(9)

P(Where Read=home|User Action=skips) = (4)/(9) .

Based on these probabilities, the features Author and Where Read have no predictive power because knowing either does not change the probability that the user will read the article. The rest of this example ignores these features.

To classify a new case where the author is unknown, the thread is a follow-up,
the length is short, and it is read at home,

P(User Action=reads | Thread=follow Up ∧ Length=short)

= P(follow Up|reads) × P(short|reads) × P(reads) × c

= (2)/(9) × 1 × (1)/(2) × c

= (1)/(9) × c

P(User Action=skips | Thread=follow Up ∧ Length=short)

= P(follow Up|skips) × P(short|skips) × P(skips) × c

= (2)/(3) × (2)/(9) × (1)/(2) × c

= (2)/(27) × c

where c is a normalizing constant that ensures these add up to 1. Thus, c must be (27)/(5), so

P(User Action=reads | Thread=follow Up ∧ Length=short) = 0.6.
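The arithmetic above can be checked with a short Python snippet (a sketch only; the fractions are the ones derived in the example):

from fractions import Fraction as F

reads = F(2, 9) * 1 * F(1, 2)        # P(follow Up|reads) * P(short|reads) * P(reads) = 1/9
skips = F(2, 3) * F(2, 9) * F(1, 2)  # P(follow Up|skips) * P(short|skips) * P(skips) = 2/27
c = 1 / (reads + skips)              # normalizing constant = 27/5
print(c, float(reads * c), float(skips * c))  # 27/5 0.6 0.4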

This prediction does not work well on example e11, which the agent skips, even though it is a follow-up and is short. The naive Bayesian classifier summarizes the data into a few parameters. It predicts the article will be read because being short is a stronger indicator that the article will be read than being a follow-up is an indicator that the article will be skipped.

A new case where the length is long has P(Length=long|User Action=reads) = 0. Thus, the posterior probability that User Action=reads is zero, no matter what the values of the other features are.

The use of zero probabilities can lead to some unexpected behavior. First, some features become predictive: knowing just one feature value can rule out a category. If we allow zero probabilities, it is possible that some combinations of observations become impossible. See Exercise. This is a problem not necessarily with using a Bayesian classifier but rather with using empirical frequencies as probabilities. The alternative to using the empirical frequencies is to incorporate pseudocounts. A designer of the learner should choose the pseudocounts carefully.
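One common way to incorporate pseudocounts is to add the same pseudocount to every value count before normalizing. A minimal sketch, assuming a pseudocount of 1 (the choice is the designer's, as noted above):

def smoothed_probability(count, class_count, num_values, pseudocount=1):
    # count: number of training examples in the class with this feature value
    # class_count: number of training examples in the class
    # num_values: number of possible values of the feature
    return (count + pseudocount) / (class_count + pseudocount * num_values)

# The zero-probability case P(Length=long | User Action=reads) = 0/9 becomes
# a small nonzero value, so a long article no longer rules out "reads":
print(smoothed_probability(0, 9, 2))  # 1/11, about 0.09, instead of 0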

Q.2) What is the perceptron model?

Ans: A perceptron is a simple model of a biological neuron in an artificial neural network. Perceptron is also the name of an early algorithm for supervised learning of binary classifiers. The perceptron algorithm was designed to classify visual inputs, categorizing subjects into one of two types and separating the groups with a line. Classification is an important part of machine learning and image processing. Machine learning algorithms find and classify patterns by many different means. The perceptron algorithm classifies patterns and groups by finding the linear separation between different objects and patterns that are received through numeric or visual input, as the sketch below illustrates.
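A minimal sketch of the classic perceptron learning rule on a small, hypothetical, linearly separable data set (the points, labels, learning rate, and number of passes are all choices made for illustration): each mistake nudges the separating line toward the misclassified point.

# Labels are +1 / -1; each point is (x1, x2).
data = [((2.0, 1.0), 1), ((1.5, 2.0), 1), ((-1.0, -0.5), -1), ((-2.0, 1.0), -1)]

w = [0.0, 0.0]   # weights
b = 0.0          # bias
lr = 0.1         # learning rate

for _ in range(20):                          # fixed number of passes over the data
    for (x1, x2), label in data:
        activation = w[0] * x1 + w[1] * x2 + b
        predicted = 1 if activation >= 0 else -1
        if predicted != label:               # update weights only on mistakes
            w[0] += lr * label * x1
            w[1] += lr * label * x2
            b += lr * label

print(w, b)
print([1 if w[0] * x1 + w[1] * x2 + b >= 0 else -1 for (x1, x2), _ in data])
# all four points end up on the correct side of the learned line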

The perceptron algorithm was developed at Cornell Aeronautical Laboratory in 1957, funded by the United States Office of Naval Research. The algorithm was the first step planned for a machine implementation of image recognition. The machine, called the Mark I Perceptron, was physically made up of an array of 400 photocells connected to perceptrons whose weights were recorded in potentiometers and adjusted by electric motors. It was one of the first artificial neural networks ever created.

At the time, the perceptron was expected to be very significant for the development of artificial intelligence (AI). While high hopes surrounded the initial perceptron, technical limitations were soon demonstrated. Single-layer perceptrons can only separate classes that are linearly separable. It was later discovered that, by using multiple layers, perceptrons can classify groups that are not linearly separable, allowing them to solve problems single-layer algorithms cannot solve.
