Practical Exam Aug 2021
Bayesian Classifiers
The idea behind a Bayesian classifier is that, if an agent knows the class, it can
predict the values of the other features. If it does not know the class, Bayes'
rule can be used to predict the class given (some of) the feature values. In a
Bayesian classifier, the learning agent builds a probabilistic model of the
features and uses that model to predict the classification of a new example.
$$P(Y \mid X_1{=}v_1,\dots,X_k{=}v_k) = \frac{P(X_1{=}v_1,\dots,X_k{=}v_k \mid Y)\,P(Y)}{P(X_1{=}v_1,\dots,X_k{=}v_k)}$$
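In a naive Bayesian classifier, the input features are additionally assumed to be conditionally independent of each other given the class, so the likelihood factorizes into single-feature terms:

$$P(X_1{=}v_1,\dots,X_k{=}v_k \mid Y) = \prod_{i=1}^{k} P(X_i{=}v_i \mid Y)$$

This factorization is what the worked computation below relies on.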
Example: Suppose an agent wants to predict the user action given the data
of Figure. For this example, the user action is the classification. The naive
Bayesian classifier for this example corresponds to the belief network of Figure.
The training examples are used to determine the probabilities required for the
belief network.
Suppose the agent uses the empirical frequencies as the probabilities for this
example. Among the probabilities that can be derived from these data is

$$P(\mathit{Length}{=}\mathit{long} \mid \mathit{User\_action}{=}\mathit{reads}) = 0$$
To classify a new case where the author is unknown, the thread is a follow-up,
the length is short, and it is read at home:

$$P(\mathit{reads} \mid \mathit{followUp} \wedge \mathit{short} \wedge \mathit{home}) = c \times P(\mathit{followUp} \mid \mathit{reads}) \times P(\mathit{short} \mid \mathit{reads}) \times P(\mathit{reads}) = c \times \tfrac{2}{9} \times 1 \times \tfrac{1}{2} = \tfrac{1}{9}\,c$$

$$P(\mathit{skips} \mid \mathit{followUp} \wedge \mathit{short} \wedge \mathit{home}) = c \times P(\mathit{followUp} \mid \mathit{skips}) \times P(\mathit{short} \mid \mathit{skips}) \times P(\mathit{skips}) = c \times \tfrac{2}{3} \times \tfrac{2}{9} \times \tfrac{1}{2} = \tfrac{2}{27}\,c$$

The factor for Where_read is the same for both classes in these data, so it is
absorbed into the normalizing constant $c$. Normalizing,
$P(\mathit{reads} \mid \cdot) = \frac{1/9}{1/9 + 2/27} = 0.6$, so the classifier
predicts that the article is read.
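For concreteness, here is a minimal Python sketch of this computation, assuming the empirical frequencies quoted above; the dictionary layout and the function name are illustrative, not part of the original example.

```python
from fractions import Fraction as F

# Class priors and class-conditional probabilities quoted in the
# worked computation above (empirical frequencies from the data).
prior = {"reads": F(1, 2), "skips": F(1, 2)}
cond = {
    "reads": {"followUp": F(2, 9), "short": F(1, 1)},
    "skips": {"followUp": F(2, 3), "short": F(2, 9)},
}

def classify(observations):
    """Return the normalized posterior P(class | observations)."""
    score = {}
    for cls in prior:
        p = prior[cls]
        for obs in observations:
            p *= cond[cls][obs]   # naive independence: one factor per feature
        score[cls] = p            # posterior up to the normalizing constant c
    total = sum(score.values())   # equals 1/c
    return {cls: p / total for cls, p in score.items()}

print(classify(["followUp", "short"]))
# -> {'reads': Fraction(3, 5), 'skips': Fraction(2, 5)}: predicts reads (0.6)
```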
This prediction does not work well on example e11, which the agent skips, even
though it is a followUp and is short. The naive Bayesian classifier summarizes
the data into a few parameters. It predicts the article will be read because being
short is a stronger indicator that the article will be read than being a follow-up
is an indicator that the article will be skipped.
The use of zero probabilities can lead to unexpected behavior. First, some
features become deterministically predictive: knowing just one feature value
can rule out a category entirely. If we allow zero probabilities, it is
possible that some combinations of observations are impossible. See Exercise .
This is a problem not necessarily with using a Bayesian classifier but rather
with using empirical frequencies as probabilities. The alternative to using
empirical frequencies is to incorporate pseudocounts. A designer of the
learner should choose the pseudocounts carefully.
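A minimal sketch of how pseudocounts avoid the zero-probability problem, assuming Laplace-style smoothing in which the same pseudocount is added to every feature value; the function and the pseudocount of 1 are illustrative choices, not prescribed by the text:

```python
from fractions import Fraction as F

def smoothed_prob(count, class_total, num_values, pseudocount=1):
    """P(feature=value | class) with a pseudocount added to every value.

    count        -- training examples in the class with this feature value
    class_total  -- training examples in the class
    num_values   -- number of distinct values this feature can take
    """
    return F(count + pseudocount,
             class_total + pseudocount * num_values)

# The empirical frequency gave P(Length=long | reads) = 0/9 = 0, which
# rules out "reads" for any long article regardless of the other
# features.  With a pseudocount of 1 (Length has 2 values):
print(smoothed_prob(0, 9, 2))   # 1/11 instead of 0
print(smoothed_prob(9, 9, 2))   # 10/11 instead of 1
```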
Perceptrons
At the time, the perceptron was expected to be very significant for the
development of artificial intelligence (AI). While high hopes surrounded the
initial perceptron, its technical limitations were soon demonstrated: a
single-layer perceptron can only separate classes that are linearly separable.
It was later shown that perceptrons with multiple layers can classify groups
that are not linearly separable, allowing them to solve problems that
single-layer networks cannot.
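As an illustrative sketch (not from the source), XOR is the classic function that is not linearly separable: no single-layer perceptron can compute it, but a two-layer network with hand-picked weights can.

```python
def step(x):
    """Threshold activation used by the classic perceptron."""
    return 1 if x >= 0 else 0

def perceptron(weights, bias, inputs):
    """A single perceptron unit: step(w . x + b)."""
    return step(sum(w * i for w, i in zip(weights, inputs)) + bias)

def xor_two_layer(x1, x2):
    """Two-layer network for XOR: the hidden units compute OR and NAND,
    and the output unit ANDs them.  XOR is not linearly separable, so
    no single unit can compute it on its own."""
    h_or   = perceptron([1, 1], -1, [x1, x2])     # x1 OR x2
    h_nand = perceptron([-1, -1], 1.5, [x1, x2])  # NOT (x1 AND x2)
    return perceptron([1, 1], -1.5, [h_or, h_nand])  # AND of hidden units

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_two_layer(a, b))
# 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```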