Homework 1 Solutions
Jason J. Corso
Computer Science and Engineering
SUNY at Buffalo
[email protected]
Date Assigned 24 Jan 2011
Date Due 14 Feb 2011
Homework must be submitted in class. No late work will be accepted.
Remember, you are permitted to discuss this assignment with other students (whether in the class or not), but you must write up your own work from scratch.
I am sure the answers to some or all of these questions can be found on the internet. Copying from any other source is indeed cheating.
This class has a zero tolerance policy toward cheaters and cheating. Don’t do it.
Suppose we replace the Bayes decision rule by a randomized decision rule, which classifies x to class i following the posterior probability, i.e., it selects class i with probability P(ω = i|x).
Solution:
Maximizing the posterior probability is equivalent to minimizing the overall risk.
Using the zero-one loss function, the overall risk for the Bayes Decision Rule is:
$$R_{\text{Bayes}} = \int R(\alpha_{\text{Bayes}}(x)\,|\,x)\, p(x)\, dx = \int \Big\{ 1 - \max_{j=1,\dots,k} P(\omega_j|x) \Big\}\, p(x)\, dx$$
For simplicity, the class with the maximum posterior probability is abbreviated as $\omega_{\max}$, and we get:
$$R_{\text{Bayes}} = \int \big(1 - P(\omega_{\max}|x)\big)\, p(x)\, dx.$$
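For instance, with an illustrative posterior vector (not one taken from the assignment): if $k = 3$ and the posteriors at some $x$ are $(0.5, 0.3, 0.2)$, the Bayes rule selects $\omega_1$ and the conditional risk at that $x$ is $1 - 0.5 = 0.5$.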
1. What is the overall risk Rrand for this decision rule? Derive it in terms of the posterior probability
using the zero-one loss function.
Solution:
For any given x, the probability that class j = 1, ..., k is the correct class is P(ωj|x). The randomized rule selects class j with probability P(ωj|x), and when class j is selected the zero-one loss is incurred with probability 1 − P(ωj|x). Thus the conditional risk, averaged over the randomness of the rule, becomes $\sum_j P(\omega_j|x)\big(1 - P(\omega_j|x)\big)$. Therefore,
$$R_{\text{rand}} = \int \Big\{ \sum_j P(\omega_j|x)\big(1 - P(\omega_j|x)\big) \Big\}\, p(x)\, dx
= \int \Big\{ \sum_j P(\omega_j|x) - P(\omega_j|x)^2 \Big\}\, p(x)\, dx
= \int \Big[\, 1 - \sum_j P(\omega_j|x)^2 \,\Big]\, p(x)\, dx.$$
2. Show that this risk Rrand is always no smaller than the Bayes risk RBayes . Thus, we cannot benefit
from the randomized decision.
Solution:
Proving $R_{\text{rand}} \ge R_{\text{Bayes}}$ is equivalent to proving $\sum_j P(\omega_j|x)^2 \le P(\omega_{\max}|x)$, which holds because
$$\sum_j P(\omega_j|x)^2 \le \sum_j P(\omega_j|x)\, P(\omega_{\max}|x) = P(\omega_{\max}|x).$$
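As a quick numerical sanity check of this inequality, one can compare the two conditional risks at a single x. This is only a sketch; the posterior vector below is made up for illustration.

```python
# Compare the Bayes and randomized conditional risks at one point x,
# using a made-up posterior vector (k = 3 classes).
posteriors = [0.5, 0.3, 0.2]

bayes_risk = 1 - max(posteriors)                  # 1 - P(w_max | x)
rand_risk = sum(p * (1 - p) for p in posteriors)  # sum_j P(w_j|x) (1 - P(w_j|x))

print(bayes_risk, rand_risk)  # 0.5 and 0.62: rand_risk >= bayes_risk
```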
Problem 2: Bayesian Classification Boundaries for the Normal Distribution (30%)
Suppose we have a two-class recognition problem with salmon (ω = 1) and sea bass (ω = 2).
1. First, assume we have one feature, and the pdfs are the Gaussians $N(0, \sigma^2)$ and $N(1, \sigma^2)$ for the two classes, respectively. Show that the threshold $\tau$ minimizing the average risk is equal to
$$\tau = \frac{1}{2} - \sigma^2 \ln \frac{\lambda_{12} P(\omega_2)}{\lambda_{21} P(\omega_1)} \qquad (1)$$
Define $R(\tau)$ as the average risk for the threshold $\tau$:
$$R(\tau) = \int_{-\infty}^{\tau} \lambda_{12} P(\omega_2)\, p(x|\omega = 2)\, dx + \int_{\tau}^{+\infty} \lambda_{21} P(\omega_1)\, p(x|\omega = 1)\, dx$$
Take the derivative of $R(\tau)$ with respect to $\tau$; the minimum is attained where the derivative vanishes, so set it to zero:
$$\lambda_{12} P(\omega_2) \cdot \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\tau-1)^2}{2\sigma^2}} - \lambda_{21} P(\omega_1) \cdot \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{\tau^2}{2\sigma^2}} = 0$$
Therefore,
$$\tau = \frac{1}{2} - \sigma^2 \ln \frac{\lambda_{12} P(\omega_2)}{\lambda_{21} P(\omega_1)}.$$
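A small numerical check of equation (1) is to minimize R(τ) on a grid and compare with the closed form. This is only a sketch; the loss, prior, and σ values below are chosen purely for illustration.

```python
# Grid-minimize the average risk R(tau) for the 1D two-Gaussian problem and
# compare against the closed-form threshold in equation (1).
# Parameter values (losses, priors, sigma) are illustrative, not from the assignment.
import numpy as np
from scipy.stats import norm

lam12, lam21 = 1.0, 2.0   # losses lambda_12, lambda_21
P1, P2 = 0.7, 0.3         # priors P(omega_1), P(omega_2)
sigma = 0.8

def risk(tau):
    # R(tau) = lam12 P2 * P(x < tau | omega=2) + lam21 P1 * P(x > tau | omega=1)
    return (lam12 * P2 * norm.cdf(tau, loc=1, scale=sigma)
            + lam21 * P1 * (1 - norm.cdf(tau, loc=0, scale=sigma)))

taus = np.linspace(-2, 3, 100001)
tau_grid = taus[np.argmin(risk(taus))]
tau_closed = 0.5 - sigma**2 * np.log((lam12 * P2) / (lam21 * P1))

print(tau_grid, tau_closed)  # the two values should agree closely
```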
2. Next, suppose we have two features x = (x1 , x2 ) and the two class-conditional densities, p(x|ω = 1)
and p(x|ω = 2), are 2D Gaussian distributions centered at points (4, 11) and (10, 3) respectively
with the same covariance matrix Σ = 3I (where I is the identity matrix). Suppose the priors are
P (ω = 1) = 0.6 and P (ω = 2) = 0.4.
(a) Suppose we use a Bayes decision rule; write the two discriminant functions g1(x) and g2(x).
Solution:
According to the Bayes decision rule:
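One standard way to write these (a sketch under the stated assumptions: shared spherical covariance $\Sigma = \sigma^2 I$ with $\sigma^2 = 3$, dropping the additive terms common to both classes) is
$$g_i(x) = -\frac{\|x - \mu_i\|^2}{2\sigma^2} + \ln P(\omega_i),$$
so that
$$g_1(x) = -\frac{(x_1 - 4)^2 + (x_2 - 11)^2}{6} + \ln 0.6, \qquad g_2(x) = -\frac{(x_1 - 10)^2 + (x_2 - 3)^2}{6} + \ln 0.4.$$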
When all the covariance matrices are the same, the decision boundary is a straight line in the two-dimensional case, and a plane or hyperplane in three or more dimensions. In particular, if the covariance matrices have the special diagonal form Σ = σ² · I, the decision surface is perpendicular to the line joining the two means. If the covariance matrices differ between the classes, then the quadratic term in the equation of the decision surface does not cancel, and the boundary is no longer a flat plane. Let us restrict ourselves to the simpler case in which all the covariance matrices are the same and analyze the influence of the class priors on the position of the decision boundary. The decision boundary can then be written as:
$$w^T (x - x_0) = 0$$
where
$$w = \Sigma^{-1}(\mu_i - \mu_j)$$
and
$$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln\big[P(\omega_i)/P(\omega_j)\big]}{(\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)}\,(\mu_i - \mu_j).$$
In general, when the distance between the class means and the spread of the covariance are of comparable scale, increasing the prior of one class moves the decision boundary toward the other class. But if the variance is relatively small compared to the distance between the two means, i.e., the denominator in the equation above is relatively large, the class priors have relatively little influence on the position of the decision boundary, e.g., in the case of two well-separated Gaussians, each sharply peaked at its mean. On the other hand, if the variance is relatively large compared to the distance between the two means, the position of the decision boundary is determined mainly by the class priors (intuitively, when the two classes overlap heavily, the decision is based mainly on the prior knowledge we have).
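Plugging the numbers of this problem into the formula above (a worked check added for illustration, with i = 1, j = 2):
$$w = \Sigma^{-1}(\mu_1 - \mu_2) = \tfrac{1}{3}\begin{pmatrix}-6\\ 8\end{pmatrix} = \begin{pmatrix}-2\\ 8/3\end{pmatrix}, \qquad (\mu_1 - \mu_2)^T \Sigma^{-1} (\mu_1 - \mu_2) = \tfrac{36 + 64}{3} = \tfrac{100}{3},$$
$$x_0 = \begin{pmatrix}7\\ 7\end{pmatrix} - \frac{\ln(0.6/0.4)}{100/3}\begin{pmatrix}-6\\ 8\end{pmatrix} \approx \begin{pmatrix}7.07\\ 6.90\end{pmatrix}.$$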
(d) Using computer software, sample 100 points from each of the two densities. Draw them and
draw the boundary on the feature space (the 2D plane).
Solution:
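A minimal Python sketch of one way to produce such a figure (NumPy and Matplotlib assumed; the random seed, plotting range, and marker choices are arbitrary):

```python
# Sample 100 points per class from the two Gaussians of Problem 2 and draw
# the linear Bayes decision boundary w^T (x - x0) = 0 derived above.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

mu1, mu2 = np.array([4.0, 11.0]), np.array([10.0, 3.0])
Sigma = 3.0 * np.eye(2)           # shared covariance, Sigma = 3I
P1, P2 = 0.6, 0.4                 # class priors

# 100 samples from each class-conditional density
X1 = rng.multivariate_normal(mu1, Sigma, size=100)
X2 = rng.multivariate_normal(mu2, Sigma, size=100)

# Linear boundary parameters for the equal-covariance case
Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)
x0 = 0.5 * (mu1 + mu2) - (np.log(P1 / P2) /
     ((mu1 - mu2) @ Sigma_inv @ (mu1 - mu2))) * (mu1 - mu2)

# Plot the samples and the boundary line
plt.scatter(X1[:, 0], X1[:, 1], marker='o', label='salmon (class 1)')
plt.scatter(X2[:, 0], X2[:, 1], marker='x', label='sea bass (class 2)')
xs = np.linspace(-2, 16, 200)
# Boundary: w[0]*(x - x0[0]) + w[1]*(y - x0[1]) = 0, solved for y
ys = x0[1] - (w[0] / w[1]) * (xs - x0[0])
plt.plot(xs, ys, 'k-', label='Bayes decision boundary')
plt.legend(); plt.xlabel('x1'); plt.ylabel('x2'); plt.show()
```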
1. Formulate the problem using the Bayes rule, i.e., what are the random variables and the input data? What is the meaning of the prior and posterior probabilities in this problem?
Solution:
Since who is going to be executed tomorrow was already decided before A asked the janitor - otherwise the janitor would not be able to tell A which one of A's fellow inmates will live - we have the following.
The random variable ranges over the janitor's possible answers; the input data (observation) is the janitor's actual answer; the prior probability is the chance of A being executed before observing the janitor's answer; and the posterior probability is the chance of A being executed after observing the janitor's answer.
2. What are the probability values for the prior?
Solution:
Let $E_X$, where $X \in \{A, B, C\}$, denote the event that X is going to be executed. The prior probabilities of being executed tomorrow are $P(E_A) = P(E_B) = P(E_C) = \frac{1}{3}$ (supposing the prisoner to be executed is chosen at random).