0% found this document useful (0 votes)

6 views16 pages

Midterm Sol

The document is a midterm exam for the course 10-701 from Spring 2011, consisting of various questions related to topics such as Bayes Nets, decision trees, and linear regression. It includes instructions for the exam, a breakdown of scores for each question, and detailed questions with solutions regarding statistical models and their properties. The exam allows the use of personal materials but prohibits electronic devices and collaboration with other students.

Uploaded by

DHANRAJ Malla

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

6 views16 pages

Midterm Sol

Uploaded by

DHANRAJ Malla

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 16

10-701 Midterm Exam, Spring 2011

1. Personal info:

• Name:
• Andrew account:
• E-mail address:

2. There are 14 numbered pages in this exam (including this cover sheet).

3. You can use any material you brought: any book, notes, and print outs. You cannot
use materials brought by other students.

4. No computers, PDAs, phones or Internet access.

5. If you need more room to answer a question, use the back of the preceding page.

6. Work efficiently. Consider answering all of the easier questions first.

7. There is one optional extra credit question, which will not affect the grading curve. It
will be used to bump your grade up, without affecting anyone else’s grade.

8. You have 80 minutes, the test has 100 points. Good luck!

Question Topic Max. score Score

1 Short Questions 20

2 Bayes Nets 23

3 Decision Surfaces and Training Rules 12

4 Linear Regression 20

5 Conditional Independence Violation 25

6 [Extra Credit] Violated Assumptions 6

1
1 [20 Points] Short Questions
1.1 True or False (Grading: Carl Doersch)
Answer each of the following True of False. If True, give a short justification. If False, a
counter-example or convincing one-sentence explanation.

1. [2 pts] If we train a Naive Bayes classifier using infinite training data that satisfies all
of its modeling assumptions (e.g., conditional independence), then it will achieve zero
training error over these training examples.

F SOLUTION: False. There will still be unavoidable error. In Naive Bayes, Y is

probabilistic, so it is often impossible to predict Y even if the model’s estimate of P (Y ) is
perfect. Furthermore, Naive Bayes is linear, and so it can’t necessarily even estimate P (Y )
perfectly: for example, in the distribution Y = 1 ⇔ X1 XORX2 .

2. [2 pts] If we train a Naive Bayes classifier using infinite training data that satisfies all
of its modeling assumptions (e.g., conditional independence), then it will achieve zero
true error over test examples drawn from this same distribution.

F SOLUTION: False, for the same reasons as above.

3. [2 pts] Every Bayes Net defined over 10 variables hX1 , X2 , . . . X10 i tells how to factor
the joint probability distribution P (X1 , X2 , . . . X10 ) into the product of exactly 10
terms.

F SOLUTION: True, by the definition of Bayes Net.

Consider the three Bayes Nets shown below:

A B C

4. [3 pts] True or false: Every joint distribution P (X1 , X2 , X3 ) that can be defined by
adding Conditional Probability Distributions (CPD) to Bayes Net graph A can also be
expressed by appropriate CPD’s for Bayes Net graph B.

2
F SOLUTION: True. If a distribution can be represented in graph A, it will factorize as
P (X2 )P (X1 |X2 )P (X3 |X2 ). Using Bayes rule, this becomes P (X2 )P (X3 |X2 )P (X2 |X1 )P (X1 )/P (X2 ) =
P (X3 |X2 )P (X2 |X1 )P (X1 ).

5. [3 pts] True or false: Every joint distribution P (X1 , X2 , X3 ) that can be defined
by adding Conditional Probability Distributions to Bayes Net graph A can also be
expressed by appropriate CPD’s for Bayes Net graph C.

F SOLUTION: False. A can represent distributions where X1 can depend on X3

given no information about X2 , whereas graph C cannot.

1.2 Quick questions (Grading: Yi Zhang)

Answer each of the following in one or two sentences, in the space provided.

1. [2 pts] Prove that P (X1 |X2 )P (X2 ) = P (X2 |X1 )P (X1 ). (Hint: This is a two-line
proof.)

F SOLUTION: P (X1 |X2 )P (X2 ) = P (X1 , X2 ) = P (X2 |X1 )P (X1 )

2. [3 pts] Consider a decision tree learner applied to data where each example is described
by 10 boolean variables hX1 , X2 , . . . X10 i. What is the VC dimension of the hypothesis
space used by this decision tree learner?

F SOLUTION: The VC dimension is 210 , because we can shatter 210 examples using
a tree with 210 leaf nodes, and we cannot shatter 210 + 1 examples (since in that case we
must have duplicated examples and they can be assigned with conflicting labels).

3. [3 pts] Consider the plot below showing training and test set accuracy for decision
trees of different sizes, using the same set of training data to train each tree. Describe
in one sentence how the training data curve (solid line) will change if the number of
training examples approaches infinity. In a second sentence, describe what will happen
to the test data curve under the same condition.

3
F SOLUTION: The new training accuracy curve should be below the original training
curve (since it’s impossible for the trees to overfit infinite training data); the new testing
accuracy curve should be above the original testing curve and become identical to the new
training curve (since trees learned from infinite training data should perform well on testing
data and do not overfit at all).

4
2 [23 Points] Bayes Nets (Grading: Carl Doersch)
2.1 [17 pts] Inference
In the following graphical model, A, B, C, and D are binary random variables.

1. [2 pts] How many parameters are needed to define the Conditional Probability Dis-
tributions (CPD’s) for this Bayes Net?

F SOLUTION: 8: 1 for A, 1 for B, 2 for C, and 4 for D.

2. [2 pts] Write an expression for the probability P (A = 1, B = 1, C = 1, D = 1) in

terms of the Bayes Net CPD parameters. Use notation like P (C = 1|A = 0) to denote
specific parameters in the CPD’s.

F SOLUTION:

P (A = 1)P (B = 1)P (C = 1|A = 1)P (D = 1|A = 1, B = 1)

3. [3 pts] Write an expression for P (A = 0|B = 1, C = 1, D = 1) in terms of the Bayes

Net Conditional Probability Distribution (CPD) parameters.

F SOLUTION:

P (A = 0)P (B = 1)P (C = 1|A = 0)P (D = 1|A = 0, B = 1)

P (A = 0)P (B = 1)P (C = 1|A = 0)P (D = 1|A = 0, B = 1)+

P (A = 1)P (B = 1)P (C = 1|A = 1)P (D = 1|A = 1, B = 1)

4. [2 pts] True or False (give brief justification): C is conditionally independent of B

given D.

5
F SOLUTION: False. There is one path from C to B, and this path isn’t blocked at
either node.

5. [2 pts] True or False (give brief justification): C is conditionally independent of B

given A.

F SOLUTION: True. The path is now blocked at both A and D.

Suppose we use EM to train the above Bayes Net from the partially labeled data given
below, first initializing all Bayes net parameters to 0.5.

A B C D

1 0 1 0

1 ? 0 1

1 1 0 ?

0 ? 0 ?

0 1 0 ?

6. [2 pts] How many distinct quantities will be updated during the first M step?

F SOLUTION: 5 or 8, depending on your interpretation. In the M step we update

the values of all parameters, and from part 1 there were 8 parameters. However, only 5 of
them will actually be changed if your algorithm’s initialization is clever.

7. [2 pts] How many distinct quantities will be estimated during the first E step?

F SOLUTION: 5. Every unknown value must be estimated.

8. [2 pts] When EM converges, what will be the final estimate for P (C = 0|A = 1)?
[Hint: You do not need a calculator.]

F SOLUTION: 2/3: the fraction of times when C=0 out of all examples where A=1.

6
2.2 [6 pts] Constructing a Bayes net
Draw a Bayes net over the random variables {A, B, C, D} where the following conditional
independence assumptions hold. Here, X⊥Y |Z means X is conditionally independent of Y
given Z, and X ⊥Y |Z means X and Y are not conditionally independent given Z, and ∅
stands for the empty set.

• A⊥B|∅
• A⊥D|B
• A⊥D|C
• A⊥C|∅
• B ⊥C|∅
• A⊥B|D
• B⊥D|A, C

F SOLUTION:

7
3 [12 Points] Decision Surfaces and Training Rules (Grad-
ing: Yi Zhang)
Consider a classification problem with two boolean variables X1 , X2 ∈ {0, 1} and label
Y ∈ {0, 1}. In Figure 1 we show two positive (“+”) and two negative (“-”) examples.

Figure 1: Two positive examples and two negative examples.

Question [2 pts]: Draw (or just simply describe) a decision tree that can perfectly
classify the four examples in Figure 1.

F SOLUTION: Split using one variable (e.g., X1 ) and then split using the other variable
(e.g., X2 ). Label each leaf node according to the assigned training example.

Question [3 pts]: In the class we learned the training rule to grow a decision tree:
we start from a single root node and iteratively split each node using the “best” attribute
selected by maximizing the information gain of the split. We will stop splitting a node if:
1) examples in the node are already pure; or 2) we cannot find any single attribute that
gives a split with positive information gain. If we apply this training rule to the examples
in Figure 1, will we get a decision tree that perfectly classifies the examples? Briefly explain
what will happen.

F SOLUTION: We will stop at a single root node and cannot grow the tree any more. This
is because, at the root node, splitting on any single variable has zero information gain.

8
Question [5 pts]: Suppose we learn a Naive Bayes classifier from the examples in
Figure 1, using MLE (maximum likelihood estimation) as the training rule. Write down
all the parameters and their estimated values (note: both P (Y ) and P (Xi |Y ) should be
Bernoulli distributions). Also, does this learned Naive Bayes perfectly classify the four
examples?

F SOLUTION: P (Y = 1) = 0.5 (= P (Y = 0))

P (X1 = 1|Y = 0) = P (X1 = 1|Y = 1) = 0.5 (= P (X1 = 0|Y = 0) = P (X1 = 0|Y = 1))
P (X2 = 1|Y = 0) = P (X2 = 1|Y = 1) = 0.5 (= P (X2 = 0|Y = 0) = P (X2 = 0|Y = 1))
This is a very poor classifier since for any X1 , X2 it will predict P (Y = 1|X1 , X2 ) = P (Y =
0|X1 , X2 ) = 0.5. Naturally, it cannot perfectly classify the examples in the figure.

Question [2 pts]: Is there any logistic regression classifier using X1 and X2 that can
perfectly classify the examples in Figure 1? Why?

F SOLUTION: No, logistic regression only forms linear decision surface, but the examples
in the figure are not linearly separable.

9
4 [20 Points] Linear Regression (Grading: Xi Chen)
Consider a simple linear regression model in which y is the sum of a deterministic linear
function of x, plus random noise .

y = wx +
where x is the real-valued input; y is the real-valued output; and w is a single real-valued
parameter to be learned. Here is a real-valued random variable that represents noise,
and that follows a Gaussian distribution with mean 0 and standard deviation σ; that is,
∼ N (0, σ)
(a) [3pts] Note that y is a random variable because it is the sum of a deterministic
function of x, plus the random variable . Write down an expression for the probability
distribution governing y, in terms of N (), σ, w and x.

F SOLUTION: y follows a Gaussian distribution with the mean wx and the standard devi-
ation σ:
(y − wx)2

1
p(y|w, x) = √ exp −
2πσ 2σ 2

(b) [3 pts] You are given n i.i.d. training examples {(x1 , y 1 ), (x2 , y 2 ), . . . , (xn , y n )} to
train this model. Let Y = (y 1 , . . . , y n ) and X = (x1 , . . . , xn ), write an expression for the
conditional data likelihood: p(Y|X , w).

F SOLUTION:
n
Y
p(Y|X , w) = p(y i |xi , w)
i=1
n
n/2 Y
(y i − wxi )2

1
= exp −
2πσ 2 i=1
2σ 2
n/2 Pn i i 2

1 i=1 (y − wx )
= exp −
2πσ 2 2σ 2

10
(c) [9 pts] Here you will derive the expression for obtaining a MAP estimate of w from
the training data. Assume a Gaussian prior over w with mean 0 and standard deviation τ
(i.e. w ∼ N (0, τ )). Show that finding the MAP estimate w∗ is equivalent to solving the
following optimization problem:
n
∗ 1X i λ
w = argminw (y − wxi )2 + w2 ;
2 i=1 2
Also express the regularization parameter λ in terms of σ and τ .

F SOLUTION:
p(w|Y, X ) ∝ p(Y|X , w)p(w|X )
Pn i i 2
w2

i=1 (y − wx )
∝ exp − exp − 2
2σ 2 2τ

w∗ = argmaxw ln p(w|Y, X )
Pn
(y i − wxi )2 w2
= argmaxw − i=1 −
Pn 2σ 2 2τ 2
i i 2 2
(y − wx ) w
= argminw i=1 2
+ 2
2σ 2τ
n
1 X σ2
= argminw (y i − wxi )2 + 2 w2
2 i=1 2τ
σ2
We can see that λ = τ2
.
(d) [5pts] Above we assumed a zero-mean prior for w, which resulted in the usual λ2 w2
regularization term for linear regression. Sometimes we may have prior knowledge that
suggests w has some value other than zero. Write down the revised objective function that
would be derived if we assume a Gaussian prior on w with mean µ instead of zero (i.e., if
the prior is w ∼ N (µ, τ )).

F SOLUTION:
p(w|Y, X ) ∝ p(Y|X , w)p(w|X )
Pn i i 2
(w − µ)2

i=1 (y − wx )
∝ exp − exp −
2σ 2 2τ 2

w∗ = argmaxw ln p(w|Y, X )
Pn
(y i − wxi )2 (w − µ)2
= argmaxw − i=1 2
−
Pn 2σ 2τ 2
i i 2
(y − wx ) (w − µ)2
= argminw i=1 +
2σ 2 2τ 2
n
1X i σ2
= argminw (y − wxi )2 + 2 (w − µ)2
2 i=1 2τ

11
5 [25 Points] Conditional Independence Violation (Grad-
ing: Yi Zhang)
5.1 Naive Bayes without Conditional Independence Violation
Table 2: P (X1 |Y )
Table 1: P (Y )
X1 = 0 X1 = 1
Y =0 Y =1
Y =0 0.7 0.3
0.8 0.2
Y =1 0.3 0.7

Consider a binary classification problem with variable X1 ∈ {0, 1} and label Y ∈ {0, 1}.
The true generative distribution P (X1 , Y ) = P (Y )P (X1 |Y ) is shown as Table 1 and Table 2.
Question [4 pts]: Now suppose we have trained a Naive Bayes classifier, using infinite
training data generated according to Table 1 and Table 2. In Table 3, please write down
the predictions from the trained Naive Bayes for different configurations of X1 . Note that
Ŷ (X1 ) in the table is the decision about the value of Y given X1 . For decision terms in the
table, write down either Ŷ = 0 or Ŷ = 1; for probability terms in the table, write down the
actual values (and the calculation process if you prefer, e.g., 0.8 ∗ 0.7 = 0.56).
Table 3: Predictions from the trained Naive Bayes
P̂ (X1 , Y = 0) P̂ (X1 , Y = 1) Ŷ (X1 )
X1 = 0 0.8 × 0.7 = 0.56 0.2 × 0.3 = 0.06 Ŷ = 0
X1 = 1 0.8 × 0.3 = 0.24 0.2 × 0.7 = 0.14 Ŷ = 0

F SOLUTION: The naive Bayes model learned from infinite data will have P̂ (Y ) and
P̂ (X1 |Y ) estimated exactly as Table 1 and Table 2. The resulting predictions are shown in
Table 3.

Question [3 pts]: What is the expected error rate of this Naive Bayes classifier on
testing examples that are generated according to Table 1 and Table 2? In other words,
P (Ŷ (X1 ) 6= Y ) when (X1 , Y ) is generated according to the two tables.
Hint: P (Ŷ (X1 ) 6= Y ) = P (Ŷ (X1 ) 6= Y, X1 = 0) + P (Ŷ (X1 ) 6= Y, X1 = 1).

F SOLUTION:
P (Ŷ (X1 ) 6= Y ) = P (Ŷ (X1 ) 6= Y, X1 = 0) + P (Ŷ (X1 ) 6= Y, X1 = 1)
= P (Y = 1, X1 = 0) + P (Y = 1, X1 = 1)
= 0.06 + 0.14
= 0.2

12
5.2 Naive Bayes with Conditional Independence Violation
Consider two variables X1 , X2 ∈ {0, 1} and label Y ∈ {0, 1}. Y and X1 are still generated
according to Table 1 and Table 2, and then X2 is created as a duplicated copy of X1 .
Question [6 pts]: Now suppose we have trained a Naive Bayes classifier, using infinite
training data that are generated according to Table 1, Table 2 and the duplication rule.
In Table 4, please write down the predictions from the trained Naive Bayes for different
configurations of (X1 , X2 ). For probability terms in the table, you can write down just the
calculation process (e.g., one entry might be 0.8 ∗ 0.3 ∗ 0.3 = 0.072, and you can just write
down 0.8 ∗ 0.3 ∗ 0.3 to save some time). Hint: the Naive Bayes classifier does assume that
X2 is conditionally independent of X1 given Y .
Table 4: Predictions from the trained Naive Bayes
P̂ (X1 , X2 , Y = 0) P̂ (X1 , X2 , Y = 1) Ŷ (X1 , X2 )
X1 = 0, X2 = 0 0.8 × 0.7 × 0.7 0.2 × 0.3 × 0.3 Ŷ = 0
X1 = 1, X2 = 1 0.8 × 0.3 × 0.3 0.2 × 0.7 × 0.7 Ŷ = 1
X1 = 0, X2 = 1 0.8 × 0.7 × 0.3 0.2 × 0.3 × 0.7 Ŷ = 0
X1 = 1, X2 = 0 0.8 × 0.3 × 0.7 0.2 × 0.7 × 0.3 Ŷ = 0

F SOLUTION: The naive Bayes model learned from infinite data will have P̂ (Y ) and
P̂ (X1 |Y ) estimated exactly as Table 1 and Table 2. However, it also has P̂ (X2 |Y ) incorrectly
estimated as Table 2. The resulting predictions are shown in Table 4.

Question [3 pts]: What is the expected error rate of this Naive Bayes classifier on
testing examples that are generated according to Table 1, Table 2 and the duplication rule?

F SOLUTION: Note that the testing examples are generated according to the true distri-
bution (i.e., where X2 is a duplication). We have:

P (Ŷ (X1 , X2 ) 6= Y ) = P (Ŷ (X1 , X2 ) 6= Y, X1 = X2 = 0) + P (Ŷ (X1 , X2 ) 6= Y, X1 = X2 = 1)

= P (Y = 1, X1 = X2 = 0) + P (Y = 0, X1 = X2 = 1)
= P (Y = 1, X1 = 0) + P (Y = 0, X1 = 1)
= 0.06 + 0.24
= 0.3

Question [3 pts]: Compared to the scenario without X2 , how does the expected error
rate change (i.e., increase or decrease)? In Table 4, the decision rule Ŷ on which configura-
tion is responsible to this change? What actually happened to this decision rule? (You need
to briefly answer: increase or decrease, the responsible configuration, and what happened.)

13
F SOLUTION: The expected error rate increases from 0.2 to 0.3, due to the incorrect
decision Ŷ = 1 on the configuration X1 = X2 = 1. Basically the naive Bayes model makes
the incorrect conditional independence assumption and considers both X1 = 1 and X2 = 1 as
evidence.

5.3 Logistic Regression with Conditional Independence Violation

Question [2 pts]: Will logistic regression suffer from having an additional variable X2 that
is actually a duplicate of X1 ? Intuitively, why (hint: model assumptions)?

F SOLUTION: No. Logistic regression does not make conditional independence assumption.
(Note: in the class we did derive the form P (Y |X) of logistic regression from naive Bayes
assumptions, but that does not mean logistic regression makes the conditional independence
assumption).

Now we will go beyond the intuition. We have a training set D1 of L examples D1 =

{(X11 , Y 1 ), . . . , (X1L , Y L )}. Suppose we generate another training set D2 of L examples D2 =
{(X11 , X21 , Y 1 ), . . . , (X1L , X2L , Y L )}, where in each example X1 and Y are the same as in D1
and then X2 is a duplicate of X1 . Now we learn a logistic regression from D1 , which should
contain two parameters: w0 and w1 ; we also learn another logistic regression from D2 , which
should have three parameters: w00 , w10 and w20 .
Question [4 pts] : First, write down the training rule (maximum conditional likelihood
estimation) we use to estimate (w0 , w1 ) and (w00 , w10 , w20 ) from data. Then, given the training
rule, what is the relationship between (w0 , w1 ) and (w00 , w10 , w20 ) we estimated from D1 and
D2 ? Use this fact to argue whether or not the logistic regression will suffer from having an
additional duplicate variable X2 .

F SOLUTION:
The training rule for (w0 , w1 ) is to maximize:
L
Y L
X
ln P (Y l |X1l , w0 , w1 ) = Y l (w0 + w1 X1l ) − ln(1 + exp(w0 + w1 X1l ))
l=1 l=1

The training rule for (w00 , w10 , w20 ) is to maximize:

L
Y L
X
ln P (Y l
|X1l , X2l , w00 , w10 , w20 ) = Y l (w00 + w10 X1l + w20 X2l ) − ln(1 + exp(w00 + w10 X1l + w20 X2l ))
l=1 l=1

Since X2 is a duplication of X1 , the training rule for (w00 , w10 , w20 ) becomes maximizing:
L
X
Y l (w00 + w10 X1l + w20 X1l ) − ln(1 + exp(w00 + w10 X1l + w20 X1l ))
l=1
XL
= Y l (w00 + (w10 + w20 )X1l ) − ln(1 + exp(w00 + (w10 + w20 )X1l ))
l=1

14
which is basically the same as the training rule for (w0 , w1 ), with substitution w0 = w00 and
w1 = w10 + w20 . This is also the relationship between (w0 , w1 ) and (w00 , w10 , w20 ) we estimated
from D1 and D2 . As a result, logistic regression will simply split the weight w1 into w10 +w20 = w1
when facing duplicated variable X2 = X1 .

15
6 [Extra Credit 6 pts] Violated assumptions (Grading:
Carl Doersch)
Extra Credit Question: This question is optional – do not attempt it until you have
completed the rest of the exam. It will not affect the grade curve for the exam, though you
will receive extra points if you answer it.

Let A, B, and C be boolean random variables governed by the joint distribution P (A, B, C).
Let D be a dataset consisting of n data points, each of which is an independent draw from
P (A, B, C), where all three variables are fully observed.
Consider the following Bayes Net, which does not necessarily capture the correct condi-
tional independencies in P (A, B, C).

Let P̂ be the distribution learned after this Bayes net is trained using D. Show that for
any number , 0 < ≤ 1, there exists a joint distribution P (A, B, C) such that P (C = 1|A =
1) = 1, but such that the Bayes net shown above, when trained on D, will (with probability
1) learn CPTs where:
X
P̂ (C = 1|A = 1) = P̂ (C = 1|B = b)P̂ (B = b|A = 1) ≤
b∈{0,1}

as |D| approaches ∞. Assume that the Bayes net is learning on the basis of the MLE.
You should solve this problem by defining a distribution with the above property. Your
final solution may be either in the form of a fully specified joint distribution (i.e. you write
out the probabilities for each assignment of the variables A, B, and C), or in the form of a
Bayes net with fully specified CPTs. (Hint: the second option is easier.)

F SOLUTION:

Let P (A = 1) = , P (C = 1|A = 1) = 1, P (C = 1|A = 0) = 0. P (B) can be any arbitrary

value, as for all values of B, the Bayes net will estimate P (C = 1|B = b) = .

Quiz Q & A PDF
0% (2)
Quiz Q & A PDF
18 pages
Introduction To Machine Learning IIT KGP Week 2
100% (1)
Introduction To Machine Learning IIT KGP Week 2
14 pages
Free Cmyk Chart Printable Download US Letter
No ratings yet
Free Cmyk Chart Printable Download US Letter
1 page
Electronic Medical Record
100% (1)
Electronic Medical Record
10 pages
ML Assignment 2 2019 Nptel
No ratings yet
ML Assignment 2 2019 Nptel
34 pages
Questions and Solutions On Bayes Theorem
No ratings yet
Questions and Solutions On Bayes Theorem
10 pages
Advanced Rougher Flotation Control
100% (1)
Advanced Rougher Flotation Control
14 pages
MCQ Question
No ratings yet
MCQ Question
5 pages
Thesis Example Chapter 4
100% (3)
Thesis Example Chapter 4
4 pages
DLP Ict 8 Q1 M1
No ratings yet
DLP Ict 8 Q1 M1
5 pages
Midterm Sp16 Solutions
100% (1)
Midterm Sp16 Solutions
17 pages
Adspower Script
No ratings yet
Adspower Script
3 pages
Midterm Practice Questions
No ratings yet
Midterm Practice Questions
14 pages
Final 2001f
No ratings yet
Final 2001f
18 pages
MLvsMAP Merged
No ratings yet
MLvsMAP Merged
208 pages
Data Structure and Linked List
No ratings yet
Data Structure and Linked List
33 pages
Unit Iii ML MCQ
100% (1)
Unit Iii ML MCQ
7 pages
10-701/15-781 Machine Learning Mid-Term Exam Solution: Your Name
No ratings yet
10-701/15-781 Machine Learning Mid-Term Exam Solution: Your Name
12 pages
Midterm Solutions Machine
100% (1)
Midterm Solutions Machine
17 pages
Program: Computer Engineering Course-Web Development Using PHP (22619) Practical Answer Sheet
100% (1)
Program: Computer Engineering Course-Web Development Using PHP (22619) Practical Answer Sheet
4 pages
Module 4
No ratings yet
Module 4
68 pages
Early Flood Monitoring System Using Iot Applications: A Mini Project Report
No ratings yet
Early Flood Monitoring System Using Iot Applications: A Mini Project Report
19 pages
05 Classification II 2024
No ratings yet
05 Classification II 2024
54 pages
Report Header: Message 1
No ratings yet
Report Header: Message 1
2 pages
Slide of PSpice
No ratings yet
Slide of PSpice
63 pages
Final2019 Solutions
No ratings yet
Final2019 Solutions
23 pages
CAD Tutorials
No ratings yet
CAD Tutorials
18 pages
Exam2 Review Solutions
No ratings yet
Exam2 Review Solutions
18 pages
Final 2019
No ratings yet
Final 2019
15 pages
Week 8
No ratings yet
Week 8
31 pages
Solutions: 10-601 Machine Learning, Midterm Exam: Spring 2008 Solutions
No ratings yet
Solutions: 10-601 Machine Learning, Midterm Exam: Spring 2008 Solutions
8 pages
AKANKSHA START PAGE - Merged
No ratings yet
AKANKSHA START PAGE - Merged
51 pages
ML Assignment Week 4 2019 Nptel
No ratings yet
ML Assignment Week 4 2019 Nptel
30 pages
Office Manager - Admin Assistant Job Description1
No ratings yet
Office Manager - Admin Assistant Job Description1
2 pages
Naive Bayes
No ratings yet
Naive Bayes
25 pages
Samp Sol
No ratings yet
Samp Sol
14 pages
PR Digital Readouts Linear Encoders ID208864 en
No ratings yet
PR Digital Readouts Linear Encoders ID208864 en
19 pages
Chapter 4A Tutorial Questions and Solutions
No ratings yet
Chapter 4A Tutorial Questions and Solutions
12 pages
Statistical Methods For ML
No ratings yet
Statistical Methods For ML
24 pages
Midterm2008f Sol
No ratings yet
Midterm2008f Sol
12 pages
CNET101 Computer Networks
No ratings yet
CNET101 Computer Networks
3 pages
Naïve Bayes Classifier: Adopted From Slides by Ke Chen From University of Manchester and Yangqiu Song From Msra
No ratings yet
Naïve Bayes Classifier: Adopted From Slides by Ke Chen From University of Manchester and Yangqiu Song From Msra
25 pages
Final 2006
No ratings yet
Final 2006
15 pages
UMBC CMSC 671 Final Exam: December 20, 2009
No ratings yet
UMBC CMSC 671 Final Exam: December 20, 2009
8 pages
Final 2006
No ratings yet
Final 2006
15 pages
Midterm Solutions PDF
No ratings yet
Midterm Solutions PDF
17 pages
Quiz2 B
No ratings yet
Quiz2 B
6 pages
ANSWERS TO 15-381 Final, Spring 2004: Friday May 7, 2004
No ratings yet
ANSWERS TO 15-381 Final, Spring 2004: Friday May 7, 2004
20 pages
UMBC CMSC 671 Final Exam: 13 December 2012
No ratings yet
UMBC CMSC 671 Final Exam: 13 December 2012
7 pages
Division Memorandum No. 579, S. 2024
No ratings yet
Division Memorandum No. 579, S. 2024
5 pages
CS725 2021 Midsem
No ratings yet
CS725 2021 Midsem
6 pages
Bayes Algorithm
No ratings yet
Bayes Algorithm
26 pages
Midterm 2006
No ratings yet
Midterm 2006
11 pages
Programmable Machine Pre History - MechMachTheor - May2001
No ratings yet
Programmable Machine Pre History - MechMachTheor - May2001
15 pages
Assignment 1
No ratings yet
Assignment 1
6 pages
User Manual 2 2946010
No ratings yet
User Manual 2 2946010
25 pages
12f-601-Midterm Machine Learning
No ratings yet
12f-601-Midterm Machine Learning
21 pages
Midterm 2002
No ratings yet
Midterm 2002
10 pages
Student Attendance Tracking Information System by Maureen
No ratings yet
Student Attendance Tracking Information System by Maureen
27 pages
SQL Server Logs Extended Events Audits
No ratings yet
SQL Server Logs Extended Events Audits
2 pages
Physio@Home: Exploring Visual Guidance and Feedback Techniques For Physiotherapy Exercises
No ratings yet
Physio@Home: Exploring Visual Guidance and Feedback Techniques For Physiotherapy Exercises
10 pages
10-601 Machine Learning Midterm Exam Fall 2011: Tom Mitchell, Aarti Singh Carnegie Mellon University
No ratings yet
10-601 Machine Learning Midterm Exam Fall 2011: Tom Mitchell, Aarti Singh Carnegie Mellon University
16 pages
Chat GPT
No ratings yet
Chat GPT
2 pages
10-701 Midterm Exam, Fall 2007
No ratings yet
10-701 Midterm Exam, Fall 2007
25 pages
MLRECT2 Solution
No ratings yet
MLRECT2 Solution
9 pages
चरित्र प्रमाण पत्र - PDF
No ratings yet
चरित्र प्रमाण पत्र - PDF
6 pages
cs675 SS2022 Midterm Solution PDF
No ratings yet
cs675 SS2022 Midterm Solution PDF
10 pages
Midterm Solutions
No ratings yet
Midterm Solutions
8 pages
10-701/15-781 Machine Learning - Midterm Exam, Fall 2010: Aarti Singh Carnegie Mellon University
No ratings yet
10-701/15-781 Machine Learning - Midterm Exam, Fall 2010: Aarti Singh Carnegie Mellon University
16 pages
I-K-Means and Clustering
No ratings yet
I-K-Means and Clustering
6 pages
ML End Sem Nov2024 Paper
No ratings yet
ML End Sem Nov2024 Paper
4 pages
CS725 2020 Midsem
No ratings yet
CS725 2020 Midsem
3 pages
Assignment Week 4
No ratings yet
Assignment Week 4
6 pages
UMBC CMSC 471 Final Exam,: 1. True/False (20 Points)
No ratings yet
UMBC CMSC 471 Final Exam,: 1. True/False (20 Points)
6 pages
15-381 Artificial Intelligence: Representation and Problem Solving Homework 2 - Solutions
No ratings yet
15-381 Artificial Intelligence: Representation and Problem Solving Homework 2 - Solutions
7 pages
Solution of Final Exam: 10-701/15-781 Machine Learning: Fall 2004 Dec. 12th 2004
No ratings yet
Solution of Final Exam: 10-701/15-781 Machine Learning: Fall 2004 Dec. 12th 2004
27 pages
Beginner's Guide To Make A Game Controller
No ratings yet
Beginner's Guide To Make A Game Controller
23 pages
Final Examination - Spring 2021 Semester Sajid Ali - 40760: Faculty of Engineering, Sciences and Technology
No ratings yet
Final Examination - Spring 2021 Semester Sajid Ali - 40760: Faculty of Engineering, Sciences and Technology
4 pages
Hw3 Solutions
No ratings yet
Hw3 Solutions
7 pages
ML Midsem 2018 Solutions
No ratings yet
ML Midsem 2018 Solutions
7 pages
601 sp09 Midterm Solutions
No ratings yet
601 sp09 Midterm Solutions
14 pages
Final: CS 189 Spring 2013 Introduction To Machine Learning
No ratings yet
Final: CS 189 Spring 2013 Introduction To Machine Learning
9 pages
Message
No ratings yet
Message
2 pages
Sample Questions Answers
No ratings yet
Sample Questions Answers
8 pages
Midterm 2008s Solution
No ratings yet
Midterm 2008s Solution
12 pages
Practice Final CS61c
No ratings yet
Practice Final CS61c
19 pages
Midterm Solution
No ratings yet
Midterm Solution
6 pages
Curriculum Vitae
No ratings yet
Curriculum Vitae
3 pages
IGNOU BCA Introduction to Algorithm Design Previous Year Unsolved Papers BCS 042
From Everand
IGNOU BCA Introduction to Algorithm Design Previous Year Unsolved Papers BCS 042
Manish Soni
No ratings yet
HESI A2 Math Practice Tests: HESI A2 Nursing Entrance Exam Math Study Guide
From Everand
HESI A2 Math Practice Tests: HESI A2 Nursing Entrance Exam Math Study Guide
Exam SAM
No ratings yet

Midterm Sol

Uploaded by

Midterm Sol

Uploaded by

10-701 Midterm Exam, Spring 2011

4. No computers, PDAs, phones or Internet access.

6. Work efficiently. Consider answering all of the easier questions first.

Question Topic Max. score Score

3 Decision Surfaces and Training Rules 12

5 Conditional Independence Violation 25

6 [Extra Credit] Violated Assumptions 6

F SOLUTION: False. There will still be unavoidable error. In Naive Bayes, Y is

F SOLUTION: False, for the same reasons as above.

F SOLUTION: True, by the definition of Bayes Net.

Consider the three Bayes Nets shown below:

F SOLUTION: False. A can represent distributions where X1 can depend on X3

1.2 Quick questions (Grading: Yi Zhang)

F SOLUTION: P (X1 |X2 )P (X2 ) = P (X1 , X2 ) = P (X2 |X1 )P (X1 )

F SOLUTION: 8: 1 for A, 1 for B, 2 for C, and 4 for D.

2. [2 pts] Write an expression for the probability P (A = 1, B = 1, C = 1, D = 1) in

P (A = 1)P (B = 1)P (C = 1|A = 1)P (D = 1|A = 1, B = 1)

3. [3 pts] Write an expression for P (A = 0|B = 1, C = 1, D = 1) in terms of the Bayes

P (A = 0)P (B = 1)P (C = 1|A = 0)P (D = 1|A = 0, B = 1)

P (A = 0)P (B = 1)P (C = 1|A = 0)P (D = 1|A = 0, B = 1)+

P (A = 1)P (B = 1)P (C = 1|A = 1)P (D = 1|A = 1, B = 1)

4. [2 pts] True or False (give brief justification): C is conditionally independent of B

5. [2 pts] True or False (give brief justification): C is conditionally independent of B

F SOLUTION: True. The path is now blocked at both A and D.

F SOLUTION: 5 or 8, depending on your interpretation. In the M step we update

F SOLUTION: 5. Every unknown value must be estimated.

Figure 1: Two positive examples and two negative examples.

F SOLUTION: P (Y = 1) = 0.5 (= P (Y = 0))

P (Ŷ (X1 , X2 ) 6= Y ) = P (Ŷ (X1 , X2 ) 6= Y, X1 = X2 = 0) + P (Ŷ (X1 , X2 ) 6= Y, X1 = X2 = 1)

5.3 Logistic Regression with Conditional Independence Violation

Now we will go beyond the intuition. We have a training set D1 of L examples D1 =

The training rule for (w00 , w10 , w20 ) is to maximize:

Let P (A = 1) = , P (C = 1|A = 1) = 1, P (C = 1|A = 0) = 0. P (B) can be any arbitrary

You might also like

Let P (A = 1) = , P (C = 1|A = 1) = 1, P (C = 1|A = 0) = 0. P (B) can be any arbitrary