Midterm2008f Sol
Midterm
Professor: Eric Xing Date: October 20, 2008
- There are 6 questions in this exam (11 pages including this cover sheet).
- This exam is open book and open notes. Computers, PDAs, and cell phones are not allowed.
- Good luck!
1 Assorted Questions [24 points]
Please explain your answer in a single line for all True/False questions
1. (True or False, 2 pts) E(X + Y ) = E(X) + E(Y ) holds for any two random variables X, Y .
Solutions: T
2. (True or False, 2 pts) V ar(X + Y ) = V ar(X) + V ar(Y ) holds for any two random variables
X, Y .
3. (True or False, 2 pts) E(XY ) = E(X) · E(Y ) holds for any two random variables X, Y .
Solutions: F (it holds if X and Y are independent, but not for arbitrary X and Y).
5. (True or False, 2 pts) In any finite space, if we are given 1 positive example and 1 negative
example as the training data set, then 1-NN (using L2 distance) is always a linear classifier.
Solutions: T
6. (True or False, 2 pts) For a binary Naive Bayes classifier, if P (xi |y), (i = 1, ..., d; y = ±1)
follows a Gaussian distribution, then the resulting classifier is always a linear classifier. Remember
that xi is the ith feature dimension, d is the dimension of the feature vector, and y is the class label.
7. (True or False, 2 pts) Both bagging and boosting perform re-sampling on the training set.
Solutions: T
8. (True or False, 2 pts) For small k, a k-nearest neighbor classifier has small variance.
9. (True or False, 3 pts) The expected extra number of bits required to send a code, if an
optimal code for the (wrong) distribution q is used, instead of the optimal code for the true
distribution p, is given by KL(q||p).
10. (5 pts) Given 4 positive examples: (0.707, 0.707), (0.707, −0.707), (−0.707, 0.707), (−0.707, −0.707)
and 4 negative examples: (3, 0), (5, 0), (4, −1), (4, 1) in 2-d space (see Fig. 1.(a)), we will use
each of the following algorithms to learn a classifier from this training data set:
(a) Naive Bayes, assuming that we have the same variance (i.e., Var(x1 |y = 0) = Var(x1 |y =
1) = Var(x2 |y = 0) = Var(x2 |y = 1) and the same class prior (i.e., P (y = 0) = P (y = 1)).
(b) Linear SVM with hard margin.
[Figure 1: (a) the training data set; (b), (c), (d) candidate decision boundaries.]
For each classifier, indicate which of the figures (b)-(d) could be the resulting decision boundary.
Hint: for each classifier, there might be more than one corresponding figure.
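For illustration, here is a minimal sketch (assuming numpy and scikit-learn are available) that fits both classifiers to the eight training points. Under the stated assumptions, the shared-variance, equal-prior Gaussian Naive Bayes reduces to a nearest-class-mean rule, so its boundary is the perpendicular bisector of the two class means; the hard-margin linear SVM is approximated by a linear SVC with a very large C.

```python
import numpy as np
from sklearn.svm import SVC  # assumed available; a very large C approximates a hard margin

pos = np.array([(0.707, 0.707), (0.707, -0.707), (-0.707, 0.707), (-0.707, -0.707)])
neg = np.array([(3, 0), (5, 0), (4, -1), (4, 1)])
X = np.vstack([pos, neg])
y = np.array([1] * 4 + [0] * 4)

# (a) Gaussian NB with equal priors and one shared variance: the posterior comparison
# reduces to picking the nearest class mean, so the decision boundary is the
# perpendicular bisector of the two class means.
mu_pos, mu_neg = pos.mean(axis=0), neg.mean(axis=0)
def nb_predict(x):
    return 1 if np.linalg.norm(x - mu_pos) < np.linalg.norm(x - mu_neg) else 0

# (b) Hard-margin linear SVM, approximated with a very large C.
svm = SVC(kernel="linear", C=1e6).fit(X, y)

print("NB boundary: perpendicular bisector of", mu_pos, "and", mu_neg)
print("SVM weights:", svm.coef_, "bias:", svm.intercept_)
```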
2 Linear Regression [16 Points]
Consider the regression problem where the two-dimensional input points x = [x1, x2]^T are constrained
to lie within the unit square: xi ∈ [−1, 1], i = 1, 2. The training and test input points x
are sampled uniformly at random within the unit square. The target outputs y are governed by
the following model
y ∼ N(x1^5 x2^5 + 20x1^2 − 30x1x2 + 5x1 − 1, 1)
We learn to predict y given x using linear regression models with 1st through 12th order polynomial
features. The models are nested in the sense that the higher order models will include all the lower
order features. The estimation criterion is the mean squared error. We first train a 1st, 2nd, 10th,
and 12th order model using n = 20 training points, and then test the predictions on a large number
of independently sampled points.
(Hint 1: Examples of 10th order polynomial features are x1^10, x1^9 x2, x1^8 x2^2, . . . .)
(Hint 2: A 1st order model involves features x1, x2, and 1. A 2nd order model involves features
x1^2, x1x2, x2^2, x1, x2, and 1. Similarly, it can be generalized to higher order models.)
1. (4 pts) Which model is likely to get the lowest training error?
Solution: The 12th order model. On training error, a higher order model can do no worse
than a lower order model, because it has an extended feature set. The most expressive model
gets the lowest training error. (Explanation is not required in exam.)
2. (4 pts) Which model is likely to get the highest training error?
Solution: The 1st order model. The least expressive model gets the highest training error.
(Explanation is not required in exam.)
3. (4 pts) If we have xi ∈ [−10, 10] instead of xi ∈ [−1, 1], which model is likely to get the lowest
test error? Use one sentence to explain your selection.
Solution: The 10th order model, because it is the closest match to the ground truth model.
Lower order models cannot capture the x1^5 x2^5 term, which is dominant in this case (high bias).
Higher order models are likely to overfit, which will result in higher test error (high variance).
The 10th order model has zero bias and reasonable variance, and is likely to give the lowest test
error.
4. (4 pts) Go back to the case where xi ∈ [−1, 1]. Among the 2nd, 10th, and 12th order model,
which one would typically get the lowest test error now? Briefly explain your selection.
Solution: The 2nd order model. The term x1^5 x2^5 is small, so the 2nd order model is a good
approximation to the ground truth model. On the other hand, the small first term, x1^5 x2^5, is
unlikely to be distinguished from noise, so the 10th order model is more likely to overfit the
data than the 2nd order model. A 12th order model is even worse. In short, the 2nd order
model has the lowest variance and a small bias, and is likely to give the lowest test error.
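For illustration, a minimal simulation of this setup (assuming numpy; the random seed and test-set size are arbitrary choices): it draws n = 20 training points, builds polynomial features as described in the hints, fits each model order by least squares, and prints training and test mean squared error.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def features(X, order):
    # All monomials x1^a * x2^b with a + b <= order (includes the constant term).
    feats = [X[:, 0] ** a * X[:, 1] ** b
             for a, b in itertools.product(range(order + 1), repeat=2)
             if a + b <= order]
    return np.column_stack(feats)

def target(X):
    mean = (X[:, 0] ** 5 * X[:, 1] ** 5 + 20 * X[:, 0] ** 2
            - 30 * X[:, 0] * X[:, 1] + 5 * X[:, 0] - 1)
    return mean + rng.normal(0.0, 1.0, size=len(X))  # unit-variance Gaussian noise

X_train = rng.uniform(-1, 1, size=(20, 2))
X_test = rng.uniform(-1, 1, size=(10000, 2))
y_train, y_test = target(X_train), target(X_test)

for order in (1, 2, 10, 12):
    Phi_tr, Phi_te = features(X_train, order), features(X_test, order)
    w, *_ = np.linalg.lstsq(Phi_tr, y_train, rcond=None)  # least-squares fit
    train_mse = np.mean((Phi_tr @ w - y_train) ** 2)
    test_mse = np.mean((Phi_te @ w - y_test) ** 2)
    print(f"order {order:2d}: train MSE {train_mse:8.3f}, test MSE {test_mse:10.3f}")
```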
3 Support Vector Machine for Regression (SVR) [15 pts]
Using an idea similar to that of SVM classification, we can also use support vector machines for regression
(referred to as ‘SVR’). In SVR, we are given m training examples (x1 , y1 ), (x2 , y2 ), ..., (xm , ym ), where
xi is the feature vector and yi ∈ R is the output of the ith example (i = 1, 2, ..., m).
In the linear SVR, we aim to learn a linear function f from the training examples, which takes
the following form: f = w′ x + b, where w and b are the parameters to be learned, and w′ is the
transpose of the weight vector w .
In the simplest case for linear SVR, we can formulate it as the following convex optimization
problem:
minimize :   (1/2) ||w||^2
subject to : yi − (w′xi + b) ≤ ǫ  and  (w′xi + b) − yi ≤ ǫ   (i = 1, 2, ..., m)    (1)
The intuition of eq. (1) is that we want to learn the parameters w and b so that (1) the resulting
f is as smooth as possible (i.e., small ||w||); and (2) for each training example, the prediction error
of f is at most ǫ, where ǫ ≥ 0 is a given parameter of the algorithm. Notice that the bigger ǫ is,
the smoother the function f we are looking for.
Now, given 3 training examples (1, 1), (2, 2), (3, 3) (See Fig. 3), suppose that we want to use eq. (1)
to learn a linear SVR.
Solutions: w = 1
Solutions: w = 0.5
Solutions: w = 0
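A minimal numerical check of such solutions: the sketch below solves the optimization in eq. (1) directly for the three training points, using cvxpy as an assumed off-the-shelf solver. The value of ǫ is the free input (the ǫ values used in the original sub-questions are not shown above); for example, ǫ = 0.5 yields w = 0.5 (with b = 1), ǫ = 0 yields w = 1, and ǫ = 1 yields w = 0.

```python
import cvxpy as cp
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
eps = 0.5  # the tolerance parameter; try 0, 0.5, 1 to see how w changes

w = cp.Variable()
b = cp.Variable()
objective = cp.Minimize(0.5 * cp.square(w))      # (1/2) ||w||^2 for scalar w
constraints = [y - (w * x + b) <= eps,           # y_i - (w x_i + b) <= eps
               (w * x + b) - y <= eps]           # (w x_i + b) - y_i <= eps
cp.Problem(objective, constraints).solve()
print("w =", float(w.value), " b =", float(b.value))
```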
4 Neural Networks [20 Points]
Here is a simple 2-layer neural network with 2 hidden units and a single output unit.
Consider the linear activation function y = C · a = C · Σ_i wi xi, where C is a constant and
a = Σ_i wi xi is the weighted sum of the unit's inputs. Also, consider the non-linear logistic
activation function y = σ(a), where
σ(a) = 1 / (1 + e^(−a)).
1. (a) (3 pts) Assume all units are linear. Can this 2-layer network represent decision boundaries
that a standard regression model y = b0 + b1 x1 + b2 x2 + ǫ cannot?
Solution:
No. A network with all linear units can be reduced to a simple linear model.
(b) (3 pts) Assume the hidden units use logistic activation functions and the output unit
uses a linear activation unit. Can this network represent non-linear decision boundaries?
Solution:
Yes.
(c) (4 pts) Using logistic activation functions for both hidden and output units, it is possible
to approximate any complicated decision surface by combining many piecewise linear decision
boundaries. Explain what changes you would need to make to the above network
so you could approximate any decision boundary.
Solution:
Yes, you need additional hidden units. More hidden units lead to a more complicated
decision surface.
2. (10 pts) Consider the XOR function: y = (x1 ∧ ¬ x2 ) ∨ (x2 ∧ ¬ x1 ). Assume all units are
logistic. We can implement the XOR function using the two layer network above and the
decision rule:
predict 1 if y > 1/2
predict 0 if y < 1/2
Select the weights that implement (x1 XOR x2 ). Hint: There are many solutions, but a
simple one is where all weights come from the set {−10, 10, 100}.
6
Solution:
w10 = -10
w20 = -10
w0 = -10
w11 = 100
w12 = -100
w21 = -100
w22 = 100
w1 = 100
w2 = 100
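A quick numerical check of these weights (assuming numpy, and assuming the naming convention that wj0 is the bias of hidden unit j, wjk connects input xk to hidden unit j, and w0, w1, w2 feed the output unit, matching the network figure):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Weights from the solution above (naming convention assumed from the network figure).
w10, w11, w12 = -10, 100, -100   # hidden unit 1: bias, from x1, from x2
w20, w21, w22 = -10, -100, 100   # hidden unit 2: bias, from x1, from x2
w0, w1, w2 = -10, 100, 100       # output unit: bias, from h1, from h2

for x1 in (0, 1):
    for x2 in (0, 1):
        h1 = sigmoid(w10 + w11 * x1 + w12 * x2)   # ~ (x1 AND NOT x2)
        h2 = sigmoid(w20 + w21 * x1 + w22 * x2)   # ~ (x2 AND NOT x1)
        y = sigmoid(w0 + w1 * h1 + w2 * h2)       # output unit acts as an OR of h1, h2
        print(x1, x2, int(y > 0.5))               # prints the XOR truth table
```

With these weights, h1 approximates (x1 ∧ ¬x2), h2 approximates (x2 ∧ ¬x1), and the output unit ORs them, so the decision rule y > 1/2 reproduces XOR on all four inputs.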
5 AdaBoost
We are given N examples (xi , yi ), where yi is the label and yi = +1 or yi = −1. Let I(·) be the indicator
function, which is 1 if the condition in (·) is true and 0 otherwise. In this problem, we use the
following version of the AdaBoost algorithm:
1. Initialize the example weights: wi^1 = 1/N (i = 1, ..., N).
2. For t = 1, ..., M ,
a. Learn a weak classifier ht(x) by minimizing the weighted error function Gt, where
   Gt = Σ_{i=1}^N wi^t I(ht(xi) ≠ yi);
b. Compute the error rate of the learned weak classifier ht(x): ǫt = Σ_{i=1}^N wi^t I(ht(xi) ≠ yi);
c. Compute the weight for ht(x): αt = (1/2) ln((1 − ǫt)/ǫt);
d. Update the weight of each example: wi^{t+1} = wi^t exp{−αt yi ht(xi)} / Zt, where Zt is the
   normalization factor for wi^{t+1}: Zt = Σ_{i=1}^N wi^t exp{−αt yi ht(xi)}.
Taking the derivative of the squared error E = Σ_{i=1}^N (yi − fm(xi))^2 with respect to αm:
∂E/∂αm = ∂/∂αm Σ_{i=1}^N (yi − fm(xi))^2                          (2)
       = Σ_{i=1}^N 2 (yi − fm(xi)) ∂(yi − fm(xi))/∂αm             (3)
In this equation, fm−1 is independent of αm . Substituting this in the derivative equation, we get
∂E/∂αm = Σ_{i=1}^N 2 (yi − fm(xi)) ∂(yi − fm(xi))/∂αm                      (5)
       = Σ_{i=1}^N 2 (yi − fm(xi)) ∂(yi − αm hm(xi) − fm−1(xi))/∂αm        (6)
       = Σ_{i=1}^N 2 (yi − fm(xi)) (−hm(xi))                               (7)
In Homework 3, we proved that the training error ǫtraining of AdaBoost is upper bounded by
∏_{t=1}^M Zt, where the training error ǫtraining = (1/N) Σ_{i=1}^N I(H(xi) ≠ yi). We will now modify this
bound to rewrite it in terms of the error rates ǫt of the weak classifiers.
Prove that Zt = 2√(ǫt(1 − ǫt)), and hence that the training error ǫtraining is upper bounded by
∏_{t=1}^M [2√(ǫt(1 − ǫt))]. Show all relevant steps.
Solution:
Zt = Σ_{i=1}^N wi^t exp{−αt yi ht(xi)}                                                   (11)
   = Σ_{i=1}^N wi^t exp{−αt yi ht(xi)} (I[yi = ht(xi)] + I[yi ≠ ht(xi)])                 (12)
   = Σ_{i=1}^N wi^t exp{−αt} I[yi = ht(xi)] + Σ_{i=1}^N wi^t exp{αt} I[yi ≠ ht(xi)]      (13)
   = exp{−αt}(1 − ǫt) + exp{αt} ǫt                                                       (14)
Substituting αt = (1/2) ln((1 − ǫt)/ǫt), i.e., exp{−αt} = √(ǫt/(1 − ǫt)) and exp{αt} = √((1 − ǫt)/ǫt), we get
Zt = √(ǫt/(1 − ǫt)) (1 − ǫt) + √((1 − ǫt)/ǫt) ǫt = 2√(ǫt(1 − ǫt)).
Since this equality holds for all t, we get that ǫtraining is upper bounded by ∏_{t=1}^M [2√(ǫt(1 − ǫt))].
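As a numerical sanity check of this result, here is a sketch (assuming numpy; the synthetic data set and the decision-stump weak learner are arbitrary choices made for illustration) that runs the AdaBoost variant above and verifies at each round that Zt = 2√(ǫt(1 − ǫt)) and that the training error stays below ∏t Zt.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
X = rng.uniform(-1, 1, size=(N, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)        # a non-linear toy target

def learn_stump(X, y, w):
    """Weighted decision stump: threshold one feature, possibly with flipped sign."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    _, j, thr, sign = best
    return lambda X: sign * np.where(X[:, j] > thr, 1, -1)

w = np.full(N, 1.0 / N)                           # step 1: uniform initial weights
F = np.zeros(N)                                    # running weighted vote of the weak classifiers
bound = 1.0
for t in range(10):
    h = learn_stump(X, y, w)                       # step a
    pred = h(X)
    eps = np.sum(w * (pred != y))                  # step b: weighted error rate
    alpha = 0.5 * np.log((1 - eps) / eps)          # step c
    Z = np.sum(w * np.exp(-alpha * y * pred))      # normalization factor
    assert np.isclose(Z, 2 * np.sqrt(eps * (1 - eps)))
    w = w * np.exp(-alpha * y * pred) / Z          # step d
    F += alpha * pred
    bound *= Z
    train_err = np.mean(np.sign(F) != y)
    print(f"t={t+1}: eps={eps:.3f}  Z={Z:.3f}  train_err={train_err:.3f}  bound={bound:.3f}")
```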
6 VC Dimension and Hypothesis Spaces [15 Points]
Consider a collection of N points lying in K-dimensional space ℜ^K. For each point xk, we can
assign some label y ∈ {0, 1}. Thus for N points, there are 2^N possible labelings of the data.
Assuming we have some weight wk for each of the K inputs, we can classify a point using a linear
threshold function:
y = f( Σ_{k=1}^K wk xk )
where
f(a) = 1 if a > 0, and f(a) = 0 if a ≤ 0.
1. Assuming K = 1, what is the VC dimension of this linear threshold classifier?
Solution:
VC = 1
2. (4 pts) Assuming K = 1, how many different labelings (i.e. hypotheses) of N points can be
realized by changing the weights in f (a) ?
Solution:
2
3. Assuming K = 2, what is the VC dimension of this linear threshold classifier?
Solution:
VC = 2
4. (3 pts) Assuming K = 2, how many different labelings (i.e. hypotheses) of N points can be
realized by changing the weights in f (a) ? You can assume no two points exist on a line that
passes through the origin. Hint: Your answer should be in terms of N .
Solution:
2·N
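As an illustration of the 2·N count, a short sketch (assuming numpy) that sweeps the direction of the weight vector and collects the distinct labelings of N random points; with no two points on a common line through the origin, it finds exactly 2N labelings.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
X = rng.standard_normal((N, 2))          # generic points: no two collinear with the origin

labelings = set()
# The labeling depends only on the direction of w, so sweep the angle densely.
for theta in np.linspace(0, 2 * np.pi, 100000, endpoint=False):
    w = np.array([np.cos(theta), np.sin(theta)])
    labels = (X @ w > 0).astype(int)     # f(a) = 1 if a > 0 else 0
    labelings.add(tuple(labels))

print(len(labelings), "labelings realized;", 2 * N, "expected")
```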