07au Midterm


10-701 Midterm Exam, Fall 2007

1. Personal info:

• Name:
• Andrew account:
• E-mail address:

2. There should be 17 numbered pages in this exam (including this cover sheet).

3. You can use any material you brought: any book, class notes, your printouts of class
materials that are on the class website, including my annotated slides and relevant
readings, and Andrew Moore’s tutorials. You cannot use materials brought by other
students. Calculators are not necessary. Laptops, PDAs, phones and Internet access
are not allowed.

4. If you need more room to work out your answer to a question, use the back of the page
and clearly mark on the front of the page if we are to look at what’s on the back.

5. Work efficiently. Some questions are easier, some more difficult. Be sure to give yourself
time to answer all of the easy ones, and avoid getting bogged down in the more difficult
ones before you have answered the easier ones.

6. Note there are extra-credit sub-questions. The grade curve will be made without
considering students’ extra credit points. The extra credit will then be used to try to
bump your grade up without affecting anyone else’s grade.

7. You have 80 minutes.

8. Good luck!

Question Topic Max. score Score


1 Short questions 20 + 0.1010 extra
2 Loss Functions 12
3 Kernel Regression 12
4 Model Selection 14
5 Support Vector Machine 12
6 Decision Trees and Ensemble Methods 30

1 [20 Points] Short Questions
The following short questions should be answered with at most two sentences, and/or a
picture. For yes/no questions, make sure to provide a short justification.

1. [2 points] Does a 2-class Gaussian Naive Bayes classifier with parameters µ_{1k}, σ_{1k}, µ_{2k},
σ_{2k} for attributes k = 1, ..., m have exactly the same representational power as logistic
regression (i.e., a linear decision boundary), given no assumptions about the variance
values σ_{ik}^2?

2. [2 points] For linearly separable data, can a small slack penalty (“C”) hurt the training
accuracy when using a linear SVM (no kernel)? If so, explain how. If not, why not?

3. [3 points] Consider running AdaBoost with Multinomial Naive Bayes as the weak
learner for two classes and k binary features. After t iterations of AdaBoost, how
many parameters do you need to remember? In other words, how many numbers do
you need to keep around to predict the label of a new example? Assume that the
weak-learner training error is non-zero at iteration t. Don’t forget to mention where
the parameters come from.

4. [2 points] In boosting, would you stop the iteration if the following happens? Justify
your answer for each case with at most two sentences.

• The error rate of the combined classifier on the original training data is 0.

• The error rate of the current weak classifier on the weighted training data is 0.

5. [4 points] Given n linearly independent feature vectors in n dimensions, show that
for any assignment to the binary labels you can always construct a linear classifier
with weight vector w which separates the points. Assume that the classifier has the
form sign(w · x). Note that a square matrix composed of linearly independent rows is
invertible.

6. [3 points] Construct a one dimensional classification dataset for which the Leave-one-out
cross-validation error of the One Nearest Neighbor algorithm is always 1. Stated
another way, the One Nearest Neighbor algorithm never correctly predicts the held-out
point.

7. [2 points] Would we expect that running AdaBoost using the ID3 decision tree learning
algorithm (without pruning) as the weak learning algorithm would have a better true
error rate than running ID3 alone (i.e., without boosting and without pruning)?
Explain.

8. [1 point] Suppose there is a coin with unknown bias p. Does there exist some value of p
for which we would expect the maximum a-posteriori estimate of p, using a Beta(4, 2)
prior, to require more coin flips before it is close to the true value of p, compared to
the number of flips required of the maximum likelihood estimate of p? Explain.
(The Beta(4, 2) distribution is given in the figure below.)
[Figure 1: the Beta(4, 2) distribution, plotted as p(θ) for θ ∈ [0, 1].]

9. [1 point] Suppose there is a coin with unknown bias p. Does there exist some value
of p for which we would expect the maximum a-posteriori estimate of p, using a
Uniform([0, 1]) prior, to require more coin flips before it is close to the true value
of p, compared to the number of flips required of the maximum likelihood estimate of
p? Explain.

10. [0.1010 extra credit] Can a linear classifier separate the positive from the negative
examples in the dataset below? Justify.

Colbert    U2    for    Loosing my religion    president

The Beatles    Nirvana    There is a season...    Grunge    Turn! Turn! Turn!

2 [12 points] Loss Functions
Generally speaking, a classifier can be written as H(x) = sign(F(x)), where H(x): R^d →
{−1, 1} and F(x): R^d → R. To obtain the parameters in F(x), we need to minimize the
loss function averaged over the training set: Σ_i L(y^i F(x^i)). Here L is a function of yF(x).
For example, for linear classifiers, F(x) = w_0 + Σ_{j=1}^d w_j x_j, and yF(x) = y(w_0 + Σ_{j=1}^d w_j x_j).
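For illustration, a minimal sketch of this setup: it evaluates the training loss Σ_i L(y^i F(x^i)) for a linear F on a made-up data set, using the sigmoid-style loss L(z) = 1/(1 + exp(z)) that appears in part 3 below; the data, weights, and function names here are hypothetical.

```python
import numpy as np

def linear_F(X, w0, w):
    # F(x) = w0 + sum_j w_j * x_j, evaluated for every row of X
    return w0 + X @ w

def sigmoid_style_loss(margin):
    # L(yF(x)) = 1 / (1 + exp(yF(x))), the loss used in part 3 below
    return 1.0 / (1.0 + np.exp(margin))

# Hypothetical toy data: four points in R^2 with labels in {-1, +1}
X = np.array([[1.0, 2.0], [0.5, -1.0], [-2.0, 0.3], [1.5, 1.5]])
y = np.array([1, -1, -1, 1])

w0, w = 0.1, np.array([0.5, -0.2])            # hypothetical parameter values
margins = y * linear_F(X, w0, w)              # y^i F(x^i) for each training example
print(np.mean(sigmoid_style_loss(margins)))   # averaged training loss
```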

1. [4 points] Which loss functions below are appropriate to use in classification? For the
ones that are not appropriate, explain why not. In general, what conditions does L
have to satisfy in order to be an appropriate loss function? The x axis is yF (x), and
the y axis is L(yF (x)).

[Five plots of candidate loss functions, labeled (a), (b), (c), (d), and (e); in each, the x axis is yF(x) ranging from -10 to 10 and the y axis is L(yF(x)).]

2. [4 points] Of the above loss functions appropriate to use in classification, which one is
the most robust to outliers? Justify your answer.

3. [4 points] Let F(x) = w_0 + Σ_{j=1}^d w_j x_j and L(yF(x)) = 1 / (1 + exp(yF(x))). Suppose you use
gradient descent to obtain the optimal parameters w_0 and w_j. Give the update rules
for these parameters.

3 [12 points] Kernel Regression, k-NN
1. [4 points] Sketch the fit Y given X for the dataset given below using kernel regression
with a box kernel

K(x_i, x_j) = I(−h ≤ x_i − x_j < h) = { 1 if −h ≤ x_i − x_j < h;  0 otherwise }

for h = 0.5, 2.

• h = 0.5

  [Plot of the data set: x axis from 0 to 6, y axis from 0 to 4; sketch the fit here.]

• h = 2

  [Plot of the data set: x axis from 0 to 6, y axis from 0 to 4; sketch the fit here.]
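For illustration, a minimal sketch of box-kernel regression under the definition above: the prediction at a query point is the average of the y_i whose x_i fall inside the box window around it. The 1-D data set below is made up; the exam's actual points appear only in its figure.

```python
import numpy as np

def box_kernel(xq, xi, h):
    # K(xq, xi) = I(-h <= xq - xi < h), the box kernel defined above
    d = xq - xi
    return ((-h <= d) & (d < h)).astype(float)

def kernel_regression(xq, x_train, y_train, h):
    # Prediction at xq: average of the y_i whose x_i receive non-zero kernel weight
    w = box_kernel(xq, x_train, h)
    if w.sum() == 0:
        return np.nan                      # no training points inside the window
    return np.sum(w * y_train) / np.sum(w)

# Hypothetical 1-D data set
x_train = np.array([0.5, 1.0, 2.0, 3.0, 4.5, 5.5])
y_train = np.array([1.0, 2.0, 3.5, 3.0, 1.5, 0.5])

for h in (0.5, 2.0):
    fit = [kernel_regression(xq, x_train, y_train, h)
           for xq in np.linspace(0, 6, 13)]
    print(h, np.round(fit, 2))
```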

2. [4 points] Sketch or describe a dataset where kernel regression with the box kernel
above with h = 0.5 gives the same regression values as 1-NN but not as 2-NN in the
domain x ∈ [0, 6] below.

[Empty plot: x axis from 0 to 6, y axis from 0 to 4.5; sketch or describe your dataset here.]

3. [4 points] Sketch or describe a dataset where kernel regression with the box kernel
above with h = 0.5 gives the same regression values as 2-NN but not as 1-NN in the
domain x ∈ (0, 6) below.

[Empty plot: x axis from 0 to 6, y axis from 0 to 4.5; sketch or describe your dataset here.]

4 [14 Points] Model Selection
A central theme in machine learning is model selection. In this problem you will have the
opportunity to demonstrate your understanding of various model selection techniques and
their consequences. To make things more concrete we will consider the dataset D given in (1),
consisting of n independent identically distributed observations. The features of D consist
of pairs (x^i_1, x^i_2) ∈ R^2 and the observations y^i ∈ R are continuous valued.

D = {((x^1_1, x^1_2), y^1), ((x^2_1, x^2_2), y^2), ..., ((x^n_1, x^n_2), y^n)}    (1)

Consider the abstract model given in (2). The function f_{θ1,θ2} is a mapping from the features in
R^2 to an observation in R^1 which depends on two parameters θ1 and θ2. The ε^i correspond
to the noise. Here we will assume that the ε^i ~ N(0, σ^2) are independent Gaussians with
zero mean and variance σ^2.

y^i = f_{θ1,θ2}(x^i_1, x^i_2) + ε^i    (2)
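For illustration, a minimal sketch of drawing a synthetic data set from model (2), assuming one hypothetical choice of f_{θ1,θ2}, namely f(x_1, x_2) = θ1 x_1 + θ2 x_2 (the exam leaves f abstract); all numbers below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x1, x2, theta1, theta2):
    # One hypothetical choice for f_{theta1,theta2}; the exam leaves f abstract
    return theta1 * x1 + theta2 * x2

n, sigma = 100, 0.5
theta1, theta2 = 1.5, -0.7                   # made-up "true" parameters
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
eps = rng.normal(0.0, sigma, size=n)         # eps^i ~ N(0, sigma^2), independent
y = f(x1, x2, theta1, theta2) + eps          # observations generated by model (2)
print(y[:5])
```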

1. [4 Points] Show that the log likelihood of the data given the parameters is equal to (3).

l(D; θ1, θ2) = −(1/(2σ^2)) Σ_{i=1}^n (y^i − f_{θ1,θ2}(x^i_1, x^i_2))^2 − n log(√(2π) σ)    (3)

Recall the probability density function of the N(µ, σ^2) Gaussian distribution is given
by (4).

p(x) = (1/(√(2π) σ)) exp( −(x − µ)^2 / (2σ^2) )    (4)

2. [1 Point] If we disregard the parts that do not depend on f_{θ1,θ2} and Y, the negative of
the log-likelihood given in (3) is equivalent to what commonly used loss function?

3. [2 Points] Many common techniques used to find the maximum likelihood estimates of
θ1 and θ2 rely on our ability to compute the gradient of the log-likelihood. Compute
the gradient of the log likelihood with respect to θ1 and θ2. Express your answer in
terms of:

y^i,   f_{θ1,θ2}(x^i_1, x^i_2),   ∂f_{θ1,θ2}(x^i_1, x^i_2)/∂θ1,   ∂f_{θ1,θ2}(x^i_1, x^i_2)/∂θ2

4. [2 Points] Given the learning rate η, what update rule would you use in gradient descent
to maximize the likelihood?
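For illustration, a generic sketch of the update form: maximizing the log-likelihood by stepping along its gradient, which is equivalent to gradient descent on the negative log-likelihood. The function grad_log_lik and the toy objective below are hypothetical stand-ins for the gradient computed in the previous part.

```python
import numpy as np

def gradient_ascent_step(theta, grad_log_lik, eta):
    # theta = (theta1, theta2); eta is the learning rate.
    # To maximize l we step *along* its gradient, which is the same as
    # gradient descent on the negative log-likelihood.
    return theta + eta * grad_log_lik(theta)

# Hypothetical stand-in gradient, for l(theta) = -(theta1 - 1)^2 - (theta2 + 2)^2
def grad_log_lik(theta):
    return np.array([-2.0 * (theta[0] - 1.0), -2.0 * (theta[1] + 2.0)])

theta = np.zeros(2)
for _ in range(200):
    theta = gradient_ascent_step(theta, grad_log_lik, eta=0.1)
print(theta)   # converges toward (1, -2), the maximizer of the toy objective
```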

5. [3 Points] Suppose you are given some function h such that h(θ1, θ2) ∈ R is large when
f_{θ1,θ2} is complicated and small when f_{θ1,θ2} is simple. Use the function h along with
the negative log-likelihood to write down an expression for the regularized loss with
parameter λ.

6. [2 Points] For small and large values of λ, describe the bias-variance trade-off with
respect to the regularized loss provided in the previous part.

5 [12 points] Support Vector Machine
1. [2 points] Suppose we are using a linear SVM (i.e., no kernel), with some large C value,
and are given the following data set.

[Scatter plot of the data set; X1 axis from 1 to 5, X2 axis up to 3.]

Draw the decision boundary of linear SVM. Give a brief explanation.

2. [3 points] In the following image, circle the points such that, if that example were removed
from the training set and the SVM retrained, we would get a different decision boundary
than when training on the full sample.

[Scatter plot of the data set; X1 axis from 1 to 5, X2 axis up to 3.]

You do not need to provide a formal proof, but give a one or two sentence explanation.

3. [3 points] Suppose instead of SVM, we use regularized logistic regression to learn the
classifier. That is,

(w, b) = argmin_{w ∈ R^2, b ∈ R}  ||w||^2 / 2 − Σ_i [ 1[y^(i) = 0] ln( 1 / (1 + e^{w·x^(i)+b}) ) + 1[y^(i) = 1] ln( e^{w·x^(i)+b} / (1 + e^{w·x^(i)+b}) ) ].

In the following image, circle the points such that, if that example were removed from the
training set and regularized logistic regression rerun, we would get a different decision
boundary than when training with regularized logistic regression on the full sample.

[Scatter plot of the data set; X1 axis from 1 to 5, X2 axis up to 3.]

You do not need to provide a formal proof, but give a one or two sentence explanation.

4. [4 points] Suppose we have a kernel K(·, ·), such that there is an implicit high-dimensional
feature map φ: R^d → R^D that satisfies ∀x, z ∈ R^d, K(x, z) = φ(x) · φ(z), where
φ(x) · φ(z) = Σ_{i=1}^D φ(x)_i φ(z)_i is the dot product in the D-dimensional space.
Show how to calculate the Euclidean distance in the D-dimensional space

||φ(x) − φ(z)|| = sqrt( Σ_{i=1}^D (φ(x)_i − φ(z)_i)^2 )

without explicitly calculating the values in the D-dimensional vectors. For this ques-
tion, you should provide a formal proof.
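For illustration, a minimal sketch of one standard way to do this: expand the squared distance as ||φ(x) − φ(z)||^2 = K(x, x) − 2 K(x, z) + K(z, z), so only kernel evaluations are needed. The RBF kernel below is just an example choice, not part of the question.

```python
import numpy as np

def kernel_distance(x, z, K):
    # ||phi(x) - phi(z)||^2 = phi(x).phi(x) - 2 phi(x).phi(z) + phi(z).phi(z)
    #                       = K(x, x) - 2 K(x, z) + K(z, z)
    sq = K(x, x) - 2.0 * K(x, z) + K(z, z)
    return np.sqrt(max(sq, 0.0))   # clamp tiny negative values from round-off

def rbf_kernel(x, z, gamma=0.5):
    # Example kernel chosen only for illustration
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([1.0, 2.0, 0.5])
z = np.array([0.0, 1.0, 1.0])
print(kernel_distance(x, z, rbf_kernel))
```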

6 [30 points] Decision Trees and Ensemble Methods
An ensemble classifier H_T(x) is a collection of T weak classifiers h_t(x), each with some weight
α_t, t = 1, ..., T. Given a data point x ∈ R^d, H_T(x) predicts its label based on the weighted
majority vote of the ensemble. In the binary case where the class label is either 1 or -1,
H_T(x) = sgn(Σ_{t=1}^T α_t h_t(x)), where h_t(x): R^d → {−1, 1}, and sgn(z) = 1 if z > 0 and
sgn(z) = −1 if z ≤ 0. Boosting is an example of an ensemble classifier where the weights are
calculated based on the training error of the weak classifier on the weighted training set.
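For illustration, a minimal sketch of the weighted-majority-vote prediction rule just defined; the decision stumps and weights below are made up.

```python
import numpy as np

def sgn(z):
    # sgn(z) = 1 if z > 0, and -1 if z <= 0, as defined above
    return 1 if z > 0 else -1

def ensemble_predict(x, weak_classifiers, alphas):
    # H_T(x) = sgn( sum_t alpha_t * h_t(x) )
    return sgn(sum(a * h(x) for h, a in zip(weak_classifiers, alphas)))

# Made-up decision stumps on x in R^2 and made-up weights
h1 = lambda x: 1 if x[0] > 0 else -1
h2 = lambda x: 1 if x[1] > 0.5 else -1

print(ensemble_predict(np.array([1.0, -1.0]), [h1, h2], [0.7, 0.4]))   # prints 1
```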

1. [10 points] For the following data set,

[Scatter plot of the labeled data set; both axes range from -2 to 2.]

• Describe a binary decision tree with the minimum depth that is consistent with the
data;

• Describe an ensemble classifier H_2(x) with 2 weak classifiers that is consistent
with the data. The weak classifiers should be simple decision stumps. Specify the
weak classifiers and their weights.

2. [10 points] For the following XOR data set,

[Scatter plot of the XOR data set; both axes range from -2 to 2.]

• Describe a binary decision tree with the minimum depth that is consistent with the
data;

• Let the ensemble classifier consist of the four binary classifiers shown below (the
arrow means that the corresponding classifier classifies every data point in that
direction as +), prove that there are no weights α1 , . . . , α4 , that make the ensemble
classifier consistent with the data.

[Plot of the XOR data set with the four classifiers h1, h2, h3, h4 drawn as axis-parallel decision boundaries, each with an arrow indicating its + side; both axes range from -2 to 2.]

3. [10 points] Suppose that for each data point, the feature vector x ∈ {0, 1}^m, i.e., x
consists of m binary-valued features, the class label y ∈ {−1, 1}, and the true classifier
is a majority vote over the features, i.e., y = sgn(Σ_{i=1}^m (2x_i − 1)), where x_i is the i-th
component of the feature vector.

• Describe a binary decision tree with the minimum depth that is consistent with the
data. How many leaves does it have?

• Describe an ensemble classifier with the minimum number of weak classifiers.
Specify the weak classifiers and their weights.
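For illustration, a minimal sketch of the true classifier in this part, y = sgn(Σ_{i=1}^m (2x_i − 1)); the example feature vector is made up.

```python
import numpy as np

def true_label(x):
    # y = sgn( sum_i (2 x_i - 1) ): +1 if a majority of the m binary features
    # are 1, otherwise -1 (a tie gives -1, since sgn(0) = -1 in this exam)
    return 1 if np.sum(2 * x - 1) > 0 else -1

x = np.array([1, 0, 1, 1, 0])   # m = 5 made-up binary features
print(true_label(x))            # three of five features are 1, so y = +1
```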

