
6.867 Machine learning

Final exam, December 3, 2004

Your name and MIT ID: J. D. 00000000

(Optional) The grade you would give to yourself + a brief justification.

A... why not?

Cite as: Tommi Jaakkola, course materials for 6.867 Machine Learning, Fall 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].

Problem 1

[Figure 1 appears here; the solution's two decision stumps are drawn on it as boundaries (1) and (2), each with its +/- side marked.]

Figure 1: Labeled training points for problem 1.

Consider the labeled training points in Figure 1, where x and o denote positive and negative labels, respectively. We wish to apply AdaBoost with decision stumps to solve the classification problem. In each boosting iteration, we select the stump that minimizes the weighted training error, breaking ties arbitrarily.

1. (3 points) In Figure 1, draw the decision boundary corresponding to the first decision stump that the boosting algorithm would choose. Label this boundary (1), and also indicate the +/- side of the decision boundary.

2. (2 points) In the same Figure 1, also circle the point(s) that have the highest weight after the first boosting iteration.

3. (2 points) What is the weighted error of the first decision stump after the first boosting iteration, i.e., after the points have been reweighted?

0.5

4. (3 points) Draw the decision boundary corresponding to the second decision stump, again in Figure 1, and label it with (2), also indicating the +/- side of the boundary.

5. (3 points) Would some of the points be misclassified by the combined classifier after the two boosting iterations? Provide a brief justification. (The points will be awarded for the justification, not whether your yes/no answer is correct.)

Yes. For example, the circled point in the figure is misclassified by the first decision stump and could be classified correctly in the combination only if the weight/vote of the second stump were higher than that of the first. If it were higher, however, then the points misclassified by the second stump would be misclassified in the combination.
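The answer to question 3 reflects a general property of AdaBoost: immediately after the exponential reweighting, the stump that was just selected has weighted error exactly 1/2. Below is a minimal numerical sketch (not part of the original exam; the labels and stump predictions are hypothetical) illustrating this:

```python
import numpy as np

# AdaBoost weight update on hypothetical data: after reweighting, the stump
# just chosen always has weighted error exactly 0.5.
y = np.array([+1, +1, -1, -1, -1])       # true labels
h = np.array([+1, +1, +1, -1, -1])       # stump predictions (one mistake)
w = np.ones(len(y)) / len(y)             # initial uniform weights

eps = w[h != y].sum()                    # weighted training error of the stump
alpha = 0.5 * np.log((1 - eps) / eps)    # the stump's vote

w *= np.exp(-alpha * y * h)              # exponential reweighting
w /= w.sum()                             # renormalize

print(w[h != y].sum())                   # -> 0.5
```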


Problem 2
1. (2 points) Consider a linear SVM trained with n labeled points in R2 without slack penalties and resulting in k = 2 support vectors (k < n). By adding one additional labeled training point and retraining the SVM classifier, what is the maximum number of support vectors in the resulting solution?

( ) k
( ) k + 1
( ) k + 2
(X) n + 1

2. We train two SVM classifiers to separate points in R2. The classifiers differ only in terms of the kernel function. Classifier 1 uses the linear kernel K1(x, x') = x^T x', and classifier 2 uses K2(x, x') = p(x)p(x'), where p(x) is a 3-component Gaussian mixture density, estimated on the basis of related other problems.

(a) (3 points) What is the VC-dimension of the second SVM classifier that uses kernel K2(x, x')?

The VC-dimension is 2: the feature space is 1-dimensional; each point x in R2 is mapped to a non-negative number p(x).

(b) (T/F, 2 points) The second SVM classifier can only separate points that are likely according to p(x) from those that have low probability under p(x). T

(c) (4 points) If both SVM classifiers achieve zero training error on n labeled points, which classifier would have a better generalization guarantee? Provide a brief justification.

The first classifier has VC-dimension 3 while the second one has VC-dimension 2. The complexity penalty for the first one is therefore higher. When the number of training errors is the same for the two classifiers, the bound on the expected error is smaller for the second classifier.
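To make the feature-space argument in 2(a) concrete, here is a short sketch (not from the exam; the sample points and the 3-component mixture below are made up) showing that the Gram matrix of K2(x, x') = p(x)p(x') always has rank 1, i.e., the implicit feature map is the single number p(x):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                 # hypothetical points in R^2

# A hypothetical 3-component Gaussian mixture density p(x).
means = [np.zeros(2), np.array([2.0, 0.0]), np.array([0.0, 2.0])]
weights = [0.5, 0.3, 0.2]
p = sum(w * multivariate_normal(m, np.eye(2)).pdf(X) for w, m in zip(weights, means))

K = np.outer(p, p)                           # Gram matrix for K2(x, x') = p(x) p(x')
print(np.linalg.matrix_rank(K))              # -> 1: the feature space is one-dimensional
```

With a one-dimensional feature, the classifier can only threshold p(x), which is also the content of part (b).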


Problem 3

[Figure 2 appears here: two scatter plots, panels (a) and (b), with axes x1 and x2.]

Figure 2: Data sets for clustering. Points are located at integer coordinates.

1. (4 points) First consider the data plotted in Figure 2a, which consist of two rows of equally spaced points. If k-means clustering (k = 2) is initialised with the two points whose coordinates are (9, 3) and (11, 3), indicate the final clusters obtained (after the algorithm converges) on Figure 2a.

2. (4 points) Now consider the data in Figure 2b. We will use spectral clustering to divide these points into two clusters. Our version of spectral clustering uses a neighbourhood graph obtained by connecting each point to its two nearest neighbors (breaking ties randomly), and by weighting the resulting edges between points xi and xj by Wij = exp(-||xi - xj||). Indicate on Figure 2b the clusters that we will obtain from spectral clustering. Provide a brief justification.

The random walk induced by the weights can switch between the clusters in the figure in only two places, (0,-1) and (2,0). Since the weights decay with distance, the weights corresponding to transitions within clusters are higher than those going across in both places. The random walk would therefore tend to remain within the clusters indicated in the figure.

3. (4 points) Can the solution obtained in the previous part for the data in Figure 2b also be obtained by k-means clustering (k = 2)? Justify your answer.

No. In the k-means algorithm points are assigned to the closest mean (cluster centroid). The centroids of the left and right clusters in the figure are (0,0) and (5,0), respectively. Point (2,0), for example, is closer to the left cluster centroid (0,0) and wouldn't be assigned to the right cluster. The two clusters in the figure therefore cannot be fixed points of the k-means algorithm.
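A quick numerical companion to the answer to question 3, using the centroid coordinates quoted in the solution (a sketch, not part of the exam):

```python
import numpy as np

left_centroid = np.array([0.0, 0.0])    # centroid of the left cluster (from the solution)
right_centroid = np.array([5.0, 0.0])   # centroid of the right cluster (from the solution)
point = np.array([2.0, 0.0])            # a point the spectral split places on the right

d_left = np.linalg.norm(point - left_centroid)     # 2.0
d_right = np.linalg.norm(point - right_centroid)   # 3.0
print(d_left < d_right)   # True: k-means would reassign (2,0) to the left cluster,
                          # so the spectral clusters are not a k-means fixed point
```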



Figure 3: Training sample from a mixture of two linear models

Problem 4
The data in Figure 3 comes from a mixture of two linear regression models with Gaussian noise:

P(y | x; θ) = p1 N(y; w10 + w11 x, σ1^2) + p2 N(y; w20 + w21 x, σ2^2)

where p1 + p2 = 1 and θ = (p1, p2, w10, w11, w20, w21, σ1, σ2). We hope to estimate θ from such data via the EM algorithm. To this end, let z ∈ {1, 2} be the mixture index variable indicating which of the regression models is used to generate y given x.
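For concreteness, here is a compact EM sketch for this two-component model (not part of the exam; the synthetic data, the initialization, and the helper name em_mixture_linreg are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def em_mixture_linreg(x, y, theta, n_iter=50):
    # theta = (p_k, w_k0, w_k1, sigma_k), each a length-2 list
    p, w0, w1, sigma = (np.array(t, float) for t in theta)
    for _ in range(n_iter):
        # E-step: responsibility of each regression model for each point
        lik = np.stack([p[k] * norm.pdf(y, w0[k] + w1[k] * x, sigma[k]) for k in range(2)])
        r = lik / lik.sum(axis=0, keepdims=True)
        # M-step: weighted least squares per model, then noise level and mixing weight
        A = np.stack([np.ones_like(x), x], axis=1)          # design matrix [1, x]
        for k in range(2):
            Wk = r[k]
            w0[k], w1[k] = np.linalg.solve(A.T @ (Wk[:, None] * A), A.T @ (Wk * y))
            resid = y - (w0[k] + w1[k] * x)
            sigma[k] = np.sqrt((Wk * resid**2).sum() / Wk.sum())
            p[k] = Wk.mean()
    return p, w0, w1, sigma

# Hypothetical data from two noisy lines, roughly in the spirit of Figure 3
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
z = rng.random(200) < 0.5
y = np.where(z, 0.2 + 1.0 * x, 1.0 - 0.5 * x) + rng.normal(0, 0.05, 200)

theta0 = ([0.5, 0.5], [0.0, 1.2], [0.5, -0.8], [0.2, 0.2])
print(em_mixture_linreg(x, y, theta0))
```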
1. (6 points) Connect the random variables X, Y, and Z with directed edges so that the graphical model on the left represents the mixture of linear regression models described above, and the one on the right represents a mixture-of-experts model. For both models, Y denotes the output variable, X the input, and Z is the choice of the linear regression model or expert.

[Two node diagrams appear here, each with nodes X, Y, and Z: one labeled "mixture of linear regressions", the other "mixture of experts". The solution's edges are drawn on the original figure.]


We use a single plot to represent the model parameters (see the figure below). Each linear regression model appears as a solid line (y = wi0 + wi1 x) in between two parallel dotted lines at vertical distance 2σi to the solid line. Thus each regression model covers the data that falls between the dotted lines. When w10 = w20 and w11 = w21 you would only see a single solid line in the figure; you may still see two different sets of dotted lines corresponding to different values of σ1 and σ2. The solid bar to the right represents p1 (and p2 = 1 - p1). For example, if

θ = (p1, p2, w10, w11, w20, w21, σ1, σ2) = (0.35, 0.65, 0.5, 0, 0.85, 0.7, 0.05, 0.15)

the plot is

[Example parameter plot appears here.]

2. (6 points) We are now ready to estimate the parameters via EM. There are, however, many ways to initialize the parameters for the algorithm. On the next page you are asked to connect 3 different initializations (left column) with the parameters that would result after one EM iteration (right column). Different initializations may lead to the same set of parameters. Your answer should consist of 3 arrows, one from each initialization.


Initialization                                              Next iteration

[Two columns of parameter plots appear here: three different initializations on the left, and candidate parameter settings after one EM iteration on the right. The solution's connecting arrows are drawn on the original figure.]


Problem 5

Assume that the following sequences are very long and the pattern highlighted with spaces is repeated:

Sequence 1: 1 0 0   1 0 0   1 0 0   1 0 0   1 0 0   ...
Sequence 2: 1 1 0 0   1 1 0 0   1 1 0 0   1 1 0 0   ...

1. (4 points) If we model each sequence with a different first-order HMM, what is the number of hidden states that a reasonable model selection method would report?

                        HMM for Sequence 1    HMM for Sequence 2
No. of hidden states            3                     4
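As an illustration of the Sequence 1 answer (a sketch, not from the exam): a 3-state HMM with a deterministic transition cycle and deterministic emissions reproduces the period-3 pattern exactly.

```python
import numpy as np

A = np.array([[0, 1, 0],     # state 0 -> state 1
              [0, 0, 1],     # state 1 -> state 2
              [1, 0, 0]])    # state 2 -> state 0
emit = np.array([1, 0, 0])   # state k deterministically emits emit[k]

state, out = 0, []
for _ in range(9):
    out.append(int(emit[state]))
    state = int(np.argmax(A[state]))   # follow the deterministic cycle
print(out)                             # [1, 0, 0, 1, 0, 0, 1, 0, 0]
```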

2. (2 points) The following Bayesian network depicts a sequence of 5 observations from an HMM, where s1, s2, s3, s4, s5 is the hidden state sequence.

[HMM graph appears here: hidden-state chain s1 -> s2 -> s3 -> s4 -> s5, with each state si emitting the observation xi.]

Are x1 and x5 independent given x3? Briefly justify your answer.

They are not independent. The moralized ancestral graph corresponding to x1, x3, and x5 is the same graph with arrows replaced with undirected edges. x1 and x5 are not separated given x3, and thus not independent.

3. (3 points) Does the order of Markov dependencies in the observed sequence always determine the number of hidden states of the HMM that generated the sequence? Provide a brief justification.

No. The answer to the previous question implies that observations corresponding to (typical) HMMs have no Markov properties (of any order). This holds, for example, when there are only two possible hidden states. Thus Markov properties of the observation sequence cannot in general determine the number of hidden states.
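The claim in question 2 can also be checked numerically. Below is a small sketch (not part of the exam) that enumerates a hypothetical 2-state, binary-output HMM and compares P(x1, x5 | x3) with P(x1 | x3) P(x5 | x3); the two quantities differ, so x1 and x5 are indeed dependent given x3:

```python
import itertools
import numpy as np

T = np.array([[0.9, 0.1], [0.2, 0.8]])   # transition probabilities P(s' | s)
E = np.array([[0.8, 0.2], [0.3, 0.7]])   # emission probabilities  P(x | s)
pi = np.array([0.5, 0.5])                # initial state distribution

# Enumerate all state and observation sequences to get P(x1, x3, x5)
joint = {}
for s in itertools.product(range(2), repeat=5):
    ps = pi[s[0]] * np.prod([T[s[i], s[i + 1]] for i in range(4)])
    for x in itertools.product(range(2), repeat=5):
        px = np.prod([E[s[i], x[i]] for i in range(5)])
        key = (x[0], x[2], x[4])
        joint[key] = joint.get(key, 0.0) + ps * px

# Compare P(x1=1, x5=1 | x3=0) with P(x1=1 | x3=0) * P(x5=1 | x3=0)
px3 = sum(v for (a, b, c), v in joint.items() if b == 0)
p_both = joint[(1, 0, 1)] / px3
p_x1 = sum(v for (a, b, c), v in joint.items() if b == 0 and a == 1) / px3
p_x5 = sum(v for (a, b, c), v in joint.items() if b == 0 and c == 1) / px3
print(p_both, p_x1 * p_x5)               # the two numbers are not equal
```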


Problem 6
We wish to develop a graphical model for the following transportation problem. A transport company is trying to choose between two alternative routes for commuting between Boston and New York. In an experiment, two identical buses leave Boston at the same but otherwise random time, TB. The buses take different routes, arriving at their (common) destination at times TN1 and TN2. Transit time for each route depends on the congestion along the route, and the two congestions are unrelated. Let us represent the random delays introduced along the routes by variables C1 and C2. Finally, let F represent the identity of the bus which reaches New York first. We view F as a random variable that takes values 1 or 2.

1. (6 points) Complete the following directed graph (Bayesian network) with edges so that it captures the relationships between the variables in this transportation problem.
[Node diagram appears here with nodes TB, C1, C2, TN1, and TN2; the solution's edges are drawn on the original figure.]
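A small generative sketch of the experiment described above (the specific distributions and numeric constants are illustrative assumptions, not given in the problem) makes the dependency structure explicit: each arrival time depends on the common departure time and on its own route's congestion, and F depends on both arrival times.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_trip():
    TB = rng.uniform(6, 10)         # common but random departure time
    C1 = rng.exponential(1.0)       # congestion delay along route 1
    C2 = rng.exponential(1.5)       # congestion delay along route 2 (unrelated to C1)
    TN1 = TB + 3.5 + C1             # arrival time of bus 1 in New York
    TN2 = TB + 3.0 + C2             # arrival time of bus 2 in New York
    F = 1 if TN1 < TN2 else 2       # which bus reaches New York first
    return TB, C1, C2, TN1, TN2, F

print(sample_trip())
```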


2. (3 points) Consider the following directed graph as a possible representation of the independences between the variables TN1, TN2, and F only:

[Directed graph over TN1, TN2, and F appears here.]

Which of the following factorizations of the joint are consistent with the graph?

(X) P(TN1) P(TN2) P(F | TN1, TN2)
( ) P(TN1) P(TN2) P(F | TN1)
( ) P(TN1) P(TN2) P(F)


