
Name+Email ID

CSCI 5521: Introduction to Machine Learning


(Spring 2025)¹

Final Exam

Due on Gradescope by 9am, May 14th

Instructions:
• The final exam has 5+1 questions, 100+2 points, on 8 pages, including one extra credit problem
worth 2 points.
• Please write your name & ID on this cover page.
• For full credit, show how you arrive at your answers.

1. (30 points) In I-III, select the correct option(s) (it is not necessary to explain).


I. Select all the option(s) that are loss functions:


(a) Mean squared error (b) Cross entropy (c) Entropy (d) 0-1 loss (e) Hinge loss

II. Select all the option(s) that are true about model selection:
(a) Nonparametric methods do not assume distributions of data to start with.
(b) A Multilayer Perceptron always offers superiority over a Perceptron.
(c) Decision trees are selected over random forest in cases where interpretability is prioritized.
(d) Neural networks should never be considered for real-time applications.
(e) Model selection should consider both data and tasks.

III. Select all the option(s) that are true:


(a) Activation functions can only be used in hidden layers.
(b) A bigger number of hidden nodes in a Multilayer Perceptron helps reduce overfitting with the
same amount of original data.
(c) Data augmentation (e.g., creating copies of image data that are rotated or scaled versions of
the original ones) helps reduce overfitting with the same amount of original data.
(d) Kernel methods are designed to reduce overfitting.
(e) Graphical models and neural networks both explicitly encode relationships and uncertainty.

¹ Instructor: Catherine Qi Zhao. TAs: Hanchen Cui, Hunmin Lee, Haoyi Shi, James Yang. Email: [email protected]


2. (12 points) Given the decision tree in the figure below, node 1 was split using feature C. Now
suppose we wish to split node 3. Which feature will you use for the split? Show your work (a
numerical check is sketched after the table).

[Figure: decision tree with root node 1, split on feature C; the C=0 branch leads to node 2 and the C=1 branch leads to node 3.]

A  B  C  Class
1  1  1  1
1  0  0  0
0  0  1  1
1  0  0  0
0  1  0  0
1  1  1  1
0  1  0  0
1  1  0  0
1  0  1  1
0  1  1  0
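The following Python sketch (an addition of mine, not part of the exam; it assumes the standard entropy-based information gain used for ID3-style trees) can be used to check the hand calculation for node 3, which receives the rows with C = 1:

```python
# A minimal sketch (assumption: entropy-based information gain, as in
# ID3-style decision trees) for checking the node-3 split by hand.
import math

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum(p * math.log2(p)
                for p in (labels.count(c) / n for c in set(labels)))

# Rows reaching node 3 are those with C = 1, listed as (A, B, Class).
node3 = [(1, 1, 1), (0, 0, 1), (1, 1, 1), (1, 0, 1), (0, 1, 0)]
parent = entropy([cls for _, _, cls in node3])

for idx, name in [(0, "A"), (1, "B")]:
    # Weighted entropy of the children after splitting on this feature.
    children = sum(
        len(subset) / len(node3) * entropy(subset)
        for v in (0, 1)
        if (subset := [row[2] for row in node3 if row[idx] == v])
    )
    print(f"split node 3 on {name}: information gain = {parent - children:.3f}")
```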


3. (15 points) Suppose we are training a linear SVM on a tiny dataset of 12 points shown in the
figure below. Samples with positive labels are (1.5, 3.5), (1, 3), (1, 4), (2, 3), (2, 2), (0, 4) (denoted
as red dots) and samples with negative labels are (-1.5, 0.5), (-1, 1), (-1, 0), (-2, 1), (-2, 2), (0, 0)
(denoted as blue triangles).

[Figure: scatter plot of the 12 samples on axes x ∈ [-2, 2] and y ∈ [0, 4]; positive samples are red dots in the upper right, negative samples are blue triangles in the lower left.]

(a) Draw the maximum-margin hyperplane.

(b) Circle the support vectors.

(c) Pick one positive and one negative sample, and calculate their distances to the hyperplane.

(d) If a new sample at (-0.5, 0.5) is added as a negative sample on top of the original 12 points,
select all the option(s) that are true:
i. Kernel SVM would be a good option with the new data.
ii. The decision boundary would change.
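A quick way to check parts (a)-(c) is to fit the SVM numerically. The sketch below is an addition of mine, not part of the exam; it assumes scikit-learn is available and uses a very large C to approximate the hard-margin SVM implied by the question:

```python
# A minimal sketch (assumption: scikit-learn; large C approximates the
# hard-margin SVM) to check the hyperplane, support vectors, and distances.
import numpy as np
from sklearn.svm import SVC

X = np.array([(1.5, 3.5), (1, 3), (1, 4), (2, 3), (2, 2), (0, 4),       # positive
              (-1.5, 0.5), (-1, 1), (-1, 0), (-2, 1), (-2, 2), (0, 0)])  # negative
y = np.array([1] * 6 + [-1] * 6)

clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print("support vectors:", clf.support_vectors_)
# Signed distance of each point to the hyperplane w.x + b = 0.
print("distances:", (X @ w + b) / np.linalg.norm(w))
```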


4. (25 points) Consider the Bayesian Network below:

[Figure: Bayesian network over binary variables with edges A→B, A→C, B→D, B→E, C→E, C→F, D→G, and G→H, annotated with the CPD tables below.]

P(A):           a=0: 0.7   a=1: 0.3

P(B | A):       b=0   b=1
    a=0         0.9   0.1
    a=1         0.6   0.4

P(C | A):       c=0   c=1
    a=0         0.5   0.5
    a=1         0.4   0.6

P(D | B):       d=0   d=1
    b=0         0.4   0.6
    b=1         0.9   0.1

P(E | B, C):    e=0   e=1
    b=0, c=0    0.8   0.2
    b=0, c=1    0.3   0.7
    b=1, c=0    0.7   0.3
    b=1, c=1    0.6   0.4

P(F | C):       f=0   f=1
    c=0         0.1   0.9
    c=1         0.2   0.8

P(G | D):       g=0   g=1
    d=0         0.5   0.5
    d=1         0.8   0.2

P(H | G):       h=0   h=1
    g=0         0.4   0.6
    g=1         0.6   0.4

Note: The numerical values of the probabilities are for part (e). You do not need to use them for
(a)-(d).

(a) Find the joint probability P(A, B, C, D, E, F, G, H) as the product of conditional probabilities,
according to the graphical model given above.

(b) List all the conditional independences implied by the graph.

(c) Show how to find the conditional probability P(A | C).


(d) Show how to find the marginal probability P(A, G).

(e) Using the conditional probability distribution (CPD) tables in the figure, find:
i. P(a = 1 | c = 1)

ii. P(a = 1, g = 0)

iii. P(a = 1, b = 0, c = 1, d = 1, e = 0, f = 0, g = 0, h = 1)

iv. P(b = 0, c = 1, d = 1, e = 0, f = 0, h = 1 | a = 1, g = 0)
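For checking the part (e) answers, the following sketch (an addition of mine, not part of the exam; it brute-forces all 2^8 assignments rather than doing efficient inference) evaluates any marginal from the CPD tables above:

```python
# A minimal sketch (assumption: brute-force enumeration over all 2^8
# assignments, not efficient inference) for checking the part (e) answers.
from itertools import product

pA = {0: 0.7, 1: 0.3}                                      # P(a)
pB = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.6, 1: 0.4}}            # P(b | a)
pC = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.4, 1: 0.6}}            # P(c | a)
pD = {0: {0: 0.4, 1: 0.6}, 1: {0: 0.9, 1: 0.1}}            # P(d | b)
pE = {(0, 0): {0: 0.8, 1: 0.2}, (0, 1): {0: 0.3, 1: 0.7},  # P(e | b, c)
      (1, 0): {0: 0.7, 1: 0.3}, (1, 1): {0: 0.6, 1: 0.4}}
pF = {0: {0: 0.1, 1: 0.9}, 1: {0: 0.2, 1: 0.8}}            # P(f | c)
pG = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.8, 1: 0.2}}            # P(g | d)
pH = {0: {0: 0.4, 1: 0.6}, 1: {0: 0.6, 1: 0.4}}            # P(h | g)

def joint(a, b, c, d, e, f, g, h):
    # Part (a): the joint factorizes according to the graph structure.
    return (pA[a] * pB[a][b] * pC[a][c] * pD[b][d]
            * pE[(b, c)][e] * pF[c][f] * pG[d][g] * pH[g][h])

def prob(**fixed):
    """Marginal of the fixed variables, summing out all the others."""
    return sum(joint(*vals)
               for vals in product((0, 1), repeat=8)
               if all(dict(zip("abcdefgh", vals))[k] == v
                      for k, v in fixed.items()))

print("P(a=1 | c=1) =", prob(a=1, c=1) / prob(c=1))  # part (e) i
print("P(a=1, g=0)  =", prob(a=1, g=0))              # part (e) ii
```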


5. (18 points) Consider a Multi-layer Perceptron (MLP) for the following two general tasks: (1)
multi-class classification of K=3 categories with 3 output units; and (2) regression with a single
output unit,
where each hidden unit in both tasks uses a hyperbolic tangent activation such that

$z_h^t = \tanh\left( \sum_{j=1}^{D} w_{hj} x_j^t + w_{h0} \right)$.

The output units in classification use a softmax activation such that

$y_i^t = \dfrac{\exp\left( \sum_h v_{ih} z_h^t + v_{i0} \right)}{\sum_j \exp\left( \sum_h v_{jh} z_h^t + v_{j0} \right)}$.

The error functions for tasks (1) and (2) are given below, respectively:

• Multi-class classification: $E(W, V \mid X) = -\sum_{t=1}^{N} \sum_{i=1}^{K} r_i^t \log y_i^t + \frac{\lambda}{2} \sum_{h=1}^{H} \lVert w_h \rVert_2^2 + \frac{\sigma}{2} \sum_{k=1}^{K} \lVert v_k \rVert_2^2$

• Regression: $E(W, v \mid X) = \frac{1}{2} \sum_{t=1}^{N} (r^t - y^t)^2 + \frac{\lambda}{2} \sum_{h=1}^{H} \lVert w_h \rVert_2^2 + \frac{\sigma}{2} \lVert v \rVert_2^2$

(a) Draw two Multi-layer Perceptrons, one for each of the above tasks, showing: input values
$x_0, \ldots, x_D$, outputs of the hidden units $z_0, \ldots, z_H$, weights $W$ and $V$ (or $v$), and the output(s) (i.e.,
$y_i$ of output unit $i$ for multi-class classification, and $y$ for regression). Note the difference in
structure between the two tasks (you may write or draw).


(b) Derive the Forward Step equations for $y$ for both tasks.
Hint: Think about whether or not you should apply an activation function at the output unit.
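As a sanity check for the derivations, one forward pass per task can be written as below (an addition of mine, not part of the exam; it assumes the bias convention $x_0 = z_0 = 1$ from the equations above):

```python
# A minimal numpy sketch of the forward step for both tasks
# (assumption: bias convention x_0 = z_0 = 1 from the equations above).
import numpy as np

def forward_classification(x, W, V):
    """x: (D,) input; W: (H, D+1) hidden weights; V: (K, H+1) output weights."""
    x1 = np.concatenate(([1.0], x))  # prepend bias input x_0 = 1
    z = np.tanh(W @ x1)              # hidden activations z_h
    z1 = np.concatenate(([1.0], z))  # prepend bias unit z_0 = 1
    o = V @ z1                       # pre-softmax output scores
    e = np.exp(o - o.max())          # numerically stable softmax
    return e / e.sum()               # y_i, sums to 1 over the K classes

def forward_regression(x, W, v):
    """Same hidden layer; identity (no activation) at the single output."""
    x1 = np.concatenate(([1.0], x))
    z1 = np.concatenate(([1.0], np.tanh(W @ x1)))
    return v @ z1                    # scalar y
```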

(c) Pick one of the two tasks and derive the Backward Step equations for $w_{hj}$ and $v_h$.
Hint:

• $\tanh'(x) = 1 - \tanh^2(x)$
• Given the softmax function $f(\alpha_i) = \frac{\exp(\alpha_i)}{\sum_j \exp(\alpha_j)}$, we have $\frac{\partial f(\alpha_i)}{\partial \alpha_j} = f(\alpha_i)\left(\delta_{ij} - f(\alpha_j)\right)$, in which
$\delta_{ij}$ is an indicator function, such that $\delta_{ij} = 1$ if $i = j$, and 0 otherwise.
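For checking a backward-step derivation, the sketch below (my addition, not part of the exam; it picks the regression task for a single sample and assumes the regularizers also penalize the bias weights) computes the gradients implied by the hints:

```python
# A minimal sketch of the backward step for the regression task on a
# single sample (assumption: regularizers also penalize bias weights).
import numpy as np

def backward_regression(x, r, W, v, lam, sigma):
    """Gradients of E = 0.5*(r - y)^2 + (lam/2)*sum||w_h||^2 + (sigma/2)*||v||^2."""
    x1 = np.concatenate(([1.0], x))  # bias input x_0 = 1
    z = np.tanh(W @ x1)              # hidden activations
    z1 = np.concatenate(([1.0], z))  # bias unit z_0 = 1
    y = v @ z1                       # forward output
    # dE/dv_h = -(r - y) * z_h + sigma * v_h
    grad_v = -(r - y) * z1 + sigma * v
    # dE/dw_hj = -(r - y) * v_h * (1 - z_h^2) * x_j + lam * w_hj,
    # using tanh'(x) = 1 - tanh^2(x) from the hint.
    grad_W = np.outer(-(r - y) * v[1:] * (1 - z ** 2), x1) + lam * W
    return grad_W, grad_v
```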


6. (2 points, extra credit) Edgar has joined a startup that uses cutting-edge AI to help brands
generate and optimize video advertisements for TVs as well as platforms such as Instagram and
YouTube. The company’s system creates ad components such as visuals, music, narration, and
slogans based on the client’s messaging and campaign goals (e.g., launching a product, increasing
user engagement).

(a) Name two different types of machine learning models that could be used at this company. For
each model, briefly describe what kind of input it takes and what kind of output it produces.

(b) After an ad is released to the public, the company collects viewer interaction data (with
user permission) such as watch time, skips, or shares to decide how to refine this ad. What
machine learning framework is suitable for learning and improving from this type of feedback?
Name one and briefly explain why it is helpful.
