CSCI 5521 Spring 2025 Final Exam
Instructions:
• The final exam has 5+1 questions (100+2 points) on 8 pages, including one extra-credit problem worth 2 points.
• Please write your name & ID on this cover page.
• For full credit, show how you arrive at your answers.
1. (30 points) In I-III, select the correct option(s) (it is not necessary to explain).
II. Select all the option(s) that are true about model selection:
(a) Nonparametric methods do not assume distributions of data to start with.
(b) A Multilayer Perceptron is always superior to a Perceptron.
(c) Decision trees are selected over random forest in cases where interpretability is prioritized.
(d) Neural networks should never be considered for real-time applications.
(e) Model selection should consider both data and tasks.
Instructor: Catherine Qi Zhao. TAs: Hanchen Cui, Hunmin Lee, Haoyi Shi, James Yang. Email:
[email protected]
Page 1
Name+Email ID
2. (12 points) Given the decision tree in the figure below, node 1 was split using feature C. Now
suppose we wish to split node 3. Which feature will you use for the split? Show your work.
[Figure: decision tree with root node 1, split on feature C; C=0 leads to node 2 and C=1 leads to node 3.]

A B C Class
1 1 1 1
1 0 0 0
0 0 1 1
1 0 0 0
0 1 0 0
1 1 1 1
0 1 0 0
1 1 0 0
1 0 1 1
0 1 1 0
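For checking split choices by hand, entropy-based information gain (assumed here to be the splitting criterion, as in ID3-style trees) can be computed with a short script. The row data below is taken from the table above; the restriction of node 3 to the C = 1 samples follows from the tree structure in the figure:

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info_gain(rows, feature, label_idx=-1):
    """Information gain of splitting `rows` on column `feature`."""
    total = entropy([r[label_idx] for r in rows])
    n = len(rows)
    remainder = 0.0
    for v in set(r[feature] for r in rows):
        subset = [r[label_idx] for r in rows if r[feature] == v]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder

# The ten (A, B, C, Class) rows from the table above.
data = [
    (1, 1, 1, 1), (1, 0, 0, 0), (0, 0, 1, 1), (1, 0, 0, 0), (0, 1, 0, 0),
    (1, 1, 1, 1), (0, 1, 0, 0), (1, 1, 0, 0), (1, 0, 1, 1), (0, 1, 1, 0),
]

# Node 3 contains the samples reaching the C = 1 branch.
node3 = [r for r in data if r[2] == 1]
gain_A = info_gain(node3, 0)   # gain of splitting node 3 on A
gain_B = info_gain(node3, 1)   # gain of splitting node 3 on B
```

Comparing the two gains reproduces the hand calculation the question asks for.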
3. (15 points) Suppose we are training a linear SVM on a tiny dataset of 12 points shown in the
figure below. Samples with positive labels are (1.5, 3.5), (1, 3), (1, 4), (2, 3), (2, 2), (0, 4) (denoted
as red dots) and samples with negative labels are (-1.5, 0.5), (-1, 1), (-1, 0), (-2, 1), (-2, 2), (0, 0)
(denoted as blue triangles).
[Figure: scatter plot of the 12 samples in the x-y plane; positive samples are red dots, negative samples are blue triangles.]
(c) Pick one positive and one negative sample, and calculate their distances to the hyperplane.
(d) If a new negative sample (-0.5, 0.5) is added on top of the original 12 points, select
all the option(s) that are true:
i. Kernel SVM would be a good option with the new data.
ii. The decision boundary would change.
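The point-to-hyperplane distance used in part (c) can be checked numerically. The hyperplane below is hypothetical, chosen only to illustrate the formula (it is not necessarily the SVM solution you derive in the earlier parts):

```python
import math

def distance_to_hyperplane(x, w, b):
    """Perpendicular distance from point x to the hyperplane w . x + b = 0,
    i.e. |w . x + b| / ||w||."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

# Hypothetical hyperplane for illustration only: w = (1, 1), b = -2,
# i.e. the line x + y = 2.
w, b = (1.0, 1.0), -2.0
d_pos = distance_to_hyperplane((2.0, 2.0), w, b)   # a positive sample
d_neg = distance_to_hyperplane((0.0, 0.0), w, b)   # a negative sample
```

Both example points lie at distance sqrt(2) from this particular line.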
4. (25 points) Consider the Bayesian network given in the figure below, with conditional probability distribution (CPD) tables for its nodes.
[Figure: Bayesian network; node A is the root, B and C are children of A, and D, E, F appear in the bottom row. The CPD tables shown are P(A), P(B|A), P(C|A), P(D|B), and P(F|C).]

P(A):        a=0  a=1
             0.7  0.3

P(B|A):  b=0  b=1        P(C|A):  c=0  c=1
    a=0  0.9  0.1            a=0  0.5  0.5
    a=1  0.6  0.4            a=1  0.4  0.6

P(D|B):  d=0  d=1        P(F|C):  f=0  f=1
    b=0  0.4  0.6            c=0  0.1  0.9
    b=1  0.9  0.1            c=1  0.2  0.8
Note: The numerical values of the probabilities are for part (e). You do not need to use them for
(a)-(d).
(a) Find the joint probability P (A, B, C, D, E, F, G, H) as the product of conditional probabilities, according to the graphical model given above.
(e) Using the conditional probability distribution (CPD) tables in the figure, find:
i. P (a = 1|c = 1)
ii. P (a = 1, g = 0)
iii. P (a = 1, b = 0, c = 1, d = 1, e = 0, f = 0, g = 0, h = 1)
iv. P (b = 0, c = 1, d = 1, e = 0, f = 0, h = 1|a = 1, g = 0)
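A hand computation for the first query in (e) can be verified with Bayes' rule, using only the P(A) and P(C|A) tables from the figure (the other tables are not needed for this particular query):

```python
# CPD values read off the tables in the figure.
p_a = {0: 0.7, 1: 0.3}                       # P(A)
p_c_given_a = {0: {0: 0.5, 1: 0.5},          # P(C | A): p_c_given_a[a][c]
               1: {0: 0.4, 1: 0.6}}

# Bayes' rule: P(a=1 | c=1) = P(c=1 | a=1) P(a=1) / sum_a P(c=1 | a) P(a)
numerator = p_c_given_a[1][1] * p_a[1]
evidence = sum(p_c_given_a[a][1] * p_a[a] for a in (0, 1))
p_a1_given_c1 = numerator / evidence
```

The same pattern (condition, multiply, normalize) extends to the remaining queries once the full set of CPDs is written down.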
5. (18 points) Consider a Multi-layer Perceptron (MLP) for the following two general tasks: (1)
multi-class classification of K=3 categories with 3 output units; and (2) regression with a single
output unit, where each hidden unit in both tasks uses a hyperbolic tangent function such that
$z_h^t = \tanh\left(\sum_{j=1}^{D} w_{hj} x_j^t + w_{h0}\right)$. The output unit in classification uses a softmax activation function
such that
$$y_i^t = \frac{\exp\left(\sum_h v_{ih} z_h^t + v_{i0}\right)}{\sum_j \exp\left(\sum_h v_{jh} z_h^t + v_{j0}\right)}.$$
The error functions for tasks (1) and (2) are given below respectively:
• Multi-class classification: $E(W, V \mid X) = -\sum_{t=1}^{N} \sum_{i=1}^{K} r_i^t \log y_i^t + \frac{\lambda}{2} \sum_{h=1}^{H} \|w_h\|_2^2 + \frac{\sigma}{2} \sum_{k=1}^{K} \|v_k\|_2^2$
• Regression: $E(W, v \mid X) = \frac{1}{2} \sum_{t=1}^{N} (r^t - y^t)^2 + \frac{\lambda}{2} \sum_{h=1}^{H} \|w_h\|_2^2 + \frac{\sigma}{2} \|v\|_2^2$.
(a) Draw two Multi-layer Perceptrons, each for one of the above tasks, showing: input values
$x_0, \ldots, x_D$, outputs of the hidden units $z_0, \ldots, z_H$, weights W and V (or v), and the output(s) (i.e.,
$y_i$ of output unit i for multi-class classification, and y for regression). Note the difference in
the structure between the two tasks (you may write or draw).
(b) Derive the Forward Step equations for y for both tasks.
Hint: Think about whether or not you should apply an activation function at the output unit.
(c) Pick one of the two tasks and derive the Backward Step equations for $w_{hj}$ and $v_h$.
Hint:
• $\tanh'(x) = 1 - \tanh^2(x)$
• Given the softmax function $f(\alpha_i) = \frac{\exp(\alpha_i)}{\sum_j \exp(\alpha_j)}$, then $\frac{\partial f(\alpha_i)}{\partial \alpha_j} = f(\alpha_i)(\delta_{ij} - f(\alpha_j))$, in which
$\delta_{ij}$ is an indicator function, such that $\delta_{ij} = 1$ if $i = j$, and 0 otherwise.
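The forward step for the classification task can be sketched in code. This is a minimal NumPy illustration of the equations defined in this question; the layer sizes and weight values are arbitrary choices for the example, with bias terms stored as the first column of each weight matrix:

```python
import numpy as np

def forward(x, W, V):
    """Forward pass matching the equations in this question.
    W: (H, D+1) hidden weights; column 0 holds the biases w_h0.
    V: (K, H+1) output weights; column 0 holds the biases v_i0."""
    x1 = np.append(1.0, x)        # prepend bias input x_0 = 1
    z = np.tanh(W @ x1)           # z_h = tanh(sum_j w_hj x_j + w_h0)
    z1 = np.append(1.0, z)        # prepend bias unit z_0 = 1
    o = V @ z1                    # o_i = sum_h v_ih z_h + v_i0
    e = np.exp(o - o.max())       # shift for numerical stability
    return e / e.sum()            # softmax output y_i

# Tiny example: D = 2 inputs, H = 2 hidden units, K = 3 classes,
# with arbitrary random weights.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
V = rng.normal(size=(3, 3))
y = forward(np.array([0.5, -1.0]), W, V)
```

For the regression task the only change is replacing the softmax with the identity at the single output unit, which is the distinction the hint in part (b) points to.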
6. (2 points, extra credit) Edgar has joined a startup that uses cutting-edge AI to help brands
generate and optimize video advertisements for TVs as well as platforms such as Instagram and
YouTube. The company’s system creates ad components such as visuals, music, narration, and
slogans based on the client’s messaging and campaign goals (e.g., launching a product, increasing
user engagement).
(a) Name two different types of machine learning models that could be used at this company. For
each model, briefly describe what kind of input it takes and what kind of output it produces.
(b) After an ad is released to the public, the company collects viewer interaction data (with
user permission) such as watch time, skips, or shares to decide how to refine this ad. What
machine learning framework is suitable for learning and improving from this type of feedback?
Name one and briefly explain why it is helpful.
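One concrete instance of a framework that learns from this kind of interaction feedback is a multi-armed bandit, shown here purely as an illustrative sketch (the simulated engagement rates and the epsilon-greedy strategy are assumptions, not part of the question):

```python
import random

def epsilon_greedy_bandit(rates, n_rounds=10000, epsilon=0.1, seed=0):
    """Minimal multi-armed bandit: repeatedly pick an ad variant, observe a
    simulated binary reward (e.g. 1 = watched, 0 = skipped), and update a
    running value estimate per variant."""
    rng = random.Random(seed)
    n_ads = len(rates)
    counts = [0] * n_ads
    values = [0.0] * n_ads
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            a = rng.randrange(n_ads)                        # explore
        else:
            a = max(range(n_ads), key=lambda i: values[i])  # exploit
        r = 1 if rng.random() < rates[a] else 0             # simulated feedback
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]            # incremental mean
    return values, counts

# Two hypothetical ad variants with true engagement rates 0.3 and 0.5.
values, counts = epsilon_greedy_bandit([0.3, 0.5])
```

Over many rounds the better-performing variant is shown more often, which is the explore/exploit behavior that makes this family of methods suitable for refining a released ad.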