DDA3020 22
Final Exam
1 Multiple-choice questions (2 points per question) (30 points)

Note that there might be one or more correct option(s). If there is more than one correct answer, you should select all the correct options in order to get full marks. If your answer is correct but incomplete, you will only get partial marks. If any incorrect option is chosen, you will get zero marks.

1.1 Which one(s) are sub-areas of artificial intelligence?

A. Computer Vision
B. Robotics
C. Natural Language Processing
D. Machine Learning
E. Optimization

1.2 Suppose in a 2-class classification problem, our standard logistic regression model achieves 99% accuracy on the training set, but only 51% accuracy on the test set. Which of the following modifications might potentially improve our algorithm's test accuracy?

A. Use regularized logistic regression.
B. Use a polynomial hypothesis function to replace the original hypothesis function w^T x + b.
C. Add more training data.
D. Add more testing data.

1.3 Which of the following statement(s) is/are correct?

A. Backpropagation consists of a forward pass that computes the error, and a backward pass that adjusts the weights.
B. In a feedforward neural network, the information always moves in one direction.
C. A convolutional neural network is a fully connected network.
D. When training a multi-layer perceptron, we compute the error using a loss function at the output layer and pass the gradients from the output layer backwards to the input layer to update the weights.

1.4 Which of the following statement(s) is/are correct?

A. Toss a coin 4 times; then there will be 4 possible events.
B. If the probability of Event A is 0.1 and the probability of Event B is 0.01, then the information of A is smaller than that of B.
C. KL divergence is positive and non-symmetric.
D. Both Binomial and Gaussian distributions are continuous distributions.
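Note (illustrative, not examinable): the quantities in 1.4 B and C are easy to probe numerically. The sketch below uses made-up distributions p and q purely for illustration.

```python
import numpy as np

# Self-information I(A) = -log2 P(A): rarer events carry MORE information.
p_A, p_B = 0.1, 0.01
print(-np.log2(p_A), -np.log2(p_B))      # about 3.32 bits vs 6.64 bits

# KL divergence D(p || q) = sum_i p_i * log(p_i / q_i): non-negative, not symmetric.
p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
kl_pq = float(np.sum(p * np.log(p / q)))
kl_qp = float(np.sum(q * np.log(q / p)))
print(kl_pq, kl_qp)                      # both >= 0, but generally unequal
```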
1.5 Which of the following statement(s) is/are correct?

A. When the number of training samples n is very large and the feature dimension d is very small, we prefer to use the closed-form solution rather than the gradient descent method.
B. Linear regression can only be used for regression tasks.
C. The decision boundary of Ridge regression is non-linear in the original feature space.
D. The decision boundary of polynomial linear regression is non-linear in the original feature space.
E. If standard linear regression is overfitting, we can try Ridge regression.
F. If standard linear regression is overfitting, we can try polynomial linear regression.

1.6 Which of the following statement(s) is/are correct?

A. Standard SVM cannot handle non-linearly separated data.
B. SVM with slack variables can handle non-linearly separated data, and its training error could be 0.
C. Kernel SVM can perfectly fit non-linearly separated data.
D. The margin is the distance from the closest points of the positive and negative classes to the decision boundary.

1.7 Which of the following statement(s) is/are correct?

A. PCA can give a non-linear projection of data.
B. Dimensionality reduction tasks can be solved by supervised or unsupervised learning methods.
C. When we project N-dimensional data points to a k-dimensional space, the dimension of the reconstructed points is lower than that of the original data.
D. In PCA, we choose the k eigenvectors with the top k largest eigenvalues to form a new matrix.
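Note (illustrative, not examinable): the eigendecomposition view behind 1.7 can be sketched in a few lines; the random data and variable names below are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # 200 samples with d = 5 features
Xc = X - X.mean(axis=0)                  # center the data first

cov = Xc.T @ Xc / len(Xc)                # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order

k = 2
W = eigvecs[:, -k:]                      # k eigenvectors with top-k eigenvalues (option D)
Z = Xc @ W                               # linear projection to k dims (cf. option A)
X_rec = Z @ W.T + X.mean(axis=0)         # reconstruction lives back in R^5 (cf. option C)
print(Z.shape, X_rec.shape)              # (200, 2) (200, 5)
```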
1.8 Which of the following statement(s) is/are correct?

A. Decision trees can only handle training data with numerical attributes.
B. When we set the minimal size of a leaf node to be a large value, then we will grow a deep decision tree.
C. When we set the maximal depth to be a small value, then we will grow a shallow decision tree.
D. When we increase the number of decision trees, the training performance of Bagging will often be better.

1.9 Which of the following would you apply unsupervised learning to?

A. Given the data of body dimensions collected from 1,000 consumers, determine the sizes of the clothes to be produced.
B. Develop a model to predict the stock market.
C. Given a large dataset of medical records of patients, including the disease name, predict the disease of a new patient from the symptoms.
D. Compress a high-resolution image to a low-resolution image.

1.10 A class has 10 students. They received marks for their mid-term quiz as follows.

Student ID  01  02  03  04  05  06  07  08  09  10
Marks       90  50  66  71  82  72  75  68  99  60

To group the students into two tutorial groups according to their marks, we use k-means. We pick student 03 as the initial centroid for Group A, and student 07 for Group B, and assign the students to the two groups using Euclidean distance.

A. We will have 4 students in Group A.
B. We will have 5 students in Group B.
C. If we change the initial centroids, the clustering result of k-means will not be changed.
D. With the above group assignments, we re-estimate the new centroids for the two groups. The new centroid of Group A is 60.
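Note (illustrative, not examinable): one iteration of k-means on these marks is a few lines of arithmetic; the sketch below reproduces the assignment and update steps described above.

```python
import numpy as np

marks = np.array([90, 50, 66, 71, 82, 72, 75, 68, 99, 60])  # students 01..10
c_A, c_B = 66.0, 75.0               # initial centroids: marks of students 03 and 07

# Assignment step: each student joins the group with the closer centroid.
in_A = np.abs(marks - c_A) < np.abs(marks - c_B)
group_A, group_B = marks[in_A], marks[~in_A]
print(len(group_A), len(group_B))   # group sizes (cf. options A and B)

# Update step: re-estimate each centroid as the mean of its group (cf. option D).
print(group_A.mean(), group_B.mean())
```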
1.11 Which of the following statement(s) is/are correct?

A. When TPR is plotted on the y-axis and FPR is plotted on the x-axis, the plot is called the ROC curve.
B. TPR + FPR = 1.
C. Accuracy = (TPR + TNR) / 2.
D. As the classification threshold increases, the FNR increases.

1.12 Which of the following statement(s) is/are correct?

A. When we increase the model complexity, bias drops and variance grows in the training set.
B. When we increase the model complexity, bias drops and variance grows in the testing set.
C. When we increase the model complexity, both bias and variance drop in the training set.
D. When we increase the model complexity, bias grows and variance drops in the testing set.
1.13 Which of the following statement(s) is/are correct?

A. If we fix the covariance matrix Σ as the identity matrix I when fitting a GMM using EM, then it is equivalent to the standard K-means algorithm.
B. It is guaranteed that the EM algorithm can improve the log-likelihood function.
C. The latent variable in latent variable models must be discrete.
D. Suppose that f is a concave function and X is a random variable; then f(E(X)) ≥ E(f(X)).

1.14 Given a linear system Xw = y, where w ∈ R^d is the variable we want to solve, and X ∈ R^{m×d} and y ∈ R^m are the data provided, which of the following statement(s) is/are correct?

A. When m = d, it is guaranteed to obtain a unique solution.
B. When m > d, this linear system is called an under-determined system, and there is no solution.
C. When m > d, this linear system is called an over-determined system, and there is no solution.
D. When m < d, this linear system is called an under-determined system, and there are infinitely many solutions.
E. When m < d, this linear system is called an over-determined system, and there are infinitely many solutions.
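Note (illustrative, not examinable): the over-/under-determined taxonomy in 1.14 can be probed numerically. A minimal sketch, assuming randomly generated (hence almost surely full-rank) data; note that a square X additionally needs to be invertible for a unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Over-determined (m > d): more equations than unknowns; in general no exact
# solution exists, so least squares minimizes ||Xw - y||^2 instead.
X, y = rng.normal(size=(10, 3)), rng.normal(size=10)
w, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(residuals)                  # nonzero: Xw = y has no exact solution here

# Under-determined (m < d): fewer equations than unknowns; when consistent there
# are infinitely many solutions, and lstsq returns the minimum-norm one.
X, y = rng.normal(size=(3, 10)), rng.normal(size=3)
w = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(X @ w, y))      # True: one of infinitely many exact solutions
```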
1.15 Suppose we want to train a classifier in a supervised learning manner, in order to automatically evaluate student assignments. In this setting, which of the following statement(s) is/are correct?

A. The collected student assignments which have not been graded by teachers can be used as the experience.
B. The task is to predict the grades of the student assignments.
C. The accuracy of the predicted grades can be used as the performance measure.
D. The accuracy of the students' answers to the assignments can be used as the performance measure.

2 Calculations and Derivations (70 points)

2.1 (5 points) Consider two discrete random variables X and Y, with X ∈ {0, 1, 2} and Y ∈ {2, 3}. P(Y = 2) = 0.4, P(Y = 3) = 0.6. Given Y, X follows a binomial distribution, i.e.,

P(X = k | Y = y) = \frac{n!}{k!(n-k)!} \left(\frac{1}{y}\right)^k \left(1 - \frac{1}{y}\right)^{n-k},

where n = 2.

1. Calculate the probability distribution of X, i.e., P(X). (1 point)

2. Calculate the conditional distribution P(Y | X = 1). (Hint: use Bayes' rule.) (2 points)

3. Calculate E(X) and E(Y | X = 1). (2 points)
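Note (illustrative, not a substitute for the written derivation): since X and Y take only a few values, all three parts can be cross-checked by enumerating the joint distribution.

```python
from math import comb

p_Y = {2: 0.4, 3: 0.6}
n = 2

def p_X_given_Y(k, y):
    # Binomial with n = 2 trials and success probability 1/y.
    return comb(n, k) * (1 / y) ** k * (1 - 1 / y) ** (n - k)

# Part 1: marginal P(X = k) via the law of total probability.
p_X = {k: sum(p_Y[y] * p_X_given_Y(k, y) for y in p_Y) for k in range(n + 1)}

# Part 2: posterior P(Y = y | X = 1) via Bayes' rule.
p_Y_given_X1 = {y: p_Y[y] * p_X_given_Y(1, y) / p_X[1] for y in p_Y}

# Part 3: the two expectations.
print(p_X, p_Y_given_X1)
print(sum(k * p for k, p in p_X.items()),
      sum(y * p for y, p in p_Y_given_X1.items()))
```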
2.2 (15 points) We define a CNN model as

f_CNN(X) = Softmax(FC_1(Conv_2(MP_1(ReLU_1(Conv_1(X)))))).    (1)

The size of the input data X is 36 × 36 × 3; the first convolutional layer Conv_1 includes 10 filters of size 8 × 8 × 3, stride = 2, padding = 1; ReLU_1 indicates the first ReLU layer; MP_1 is a 2 × 2 max pooling layer, stride = 2; the second convolutional layer Conv_2 includes 100 filters of size 5 × 5 × 10, stride = 1, padding = 0; FC_1 indicates the fully connected layer, where there are 10 output neurons; Softmax denotes the Softmax activation function. The ground-truth label of X is denoted as t, and the loss function used for training this CNN model is denoted as L(y, t).

3. Plot the computational graph (CG) of the forward pass of this CNN model. (3 points) (Hint: use z_1, z_2, z_3, z_4, z_5, z_6 to denote the activated values after Conv_1, ReLU_1, MP_1, Conv_2, FC_1, Softmax.)

4. Based on the plotted CG, write down the formulations of the back-propagation algorithm, including the forward and backward pass. (6 points) (Hint: for the forward pass, write down the process of how to get the value of the loss function L(y, t); for the backward pass, write down the process of computing the partial derivative of each parameter, like ∂L/∂w_1, ∂L/∂b_1.)
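Note (illustrative, not examinable): the feature-map sizes implied by the stated hyperparameters follow from the usual output-size formula floor((W − F + 2P) / S) + 1; the helper function below is my own, used only to sanity-check the shapes of z_1 … z_5.

```python
def out_size(w, f, s, p):
    # floor((W - F + 2P) / S) + 1 for square inputs and square filters.
    return (w - f + 2 * p) // s + 1

w = 36                        # input X: 36 x 36 x 3
w = out_size(w, 8, 2, 1)      # Conv1: 10 filters, 8 x 8 x 3, stride 2, padding 1
print(w, "x", w, "x 10")      # ReLU1 keeps this shape
w = out_size(w, 2, 2, 0)      # MP1: 2 x 2 max pooling, stride 2
print(w, "x", w, "x 10")
w = out_size(w, 5, 1, 0)      # Conv2: 100 filters, 5 x 5 x 10, stride 1, padding 0
print(w, "x", w, "x 100")
print(w * w * 100, "-> 10")   # flattened input to FC1, which has 10 output neurons
```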
2.3 (17 points) Derivation of SVM's objective function and solution according to the dual problem. Given the training data {(x_i, y_i)}_{i=1}^{n} and denoting the parameters as w, b, the objective function of SVM is formulated as follows:

min_{w,b}  (1/2) ||w||^2    (2)
s.t.  y_i (w^T x_i + b) ≥ 1,  ∀i.

1. Derive the above objective from the perspective of the large margin. (Hint: The margin is defined as the closest distance from the data points to the hyperplane. We wish to find a hyperplane that separates the data while having the largest margin among all hyperplanes. The distance of a point x to the hyperplane given by f_{w,b}(x) := w^T x + b = 0 is |f_{w,b}(x)| / ||w||.) (6 points)

2. Derive the above objective from the perspective of the hinge loss. (Hint: The objective function of SVM using the hinge loss is given by

C \sum_{i=1}^{m} hinge_loss(x_i, y_i; w, b) + (1/2) ||w||^2.

You can first write down the explicit expression of the hinge loss.) (5 points)

3. Derive the solution of w, b according to the Lagrangian function and the KKT conditions, and state when a training point is called a support vector. (Hint: first write down the Lagrangian function, then the KKT conditions, then use the KKT conditions to derive the solution.) (6 points)
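Note (for orientation only; part 3 expects the full derivation): one standard form of the hard-margin Lagrangian and the resulting KKT stationarity conditions is sketched below.

```latex
% Sketch of the standard hard-margin setup for part 3 (shape of the answer only):
L(w, b, \alpha) = \frac{1}{2}\|w\|^2
  - \sum_{i=1}^{n} \alpha_i \big( y_i (w^\top x_i + b) - 1 \big),
  \qquad \alpha_i \ge 0.
% Stationarity (KKT):
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0.
% Complementary slackness: \alpha_i \big( y_i (w^\top x_i + b) - 1 \big) = 0, so any
% point with \alpha_i > 0 lies on the margin, y_i (w^\top x_i + b) = 1: a support vector.
```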
2.4 (10 points) Consider the figure below:

[Figure: scatter plot of red circles (positive class) and blue squares (negative class) with a black decision boundary; not reproduced here.]

The red circles represent the data points of the positive class, and the blue squares represent the negative class. We construct a model to do the prediction and get a decision boundary, shown as the black curve. The points on the top right of the decision boundary are predicted to be positive, and the other points on the bottom left of the boundary are predicted to be negative.

1. Write down the confusion matrix of this classification. (2 points)

2. Calculate the precision, recall, and accuracy of this classification method. (3 points)

3. Calculate the FNR and FPR. (2 points)

4. Suppose that the posterior probabilities of the 6 positive data points, p(y_i = + | x_i), are 0.2, 0.5, 0.7, 0.7, 0.8, and 0.9, respectively. The posterior probabilities of the 4 negative data points, p(y_i = + | x_i), are 0.1, 0.3, 0.5, and 0.6, respectively. Calculate the AUC of this model. (3 points)
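Note (illustrative cross-check, not a substitute for the working): part 4 can be verified with the pairwise-ranking interpretation of AUC, i.e., the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as 1/2.

```python
pos = [0.2, 0.5, 0.7, 0.7, 0.8, 0.9]   # p(y_i = + | x_i) for the 6 positives
neg = [0.1, 0.3, 0.5, 0.6]             # p(y_i = + | x_i) for the 4 negatives

# Count, over all positive-negative pairs, how often the positive ranks higher.
wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
print(wins / (len(pos) * len(neg)))    # AUC over all 24 pairs
```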
2.5 (15 points) Suppose that a random variable x follows the Gaussian mixture distribution p(x) = \sum_{k=1}^{K} π_k N(x | µ_k, Σ_k), where Θ = {π, µ, Σ}, π = {π_1, ..., π_K}, µ = {µ_1, ..., µ_K}, Σ = {Σ_1, ..., Σ_K}, and the latent variable z ∼ Categorical(π), with π_k ≥ 0 and \sum_k π_k = 1.

1. Likelihood Decomposition. Show that

ln p(D; Θ) ≥ L(q; Θ),  ∀q, Θ,

where

L(q; Θ) = \sum_{n=1}^{N} E_{q_n(z^{(n)})} ln \frac{p(x^{(n)}, z^{(n)}; Θ)}{q_n(z^{(n)})},

and q(z) = \prod_{n=1}^{N} q_n(z^{(n)}). Also, write down the gap between ln p(D; Θ) and L(q; Θ). (6 points)

2. E-Step Derivation. With given Θ = {π, µ, Σ}, update q(z). (3 points)

3. M-Step Derivation. Given q(z), update Θ = {π, µ, Σ}. (6 points)
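Note (for orientation only; part 1 expects the derivation): the gap asked for is a KL divergence, via the standard evidence-lower-bound identity sketched below.

```latex
% The gap between the log likelihood and L(q; \Theta) is a sum of KL divergences:
\ln p(D; \Theta) = \mathcal{L}(q; \Theta)
  + \sum_{n=1}^{N} \mathrm{KL}\!\left( q_n(z^{(n)}) \,\middle\|\, p(z^{(n)} \mid x^{(n)}; \Theta) \right).
% Since each KL term is non-negative, \ln p(D; \Theta) \ge \mathcal{L}(q; \Theta), with
% equality iff q_n(z^{(n)}) = p(z^{(n)} \mid x^{(n)}; \Theta), which is the E-step choice.
```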